Issue
I want to concatenate dataframes vertically. Each dataframe I have is created from a file in a directory and I want to concatenate all of them. I can do this for each individual file:
df1 = pd.read_csv('C:/Users/Desktop/folder/file1.csv', usecols=['name'])
df2 = pd.read_csv('C:/Users/Desktop/folder/file1.csv', usecols=['reads'])
result = pd.concat([df1, df2], axis=1)
But I'd have to do this one file at a time. I tried collecting the values in an empty list like this:
for file in glob.glob('C:/Users/Desktop/folder/file*.csv'):
    df1 = pd.read_csv(file, usecols='name')
    df2 = pd.read_csv(file, usecols='reads')
    collected_columns.append(df1['name'])
    collected_columns.append(df2['reads'])
final_df = pd.concat(df1, df2, join='outer', axis=1, sort=True)
# dataframe to csv
final_df.to_csv('C:/Users/Desktop/folder/TEST.csv')
But this keeps producing a dataframe with each column from each file side by side. I hope this makes sense; if anyone can help, I'd greatly appreciate it!
Solution
Let's assume that the result of the first concatenation is as follows:
first_concat = pd.concat([df1, df2], axis=1)
   name  reads
0   Joe      1
1  Jack      2
2  John      3
And you have a second file that you concatenate the same way (the same code as for the first file):
second_concat = pd.concat([df3, df4], axis=1)
   name  reads
0   Ava     11
1  Adam     22
To concatenate these two vertically, pass them to pd.concat as a list. The default axis=0 stacks the rows, and ignore_index=True renumbers them:
all_df = [first_concat, second_concat]
final_df = pd.concat(all_df, ignore_index=True)
   name  reads
0   Joe      1
1  Jack      2
2  John      3
3   Ava     11
4  Adam     22
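To see what ignore_index=True changes, here is a small self-contained sketch using the sample values above. It builds the two frames in memory instead of reading files, and the names without_reset and with_reset are only for illustration:

```python
import pandas as pd

# Stand-ins for the per-file results shown above
first_concat = pd.DataFrame({"name": ["Joe", "Jack", "John"], "reads": [1, 2, 3]})
second_concat = pd.DataFrame({"name": ["Ava", "Adam"], "reads": [11, 22]})

# Default axis=0 stacks rows; each frame keeps its own index labels,
# so the labels 0 and 1 appear twice.
without_reset = pd.concat([first_concat, second_concat])

# ignore_index=True discards the old labels and renumbers the rows.
with_reset = pd.concat([first_concat, second_concat], ignore_index=True)

print(without_reset.index.tolist())  # [0, 1, 2, 0, 1]
print(with_reset.index.tolist())     # [0, 1, 2, 3, 4]
```

Without ignore_index the rows are still stacked correctly; only the index labels repeat, which tends to matter later if you select rows by label.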
Then you can easily use it in your for loop:
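import glob
import pandas as pd

all_df = []
for file in glob.glob('C:/Users/Desktop/folder/file*.csv'):
    # usecols must be list-like, not a bare string
    df1 = pd.read_csv(file, usecols=['name'])
    df2 = pd.read_csv(file, usecols=['reads'])
    # put the two columns of this file side by side
    df_nr_concat = pd.concat([df1, df2], axis=1)
    all_df.append(df_nr_concat)
# stack the per-file frames vertically and renumber the rows
final_df = pd.concat(all_df, ignore_index=True)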
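Since both columns come from the same file, a possible simplification is to read them in a single read_csv call and skip the per-file axis=1 step. The following is only a sketch under that assumption; the helper name stack_csv_columns and its parameters are hypothetical, not part of pandas:

```python
import glob

import pandas as pd


def stack_csv_columns(pattern, columns=("name", "reads")):
    """Read the given columns from every file matching pattern and stack the rows.

    Hypothetical helper for illustration only.
    """
    cols = list(columns)
    frames = []
    # glob.glob returns paths in arbitrary order; sort for a reproducible row order
    for path in sorted(glob.glob(pattern)):
        # usecols keeps the file's own column order, so reorder explicitly
        frames.append(pd.read_csv(path, usecols=cols)[cols])
    if not frames:
        # pd.concat raises ValueError on an empty list; fail with a clearer message
        raise FileNotFoundError(f"no files match {pattern!r}")
    # axis=0 (the default) stacks rows; ignore_index renumbers them 0..n-1
    return pd.concat(frames, ignore_index=True)
```

Calling stack_csv_columns('C:/Users/Desktop/folder/file*.csv') should give the same kind of final_df as the loop above. When writing it out, to_csv(..., index=False) leaves the row numbers out of the file.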
Answered By - Hoori M.