Issue
Hello, I have iterated through multiple CSV files and concatenated them, and it worked. But the columns in all the CSV files are in this order:
Output: id title content tags
However, my code outputs the columns in this order:
Output: content id tags title
How do I get them back in the order that all the CSV files have?
Here is my code:
import glob
import os
import numpy as np
import pandas as pd
from IPython.display import display
%matplotlib inline
# Show more rows and wider columns when displaying DataFrames
pd.set_option("display.max_rows", 999)
pd.set_option("max_colwidth", 100)
# Collect every CSV file in the data directory
file_path = 'data/'
all_files = glob.glob(os.path.join(file_path, "*.csv"))
# Read each file lazily and stack them into one DataFrame
merging_csv_files = (pd.read_csv(f) for f in all_files)
stack_exchange_data = pd.concat(merging_csv_files, ignore_index=True)
print("Data loaded successfully!")
print("Stack Exchange Data has {} rows with {} columns each.".format(*stack_exchange_data.shape))
Solution
The general way to select the columns of a DataFrame in a specific order is to create a list with the order you want and pass that list to the DataFrame's bracket operator, like this:
my_col_order = ['id', 'title', 'content', 'tags']
df[my_col_order]
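As a rough sketch of how that looks after a concat, the example below uses two tiny hand-made DataFrames as hypothetical stand-ins for the CSV files (the names `first`, `second`, and `combined` are made up for illustration). Note that the bracket operator returns a new DataFrame, so the result has to be assigned back to keep the new order.

```python
import pandas as pd

# Hypothetical stand-ins for two of the CSV files; the second one
# has the same columns in a different order.
first = pd.DataFrame({"id": [1], "title": ["a"], "content": ["x"], "tags": ["t1"]})
second = pd.DataFrame({"content": ["y"], "id": [2], "tags": ["t2"], "title": ["b"]})

combined = pd.concat([first, second], ignore_index=True)

# Select the columns in the desired order and assign the result back,
# because the selection does not modify the DataFrame in place.
my_col_order = ["id", "title", "content", "tags"]
combined = combined[my_col_order]

print(list(combined.columns))
```

With the code from the question, the same assignment would be `stack_exchange_data = stack_exchange_data[my_col_order]`.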
Also, you might want to check that all the DataFrames really do have the same column order. I don't believe pandas will sort the column names in concat unless at least one DataFrame has a different column ordering. You might want to print out the column names of every DataFrame you are concatenating.
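One possible way to do that check, sketched here under the assumption that the files live in `data/` as in the question, is to read only the header row of each file. The helper names `column_orders` and `report_column_orders` are made up for this example.

```python
import glob
import os

import pandas as pd


def column_orders(csv_paths):
    """Map each CSV path to its list of column names.

    nrows=0 makes pandas parse only the header, so no data rows are loaded.
    """
    return {path: list(pd.read_csv(path, nrows=0).columns) for path in csv_paths}


def report_column_orders(file_path="data/"):
    """Print the column order of every CSV file found in file_path."""
    all_files = sorted(glob.glob(os.path.join(file_path, "*.csv")))
    for path, columns in column_orders(all_files).items():
        print(path, columns)
```

Calling `report_column_orders("data/")` should make any file with a different ordering easy to spot. On pandas 0.23 or later, `pd.concat` also accepts a `sort` argument, so passing `sort=False` may be worth trying if the columns still come out alphabetized.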
Answered By - Ted Petrou