Issue
I have an xlsx file which contains 3 columns. This is how my xlsx file looks like:
Items Object Information
Item1 Some Object Some Information
Item2 Some Object Some Information
Item3 Some Object Some Information
Item4 Some Object Some Information
When reading this using pandas.read_excel, i am getting wrong column orders based on what I pass in names
argument.
When I say
df_sheet = pd.read_excel("myfile.xlsx",
sheet_name="mysheet",
engine="openpyxl",
header=None,
names=list("ABC"))
I get Items column as A, Object column as B, and Information column as C.
When I say
df_sheet = pd.read_excel("myfile.xlsx",
sheet_name="mysheet",
engine="openpyxl",
header=None,
names=list("AB"))
I get Items column as B, and Object column as A.
I was under the impression that pandas will preserve the column order and by passing names
argument in read_excel, i am asking it to name the first column as A and second column as B but for some reason when I read only 2 columns out of 3 the column order is not right but works fine if I read all 3 columns.
What am I missing here?
Solution
The issue boils down to the index column. Pandas did preserve the order of columns, it's just that the first column was used for the index.
You can fix this by specifying the columns you want to use with usecols
:
df = pd.read_excel("myfile.xlsx", sheet_name="mysheet", engine="openpyxl", header=None, names=list("AB"), usecols="A:B")
Answered By - kevin41
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.