Issue
I have two DataFrames that look the same, and for both of them I want to add a column and then reorder the columns. Here is a sample of what I tried:
import pandas as pd

data=[[1,2],[3,4]]
cols=['col1','col2']
df1=pd.DataFrame(data,columns=cols)
df2=pd.DataFrame(data,columns=cols)
for df in [df1,df2]:
    df.loc[:,'col3']=[5,6]
    df=df.reindex(['col3','col2','col1'],axis=1)
print(df1)
   col1  col2  col3
0     1     2     5
1     3     4     6
print(df2)
   col1  col2  col3
0     1     2     5
1     3     4     6
The third column was added as expected, but the columns are still in the original order; I expected them to be col3, col2, col1. When I later ran the reindex on its own, outside the loop, it worked as expected:
df1=df1.reindex(['col3','col2','col1'],axis=1)
I'm sure there is a reason why the column gets added but the reindex is ignored in my first attempt, but I haven't been able to find it. Does anyone know why this happens?
Solution
This is because df in your for loop is just a name (the loop variable) that is bound to df1 on the first iteration and to df2 on the second. When you do df.loc[:,'col3']=[5,6], you modify the object that df references in place. That is the same object df1 (or df2) references, so the new column shows up there too. However, df.reindex(['col3','col2','col1'],axis=1) does not modify the original DataFrame; it returns a new, reordered copy, and df=... only rebinds the loop variable df to that copy inside the for loop. df1 and df2 still reference the original objects, so apart from the added column they remain unchanged. To see this, try printing df at the end of the for loop: after the last iteration it shows df2 with the columns reordered, which is the result you wanted.
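Here is a minimal sketch that makes this visible and shows one possible fix. It assumes a reasonably recent pandas. The names make_frames, new_order, before and frames are hypothetical and only used for illustration. The first loop checks with `is` that df points at a different object after the reindex. The second loop stores each returned DataFrame back into a list and then rebinds df1 and df2 to those results, which is one way to keep the loop structure from the question.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
cols = ['col1', 'col2']
new_order = ['col3', 'col2', 'col1']


def make_frames():
    """Hypothetical helper: build two identical sample DataFrames, as in the question."""
    return (pd.DataFrame(data, columns=cols),
            pd.DataFrame(data, columns=cols))


# 1) Why the original loop fails: reindex returns a new object.
df1, df2 = make_frames()
for df in [df1, df2]:
    before = df
    df.loc[:, 'col3'] = [5, 6]          # in place: df1/df2 refer to this same object
    df = df.reindex(new_order, axis=1)  # new DataFrame; only the loop name df is rebound
    print(df is before)                 # False: df now points at the reordered copy
print(list(df1.columns))                # still ['col1', 'col2', 'col3']

# 2) One possible fix: keep the returned DataFrames and rebind df1/df2 to them.
df1, df2 = make_frames()
frames = [df1, df2]
for i, df in enumerate(frames):
    df.loc[:, 'col3'] = [5, 6]
    frames[i] = df.reindex(new_order, axis=1)  # store the copy instead of discarding it
df1, df2 = frames
print(list(df1.columns))                # ['col3', 'col2', 'col1']
print(list(df2.columns))                # ['col3', 'col2', 'col1']
```

If you don't need the in-place step at all, a list comprehension such as df1, df2 = [d.assign(col3=[5, 6])[new_order] for d in (df1, df2)] builds the new DataFrames and rebinds both names in one go.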
Answered By - Shreyas Balaji