Issue
I have two environments, one with pandas 2.1.4 and another with pandas 1.4.2. When I run the following code they give different results.
import pandas as pd
d = {'col0': ['a', 'b'], 'col1': [1.0, 2.0], 'col2': [3.0, 4.0]}
df = pd.DataFrame(data=d)
print(df)
df.iloc[:,1:3] = df.iloc[:,1:3].astype('int32')
print(df)
1.4.2 gives:
  col0  col1  col2
0    a   1.0   3.0
1    b   2.0   4.0
  col0  col1  col2
0    a     1     3
1    b     2     4
2.1.4 gives:
  col0  col1  col2
0    a   1.0   3.0
1    b   2.0   4.0
  col0  col1  col2
0    a   1.0   3.0
1    b   2.0   4.0
How do I modify the code so that it gives the 1.4.2 output on both 1.4.2 and 2.1.4? Please keep it a one-liner.
Also, what change between the two pandas versions caused the different behaviour?
Thank you so much!
Solution
The issue is that assigning to part of a DataFrame with iloc no longer downcasts the dtype: since pandas 2.0, setting values with loc/iloc always attempts to set them in place in the existing columns instead of replacing the columns, so assigning integers into float columns keeps the float dtype. That is the behaviour change between your two versions.
You should probably use astype:
df = df.astype(dict.fromkeys(df.columns[1:3], 'int32'))
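Here dict.fromkeys simply builds a column-to-dtype mapping for the selected columns, which astype then applies column by column. A minimal sketch on the question's data:

```python
import pandas as pd

d = {'col0': ['a', 'b'], 'col1': [1.0, 2.0], 'col2': [3.0, 4.0]}
df = pd.DataFrame(data=d)

# dict.fromkeys(df.columns[1:3], 'int32') -> {'col1': 'int32', 'col2': 'int32'}
mapping = dict.fromkeys(df.columns[1:3], 'int32')

# astype with a dict converts only the listed columns; col0 is untouched
df = df.astype(mapping)
print(df.dtypes)
```

Because astype returns a new DataFrame, rebinding df sidesteps the in-place setitem behaviour entirely, so this works the same on both versions.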
Or, if there can be a risk of error:
df = df.astype(dict.fromkeys(df.columns[1:3], 'int32'), errors='ignore')
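With errors='ignore', columns that cannot be converted are left as-is instead of raising. A small sketch (note that this parameter has been deprecated in recent pandas releases, so a try/except around astype may be preferable going forward):

```python
import pandas as pd

df = pd.DataFrame({'col0': ['a', 'b'], 'col1': [1.0, 2.0]})

# 'a'/'b' cannot be cast to int32; errors='ignore' keeps col0 unchanged
# while col1 is still converted
out = df.astype({'col0': 'int32', 'col1': 'int32'}, errors='ignore')
print(out.dtypes)
```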
Note that there is no problem if you assign the full column without slicing:
cols = df.columns[1:3]
df[cols] = df[cols].astype('int32')
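The two lines above can be checked quickly: label-based assignment with df[cols] replaces the columns outright rather than writing into them in place, so the int32 dtype sticks on both versions. A minimal sketch on the question's data:

```python
import pandas as pd

d = {'col0': ['a', 'b'], 'col1': [1.0, 2.0], 'col2': [3.0, 4.0]}
df = pd.DataFrame(data=d)

# df[cols] = ... swaps in the new int32 columns (unlike df.iloc[:, 1:3] = ...,
# which sets values in place and keeps the float dtype on pandas >= 2.0)
cols = df.columns[1:3]
df[cols] = df[cols].astype('int32')
print(df.dtypes)
```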
Answered By - mozway