Issue
Is there any reason why pandas changes the type of columns from int to float in DataFrame.update, and can I prevent it from doing that? Here is some example code that shows the problem:
import pandas as pd
import numpy as np
df = pd.DataFrame({'int': [1, 2], 'float': [np.nan, np.nan]})
print('Integer column:')
print(df['int'])
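for _, df_sub in df.groupby('int'):
    df_sub = df_sub.copy()  # explicit copy, so the assignment below does not warn about setting on a slice
    df_sub['float'] = df_sub['int'].astype(float)  # float(Series) is deprecated in recent pandas
    df.update(df_sub)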
print('NO integer column:')
print(df['int'])
Solution
Here's the reason for this: since you are effectively masking certain values in a column and replacing them (with your updates), some values could become NaN.
In an integer array this is impossible, so numeric dtypes are converted to float a priori (for efficiency), as checking first is more expensive than just doing the conversion.
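A minimal sketch of where that NaN comes from, assuming update() first aligns the frame you pass in to df's index (reindex below stands in for that alignment step; treat the equivalence as an assumption). A one-row slice, like each group from the loop in the question, is reindexed to the full index: the row it does not contain becomes NaN, and an int column cannot hold NaN, so it is upcast to float64.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'int': [1, 2], 'float': [np.nan, np.nan]})

# One-row slice, like a single group yielded by df.groupby('int')
df_sub = df.iloc[[0]].copy()
df_sub['float'] = df_sub['int'].astype(float)

# Align the slice to the full index (assumed to mirror what update() does)
aligned = df_sub.reindex(df.index)

print(df_sub['int'].dtype)      # still an integer dtype
print(aligned['int'].dtype)     # float64, because row 1 is now missing
print(aligned['int'].tolist())  # [1.0, nan]
```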
Changing the dtype back is possible, just not in the code right now; therefore this is a bug (a bit non-trivial to fix, though): github.com/pydata/pandas/issues/4094
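Until that is fixed, two workarounds can be sketched on the data from the question. update_keep_dtypes is a hypothetical helper (not a pandas API) that records the dtypes, calls update(), and casts back any column whose dtype changed. The second loop avoids the problem by handing update() only the column that was actually modified, so the int column is never touched.

```python
import numpy as np
import pandas as pd


def update_keep_dtypes(df, other):
    # Hypothetical helper: update in place, then restore the original dtypes.
    # Assumes update() only copies non-NA values from `other`, so a column
    # with no NaN before the call has none after it and casting back to int
    # is safe. Non-integral float values would be truncated by the cast.
    dtypes = df.dtypes
    df.update(other)
    for col, dtype in dtypes.items():
        if df[col].dtype != dtype:
            df[col] = df[col].astype(dtype)


# Workaround 1: restore the dtypes after each update
df_a = pd.DataFrame({'int': [1, 2], 'float': [np.nan, np.nan]})
for _, df_sub in df_a.groupby('int'):
    df_sub = df_sub.copy()
    df_sub['float'] = df_sub['int'].astype(float)
    update_keep_dtypes(df_a, df_sub)

# Workaround 2: only pass the changed column, so 'int' is not updated at all
df_b = pd.DataFrame({'int': [1, 2], 'float': [np.nan, np.nan]})
for _, df_sub in df_b.groupby('int'):
    df_sub = df_sub.copy()
    df_sub['float'] = df_sub['int'].astype(float)
    df_b.update(df_sub[['float']])

print(df_a.dtypes)
print(df_b.dtypes)
```

Casting back fails (or silently truncates) if a column ends up with NaN or non-whole values, so the second approach is the safer one whenever only some columns change.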
Answered By - Jeff