Issue
The problem is simple. Here we have a DataFrame in which column A has an explicitly specified dtype:
import pandas as pd
df = pd.DataFrame({'A':[1,2], 'B':[3,4]})
df.A = df.A.astype('int16')
#df
   A  B
0  1  3
1  2  4
#df.dtypes
A    int16
B    int64
dtype: object
Now I zip the two columns A and B into a tuple:
df['C'] = list(zip(df.A, df.B))
   A  B       C
0  1  3  (1, 3)
1  2  4  (2, 4)
However, the types of the values inside column C have now changed:
type(df.C[0][0])
#int
type(df.A[0])
#numpy.int16
How can I zip two columns and keep the datatype of each value inside the tuples, so that type(df.C[0][0]) would be int16 (the same as type(df.A[0]))?
Solution
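I think some type casting happens when zip() iterates over df.A, df.B, etc.: iterating a pandas Series yields built-in Python scalars, much like numpy.ndarray.tolist does. See https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html: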
Return a copy of the array data as a (nested) Python list. Data items are converted to the nearest compatible builtin Python type, via the item function.
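A quick way to check this explanation (a sketch, assuming pandas 0.24 or later so that Series.to_numpy() exists): indexing the Series returns a NumPy scalar with the column's dtype, while iterating it, which is exactly what zip() does, returns plain Python ints.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.A = df.A.astype('int16')

# Looking up a single element returns a NumPy scalar with the column's dtype
print(type(df.A.iloc[0]))                  # <class 'numpy.int16'>

# Iterating the Series (what zip() does internally) yields built-in Python ints
print(type(next(iter(df.A))))              # <class 'int'>

# Series.tolist() applies the same conversion as ndarray.tolist()
print(type(df.A.tolist()[0]))              # <class 'int'>

# Iterating the underlying NumPy array keeps the NumPy scalar type
print(type(next(iter(df.A.to_numpy()))))   # <class 'numpy.int16'>
```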
But zipping the underlying NumPy arrays (via .values) instead worked:
>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2], 'B':[3,4]})
>>> df.A = df.A.astype('int16')
>>> df['C'] = list(zip(df.A.values, df.B.values))
>>> df
   A  B       C
0  1  3  (1, 3)
1  2  4  (2, 4)
>>> type(df.C[0][0])
<class 'numpy.int16'>
>>> type(df.C[0][1])
<class 'numpy.int64'>
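The same idea extends to any number of columns. The helper below, zip_columns, is a hypothetical name (not a pandas API); it is a minimal sketch that uses Series.to_numpy(), which current pandas documentation recommends over .values.

```python
import numpy as np
import pandas as pd


def zip_columns(df, *columns):
    """Hypothetical helper: zip the given columns into a list of tuples.

    Each column is converted to its own NumPy array first, because iterating
    an ndarray yields NumPy scalars (e.g. numpy.int16), whereas iterating a
    pandas Series yields built-in Python scalars.
    """
    arrays = [df[col].to_numpy() for col in columns]
    return list(zip(*arrays))


df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df.A = df.A.astype('int16')
df['C'] = zip_columns(df, 'A', 'B')

print(type(df.C[0][0]))  # <class 'numpy.int16'>
print(type(df.C[0][1]))  # same scalar type as column B's dtype
```

Each column is converted separately on purpose: df[['A', 'B']].to_numpy() would build a single array with a common dtype (int64 here), so the int16 values would be upcast before zipping.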
Answered By - crayxt