Tuesday, February 1, 2022

[FIXED] How can I split a column of tuples in a Pandas dataframe?

February 01, 2022 dataframe, numpy, pandas, python, tuples No comments

Issue

I have a Pandas dataframe (this is only a little piece)

>>> d1
   y norm test  y norm train  len(y_train)  len(y_test)  \
0    64.904368    116.151232          1645          549
1    70.852681    112.639876          1645          549

                                    SVR RBF  \
0   (35.652207342877873, 22.95533537448393)
1  (39.563683797747622, 27.382483096332511)

                                        LCV  \
0  (19.365430594452338, 13.880062435173587)
1  (19.099614489458364, 14.018867136617146)

                                   RIDGE CV  \
0  (4.2907610988480362, 12.416745648065584)
1    (4.18864306788194, 12.980833914392477)

                                         RF  \
0   (9.9484841581029428, 16.46902345373697)
1  (10.139848213735391, 16.282141345406522)

                                           GB  \
0  (0.012816232716538605, 15.950164822266007)
1  (0.012814519804493328, 15.305745202851712)

                                             ET DATA
0  (0.00034337162272515505, 16.284800366214057)  j2m
1  (0.00024811554516431878, 15.556506191784194)  j2m
>>>

I want to split all the columns that contain tuples. For example, I want to replace the column LCV with the columns LCV-a and LCV-b.

How can I do that?

Solution

You can do this by doing pd.DataFrame(col.tolist()) on that column:

In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})

In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)

In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]

In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4

In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4

Note: in an earlier version, this answer recommended to use df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That works as well (because it makes a Series of each tuple, which is then seen as a row of a dataframe), but it is slower / uses more memory than the tolist version, as noted by the other answers here (thanks to denfromufa).

Answered By - joris

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, February 1, 2022

[FIXED] How can I split a column of tuples in a Pandas dataframe?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels