Wednesday, October 6, 2021

[FIXED] pandas split column based on groupby

Labels: numpy, pandas, python

Issue

I want to split a column into multiple columns based on a grouped value. For example:

import numpy as np
import pandas as pd

# input
df = pd.DataFrame([[1,2,1], [1,4,4], [1,5,7], [2,1,1], [2,3,5], [2,3,1]], columns=['cat', 'v1', 'v2'])

# output
df_out = pd.DataFrame([[1,2,0,1,0], 
                       [1,4,0,4,0], 
                       [1,5,0,7,0], 
                       [2,0,1,0,1], 
                       [2,0,3,0,5], 
                       [2,0,3,0,1]], columns=['cat', 'v1_1', 'v1_2', 'v2_1', 'v2_2'])

I came up with the code below, but I am looking for a more efficient approach because my input file is huge.

grouped = df.groupby('cat')

cols = ['v1', 'v2']
# Create one zero-filled column per (value column, group key) pair.
for column in cols:
    new_name = [column + '_' + str(k) for k in grouped.groups]
    df[new_name] = np.zeros((df[column].size, len(grouped.groups)), dtype=df[column].dtype)

# Copy each group's values into the columns that belong to that group.
for k, index in grouped.groups.items():
    df.loc[index, [c + '_' + str(k) for c in cols]] = df.loc[index, cols].to_numpy()

In:

   cat  v1  v2
0    1   2   1
1    1   4   4
2    1   5   7
3    2   1   1
4    2   3   5
5    2   3   1

Out:

   cat  v1_1  v1_2  v2_1  v2_2
0    1     2     0     1     0
1    1     4     0     4     0
2    1     5     0     7     0
3    2     0     1     0     1
4    2     0     3     0     5
5    2     0     3     0     1

Solution

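Use DataFrame.set_index with append=True to move cat into the index, pivot that level into the columns with DataFrame.unstack, flatten the resulting MultiIndex, and finally add the cat column back with DataFrame.join: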

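# Append cat to the index, then pivot it into the columns; missing cells become 0.
df1 = df.set_index('cat', append=True).unstack(fill_value=0)
# Flatten the (column, cat) MultiIndex into names such as v1_1.
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')

# Join on the original row index to restore the cat column.
df1 = df[['cat']].join(df1)
print(df1)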
   cat  v1_1  v1_2  v2_1  v2_2
0    1     2     0     1     0
1    1     4     0     4     0
2    1     5     0     7     0
3    2     0     1     0     1
4    2     0     3     0     5
5    2     0     3     0     1
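
As a minimal sketch, the same steps can be wrapped in a reusable function and checked against the expected output from the question. The function name split_by_group and its parameters key and fill_value are hypothetical, chosen only for illustration. The sketch assumes the row index is unique (as with the default RangeIndex), since unstack needs each (row label, key) pair to occur only once.

```python
import pandas as pd


def split_by_group(df, key, fill_value=0):
    """Spread every non-key column into one column per value of `key`.

    Hypothetical helper: assumes `df` has a unique row index.
    """
    # Move `key` into the index next to the existing row labels, then pivot
    # that level into the columns; cells for other groups get `fill_value`.
    wide = df.set_index(key, append=True).unstack(fill_value=fill_value)
    # Columns are now (original column, key value) pairs; flatten them.
    wide.columns = [f"{col}_{val}" for col, val in wide.columns]
    # Re-attach the key column, aligning on the original row index.
    return df[[key]].join(wide)


df = pd.DataFrame(
    [[1, 2, 1], [1, 4, 4], [1, 5, 7], [2, 1, 1], [2, 3, 5], [2, 3, 1]],
    columns=["cat", "v1", "v2"],
)
result = split_by_group(df, "cat")
print(result)
```

Unlike the loop in the question, this version leaves the input frame unmodified and returns a new one, which may matter when the same data is reused later.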

Answered By - jezrael

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
