Issue
I have a pandas data frame and would like to stack 4 columns to 2. So I have this
df = pd.DataFrame({'date':['2023-12-01', '2023-12-05', '2023-12-07'],
'other_col':['a', 'b', 'c'],
'right_count':[4,7,9], 'right_sum':[2,3,5],
'left_count':[1,8,5], 'left_sum':[0,8,4]})
date other_col right_count right_sum left_count left_sum
0 2023-12-01 a 4 2 1 0
1 2023-12-05 b 7 3 8 8
2 2023-12-07 c 9 5 5 4
and would like to get this
date other_col side count sum
0 2023-12-01 a right 4 2
1 2023-12-05 b right 7 3
2 2023-12-07 c right 9 5
3 2023-12-01 a left 1 0
4 2023-12-05 b left 8 8
5 2023-12-07 c left 5 4
Solution
You can use a custom reshaping with a temporary MultiIndex:
out = (df
.set_index(['date', 'other_col'])
.pipe(lambda x: x.set_axis(x.columns.str.split('_', expand=True), axis=1))
.rename_axis(columns=['side', None])
.stack('side').reset_index()
)
tmp = df.melt(['date', 'other_col'], var_name='side')
tmp[['side', 'col']] = tmp['side'].str.split('_', n=1, expand=True)
out = (tmp.pivot(index=['date', 'other_col', 'side'],
columns='col', values='value')
.reset_index().rename_axis(columns=None)
)
Output:
date other_col side count sum
0 2023-12-01 a left 1 0
1 2023-12-01 a right 4 2
2 2023-12-05 b left 8 8
3 2023-12-05 b right 7 3
4 2023-12-07 c left 5 4
5 2023-12-07 c right 9 5
Or, much easier, using the janitor
library and pivot_longer
:
# pip install pyjanitor
import janitor
out = df.pivot_longer(index=['date', 'other_col'],
names_to=('side', '.value'),
names_pattern=r'([^_]+)_([^_]+)')
Output:
date other_col side count sum
0 2023-12-01 a right 4 2
1 2023-12-05 b right 7 3
2 2023-12-07 c right 9 5
3 2023-12-01 a left 1 0
4 2023-12-05 b left 8 8
5 2023-12-07 c left 5 4
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.