Issue
I have a pandas dataframe df
which appears as following: (toy version below but the real df contains many more columns and groups
)
group sub fruit
a 1 apple
a 2 banana
a 3 orange
b 1 pear
b 2 strawberry
b 3 cherry
c 1 kiwi
c 2 tomato
c 3 lemon
All group
s have the same number of rows. I am trying to generate a new dataframe that contains all the combinations of group
and sub
but specifically so that each combo contains the all types of group
s and all types of sub
s.
Desired output:
combo group sub fruit
1 a 1 apple
1 b 2 strawberry
1 c 3 lemon
2 a 1 apple
2 c 2 tomato
2 b 3 cherry
3 b 1 pear
3 a 2 banana
3 c 3 lemon
4 c 1 kiwi
4 a 2 banana
4 b 3 cherry
5 c 1 kiwi
5 b 2 strawberry
5 a 3 orange
...
So the below would be a wrong combination, since it was two values of the same sub
:
6 c 2 tomato
6 b 2 strawberry
6 a 3 orange
A previously post of mine randomly selected sub
s but I realized was too unconstrained: Generate combinations by randomly selecting a row from multiple groups (using pandas)
Solution
A solution could be:
import pandas as pd
import numpy as np
from itertools import permutations
df = pd.DataFrame({"group":{"0":"a","1":"a","2":"a","3":"b","4":"b","5":"b","6":"c","7":"c","8":"c"},
"sub":{"0":1,"1":2,"2":3,"3":1,"4":2,"5":3,"6":1,"7":2,"8":3},
"fruit":{"0":"apple","1":"banana","2":"orange","3":"pear","4":"strawberry",
"5":"cherry","6":"kiwi","7":"tomato","8":"lemon"}})
df2 = pd.DataFrame({"combo": [j for i in ([i]*3 for i in range(1,7)) for j in i],
"group": [j for i in permutations(["a","b","c"]) for j in i],
"sub":[1,2,3]*6})
pd.merge(df2, df, how="left", on=["group","sub"])
Answered By - Ric Villalba
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.