Issue
I want to get the list of values from col2 that belong to the same groupId, given a corresponding value in col1. A col1 value can belong to multiple groups, and in that case only the top-most group should be considered (group 2 but not group 3 in my example). Within the same groupId, col1 values are always identical.
groupId | col1 | col2 |
---|---|---|
2 | a | 10 |
1 | b | 20 |
2 | a | 30 |
1 | b | 40 |
3 | a | 50 |
3 | a | 60 |
1 | b | 70 |
My current solution takes over 30 s for a DataFrame with 2000 rows and 32 values to search for in col1 ('a' in this case):
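# Iterate over the groups (ordered by groupId, since sort=True by default)
group_id_groups = df.groupby('groupId')
for group_id, group in group_id_groups:
    # col2 values of the rows in this group whose col1 is 'a'
    col2_values = list(group[group['col1'] == 'a']['col2'])
    if col2_values:
        print(col2_values)
        break  # stop at the first group that contains 'a'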
result: [10, 30]
Solution
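The sort parameter of groupby defaults to True, so the groups are ordered by their groupId key and .iloc[0] picks the matching group with the smallest groupId. In this example that is also the top-most group (group 2 comes before group 3). If "top-most" should instead mean the group whose rows appear first in the DataFrame, pass sort=False, which keeps the groups in their order of first appearance. Change col_to_search to 'b' to get the other answer, [20, 40, 70].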
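Ordering by key and ordering by first appearance only give different results when a larger groupId appears earlier in the frame. The sketch below uses a hypothetical four-row frame (not from the question) to show that case side by side; it converts each group with Series.tolist() so the printed values are plain Python ints.

```python
import pandas as pd

# Hypothetical frame: groupId 3 appears before groupId 2, both with col1 == 'a'.
df = pd.DataFrame({'groupId': [3, 3, 2, 2],
                   'col1': ['a', 'a', 'a', 'a'],
                   'col2': [50, 60, 10, 30]})

matches = df.loc[df['col1'].eq('a')]

# sort=True (default): groups ordered by key, so the smallest groupId comes first.
by_key = matches.groupby('groupId')['col2'].apply(lambda s: s.tolist()).iloc[0]

# sort=False: groups kept in the order they first appear in the frame.
by_appearance = (
    matches.groupby('groupId', sort=False)['col2']
    .apply(lambda s: s.tolist())
    .iloc[0]
)

print(by_key)         # [10, 30]
print(by_appearance)  # [50, 60]
```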
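import pandas as pd

df = pd.DataFrame({'groupId': [2, 1, 2, 1, 3, 3, 1],
                   'col1': ['a', 'b', 'a', 'b', 'a', 'a', 'b'],
                   'col2': [10, 20, 30, 40, 50, 60, 70]})
col_to_search = 'a'

# Keep only the rows whose col1 matches, collect col2 per groupId
# (groups sorted by groupId, the default), then take the first group.
result = (
    df.loc[df['col1'].eq(col_to_search)]
    .groupby('groupId')['col2']
    .apply(list)
    .iloc[0]
)
print(result)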
Output
[10, 30]
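Since the question needs 32 lookups, one option is to answer all of them in a single pass and keep the results in a dict, instead of filtering the frame once per value. A minimal sketch, assuming "top-most" means the smallest groupId for each col1 value (the same ordering the answer relies on); the names first_group, in_first_group and lookup are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'groupId': [2, 1, 2, 1, 3, 3, 1],
                   'col1': ['a', 'b', 'a', 'b', 'a', 'a', 'b'],
                   'col2': [10, 20, 30, 40, 50, 60, 70]})

# Assumption: the "top-most" group for a col1 value is its smallest groupId.
first_group = df.groupby('col1')['groupId'].min()

# True for rows that sit in the first group of their own col1 value.
in_first_group = df['groupId'].eq(df['col1'].map(first_group))

# One groupby over the filtered rows: col1 value -> list of col2 values.
lookup = (
    df.loc[in_first_group]
    .groupby('col1')['col2']
    .apply(lambda s: s.tolist())
    .to_dict()
)

print(lookup['a'])  # [10, 30]
print(lookup['b'])  # [20, 40, 70]
```

To pick the group that appears first in the frame instead of the smallest groupId, replacing .min() with .first() should work, since GroupBy.first returns the first non-null groupId in row order for each col1 value.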
Answered By - Chris