Issue
I have a df:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3, 4, 4, 4],
                   'name': ['call', 'response', 'call', 'call', 'response', 'call', 'call', 'response', 'response']})
   id      name
0   1      call
1   1  response
2   2      call
3   2      call
4   2  response
5   3      call
6   4      call
7   4  response
8   4  response
I'm trying to extract call-response pairs, where the first response after a call is the one that completes the pair. Each group of calls and responses sits in its own subset, identified by id. The desired output looks like this:
   id      name
0   1      call
1   1  response
3   2      call
4   2  response
6   4      call
7   4  response
Ideally I'd keep the indexes in the dataframe so I can use df.loc with those indexes later.
What I have tried is to go through the df in subsets and apply something, or to use a rolling window, but so far I have only managed to get errors.
unique_ids = df.id.unique()
for unique_id in unique_ids:
    # `something` is a placeholder for whatever function should handle each subset
    df.query('id == @unique_id').apply(something)
I have yet to discover something that works specifically with subsets of a dataframe.
Solution
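Use shift on the grouped name column (SeriesGroupBy.shift here, since a single column is selected after groupby) to look at the next and previous row within each id, compare the values with Series.eq to check for equality, and filter with boolean indexing: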
m1 = df['name'].eq('call') & df.groupby('id')['name'].shift(-1).eq('response')
m2 = df['name'].eq('response') & df.groupby('id')['name'].shift().eq('call')
df2 = df[m1 | m2]
print(df2)
   id      name
0   1      call
1   1  response
3   2      call
4   2  response
6   4      call
7   4  response
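To see why the two masks pick these rows, it can help to put the shifted values next to the original column. The following is only a sketch of the same technique; the names debug, next_name, prev_name, pairs and kept_labels are made up for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4, 4, 4],
    "name": ["call", "response", "call", "call", "response",
             "call", "call", "response", "response"],
})

# Shift within each id group so values never leak from one id into another.
grouped = df.groupby("id")["name"]
debug = df.assign(
    next_name=grouped.shift(-1),  # name on the following row of the same id
    prev_name=grouped.shift(),    # name on the preceding row of the same id
)

# A call is kept if the next row in its group is a response;
# a response is kept if the previous row in its group is a call.
is_paired_call = debug["name"].eq("call") & debug["next_name"].eq("response")
is_paired_response = debug["name"].eq("response") & debug["prev_name"].eq("call")

pairs = df[is_paired_call | is_paired_response]

# Boolean indexing keeps the original index labels, so df.loc accepts them.
kept_labels = pairs.index.tolist()  # [0, 1, 3, 4, 6, 7]
same_rows = df.loc[kept_labels]
```

Rows at the edge of a group get NaN from shift, and NaN never equals 'call' or 'response', so the lone call in id 3 and the second response in id 4 drop out without any special handling.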
Answered By - jezrael