Issue
i have a problem that i cannot reach a solution, this is my dataset:
data = [[2022-06-01, 'John', 'A'], [2022-06-02, 'John', 'D'],
[2022-06-03, 'John', 'D'], [2022-06-01, 'Sara', 'D'],
[2022-06-02, 'Sara', 'D'], [2022-06-01, 'Lucas', 'A'],
[2022-06-02, 'Lucas', 'A'], [2022-06-01, 'Mike', 'D'],
[2022-06-02, 'Mike', 'A'], [2022-06-03, 'Mike', 'D']]
df = pd.DataFrame(data, columns=['Date', 'Name', 'Label'])
I just want to choose rows with D label that don't have an 'A' before the 'D' in each name, my real problem probably has more than 70 rows with each name and i need all the 'D' that match that condition, for example, the solution of this problem is:
Date Name Label
2022-06-01 Sara D
2022-06-02 Sara D
2022-06-01 Mike D
I need help because i didn't find any good aproach to see how i deal the problem, thank you for your help!
Solution
Use:
mask = df.assign(indicator=df["Label"].eq("A")).groupby("Name")["indicator"].transform("cumsum").lt(1)
res = df[mask & df["Label"].eq("D")]
print(res)
Output
Date Name Label
3 2022-06-01 Sara D
4 2022-06-02 Sara D
7 2022-06-01 Mike D
Step-by-Step
# is True when the values are A
indicator = df["Label"].eq("A")
# create new DataFrame with this new indicator column
temp = df.assign(indicator=indicator)
# group by name in the new DataFrame and find the cumulative sum of this indicator column
# notice that all D that come before an A will have 0
cumsum = temp.groupby("Name")["indicator"].transform("cumsum")
# create the final mask
mask = cumsum.lt(1) & df["Label"].eq("D")
res = df[mask]
print(res)
Answered By - Dani Mesejo
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.