Issue
I have a pandas dataframe with the following data structure:
I'd like to create a new custom counter column, grouped by the User and Code columns. The intent is to increment the counter when Action is 'OUT' and decrement it when Action is 'IN'. It should never go negative, since I assume the number of 'OUT' occurrences is always >= the number of 'IN' occurrences.
This is what I'd like to achieve. I've tried groupby and transform but didn't manage to get it working.
Any suggestions will be greatly appreciated.
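To pin down the intended semantics before turning to pandas, here is a minimal plain-Python sketch. The sample rows are hypothetical (the original table did not survive), and the helper name `running_counts` is made up for illustration. Each (user, code) pair keeps its own running total, +1 per 'OUT' and -1 per 'IN'.

```python
from collections import defaultdict

# Hypothetical (user, code, action) rows; the original data is not available.
rows = [
    ("alice", "A1", "OUT"),
    ("alice", "A1", "OUT"),
    ("bob", "A1", "OUT"),
    ("alice", "B2", "OUT"),
    ("alice", "A1", "IN"),
    ("bob", "A1", "IN"),
    ("alice", "A1", "IN"),
]

STEP = {"OUT": 1, "IN": -1}  # any other action raises KeyError


def running_counts(rows):
    """Hypothetical helper: running OUT-minus-IN count for each row, per (user, code)."""
    totals = defaultdict(int)  # current count for each (user, code) pair
    result = []
    for user, code, action in rows:
        totals[(user, code)] += STEP[action]
        result.append(totals[(user, code)])
    return result


print(running_counts(rows))  # [1, 2, 1, 1, 1, 0, 0]
```

Note that alice's 'B2' row starts its own count at 1, independent of her 'A1' rows.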
Solution
You can create a new column with the Action column converted to integer values and then use cumsum on the new column:
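# Convert each action to +1 (OUT) or -1 (IN); map() yields an integer column directly
df["Action_as_integer"] = df.Action.map({"OUT": 1, "IN": -1})
# Running total of those values within each (User, Code) group
df["Instances"] = df.groupby(["User", "Code"]).Action_as_integer.cumsum()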
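A self-contained sketch of the two lines above, run on a small hypothetical DataFrame (the values are invented; only the column names User, Code and Action come from the answer). The grouped cumsum keeps the original row order, so Instances lines up with each row.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer.
df = pd.DataFrame(
    {
        "User": ["alice", "alice", "bob", "alice", "alice", "bob", "alice"],
        "Code": ["A1", "A1", "A1", "B2", "A1", "A1", "A1"],
        "Action": ["OUT", "OUT", "OUT", "OUT", "IN", "IN", "IN"],
    }
)

# +1 for OUT, -1 for IN, then a running sum within each (User, Code) group.
df["Action_as_integer"] = df["Action"].map({"OUT": 1, "IN": -1})
df["Instances"] = df.groupby(["User", "Code"])["Action_as_integer"].cumsum()

print(df["Instances"].tolist())  # [1, 2, 1, 1, 1, 0, 0]
```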
As mozway pointed out, cumsum does not prevent negative values: if a group ever has more 'IN' rows than 'OUT' rows so far, its running total drops below zero. If you want to ensure there are no negative values in the Instances column, you can check for them explicitly:
import pandas as pd

def annotate_df_with_instances(df: pd.DataFrame) -> pd.DataFrame:
    # +1 for each OUT, -1 for each IN
    df["Action_as_integer"] = df.Action.map({"OUT": 1, "IN": -1})
    # Running OUT-minus-IN count within each (User, Code) group
    df["Instances"] = df.groupby(["User", "Code"]).Action_as_integer.cumsum()
    # Fail loudly if any running count dips below zero
    if (df["Instances"] < 0).any():
        raise ValueError("Encountered negative number of instances.")
    return df
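If the check fails, it can help to know which groups are at fault. The sketch below uses a hypothetical helper, `find_negative_groups`, and invented data; it assumes the Instances column has already been computed the same way as in the answer.

```python
import pandas as pd


def find_negative_groups(df: pd.DataFrame) -> list:
    """Hypothetical helper: (User, Code) pairs whose running count drops below zero.

    Assumes df already has an Instances column computed with a grouped cumsum.
    """
    negative = df.loc[df["Instances"] < 0, ["User", "Code"]].drop_duplicates()
    return list(negative.itertuples(index=False, name=None))


# Hypothetical data in which bob records an IN before any OUT.
df = pd.DataFrame(
    {
        "User": ["alice", "bob", "alice", "bob"],
        "Code": ["A1", "A1", "A1", "A1"],
        "Action": ["OUT", "IN", "IN", "OUT"],
    }
)
df["Action_as_integer"] = df["Action"].map({"OUT": 1, "IN": -1})
df["Instances"] = df.groupby(["User", "Code"])["Action_as_integer"].cumsum()

print(find_negative_groups(df))  # [('bob', 'A1')]
```

Here bob's final total is 0, yet his running count went to -1 along the way. Checking every running value, rather than only the final totals, is what catches out-of-order rows like this.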
Answered By - max_jump