Issue
can you help me please?
I have the following table in my dataframe:
DATE | CUSTOMER | STATUS |
---|---|---|
01/01/2022 | A | ACTIVATED |
02/01/2022 | A | ACTIVE |
03/01/2022 | A | INACTIVE |
04/01/2022 | A | EXCEPTION |
05/01/2022 | A | INACTIVATED |
I want to mark my dataframe row when my client inactivated (status inactivated). However, I only want to catch when the status at least once has been ACTIVATED. If my client's status is INACTIVATED, but it hasn't passed at least once as ACTIVED, it doesn't matter to me. And also I want to get just only when INACTIVATED is AFTER ACTIVATED.
Could you help me how I can do this in python? I'm having difficulties, here are 2 examples of result I would like to get.
1o option:
DATE | CUSTOMER | STATUS | ACTIVATED/INACTIVATED |
---|---|---|---|
01/01/2022 | A | ACTIVATED | 1 |
02/01/2022 | A | ACTIVE | 1 |
03/01/2022 | A | INACTIVE | 1 |
04/01/2022 | A | EXCEPTION | 1 |
05/01/2022 | A | INACTIVATED | 1 |
2o option:
DATE | CUSTOMER | STATUS | ACTIVATED/INACTIVATED |
---|---|---|---|
01/01/2022 | A | ACTIVATED | |
02/01/2022 | A | ACTIVE | |
03/01/2022 | A | INACTIVE | |
04/01/2022 | A | EXCEPTION | |
05/01/2022 | A | INACTIVATED | 1 |
Thank you very much!
Solution
You can compare STATUS
per groups and test for values s
- 1
if exist both, 0
if no:
s = {'ACTIVATED','INACTIVATED'}
df['ACTIVATED/INACTIVATED'] = (df.groupby('CUSTOMER')['STATUS']
.transform(lambda x: set(x) >= s)
.astype(int))
print (df)
DATE CUSTOMER STATUS ACTIVATED/INACTIVATED
0 01/01/2022 A ACTIVATED 1
1 02/01/2022 A ACTIVE 1
2 03/01/2022 A INACTIVE 1
3 04/01/2022 A EXCEPTION 1
4 05/01/2022 A INACTIVATED 1
EDIT: For test dates with ACTIVATED
if less like INACTIVATED
create helper columns with aggregate first/last
, compare datetimes in eval
and last mapping CUSTOMER
column:
df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)
s = (df.assign(ACTIVATED=df['DATE'].where(df['STATUS'].eq('ACTIVATED')),
INACTIVATED=df['DATE'].where(df['STATUS'].eq('INACTIVATED')))
.groupby('CUSTOMER').agg({'ACTIVATED':'first', 'INACTIVATED':'last'})
.eval('ACTIVATED < INACTIVATED'))
df['ACTIVATED/INACTIVATED'] = df['CUSTOMER'].map(s).astype(int)
print (df)
DATE CUSTOMER STATUS ACTIVATED/INACTIVATED
0 2022-01-01 A ACTIVATED 1
1 2022-01-02 A ACTIVE 1
2 2022-01-03 A INACTIVE 1
3 2022-01-04 A EXCEPTION 1
4 2022-01-05 A INACTIVATED 1
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.