Issue
I have a dataframe with a column named ID, which contains strings like "10387-14_S91_L001" and "1590-13H-caso14_S76": 20 distinct IDs in total, repeated across a 107-row dataframe.
I want to replace the strings that contain the substring "caso" with "caso_00", where the number increases in ascending order; the strings that don't contain "caso" must be replaced by "controle_00", also numbered in ascending order.
I tried listing only the unique() values in the column, but I can't work out how to continue from there.
An example:
BEFORE                    AFTER
0387-14_S91_L001          controle_01
10694-14_S86_L001         controle_02
590-13H-caso14_S76_L001   caso_01
1692-15G-caso20_S74_L001  caso_02
Solution
Unfortunately, a bug prevents me from updating my other answer, so here is an alternative.
First create a boolean mask that tells whether each row is a "caso" or a "controle", then use groupby.cumcount to number the rows within each group:
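# 'BEFORE' is the column from the example (named ID in the question's dataframe)
# True where the string contains "caso" followed by digits and an underscore
m = df['BEFORE'].str.contains(r'caso\d+_')
# prefix chosen from the mask + 1-based counter within each group, zero-padded to 2 digits
df['AFTER'] = (m.replace({True: 'caso_', False: 'controle_'})
               + m.groupby(m).cumcount().add(1).astype(str).str.zfill(2)
              )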
Output:
                     BEFORE        AFTER
0          0387-14_S91_L001  controle_01
1         10694-14_S86_L001  controle_02
2   590-13H-caso14_S76_L001      caso_01
3  1692-15G-caso20_S74_L001      caso_02
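A note that goes beyond the code above: cumcount numbers every row, so if the same ID appears on several rows (the question mentions 20 distinct IDs over 107 rows), each repeat would get a new number. Assuming the goal is one label per distinct ID, a possible sketch is to compute the labels on the de-duplicated IDs and map them back. The sample data, the repeated rows, and the names uniq, labels and mapping are made up for illustration; numbering follows the order of first appearance, not a sorted order.

```python
import pandas as pd

# Hypothetical sample: the example IDs, with two of them repeated
df = pd.DataFrame({'BEFORE': [
    '0387-14_S91_L001',
    '590-13H-caso14_S76_L001',
    '0387-14_S91_L001',
    '10694-14_S86_L001',
    '1692-15G-caso20_S74_L001',
    '590-13H-caso14_S76_L001',
]})

# One entry per distinct ID, in order of first appearance
uniq = df['BEFORE'].drop_duplicates()

# Same mask and per-group counter as above, computed on the distinct IDs only
m = uniq.str.contains(r'caso\d+_')
labels = (m.map({True: 'caso_', False: 'controle_'})
          + m.groupby(m).cumcount().add(1).astype(str).str.zfill(2))

# Build an ID -> label lookup and apply it to every row
mapping = dict(zip(uniq, labels))
df['AFTER'] = df['BEFORE'].map(mapping)
print(df)
```

With this approach, both rows holding '0387-14_S91_L001' receive the same label, which the row-wise version would not do.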
Answered By - mozway