Issue
I'm new to Stack Overflow and to data analysis with Python. Right now I'm stuck in the preliminary phases of cleaning a DataFrame with the pandas library.
I'm working on a DataFrame like this:
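| | Animal_ID | weight | Project | Exp_type | researcher | events_d1 | events_d2 | events_d3 | events_d4 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 50 | p1 | Acute | alex | 0 | 0 | 0 | 4 |
| 1 | a2 | 52 | p2 | chronic | mat | 0 | 1 | 1 | 5 |
| 2 | a3 | 75 | p1 | Acute | alex | 1 | | 2 | |
| 3 | a4 | 53 | p2 | chronic | mat | 0 | | | 0 |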
I would like to insert a column named "responder" and populate each row with a y/n value according to whether at least one event occurred (i.e. at least one value > 0) in the columns events_d1 to events_d3, ignoring events_d4. It is important to me to select as many events_d* columns as needed without listing them all, using something similar to df.filter(like='events_d', axis=1). My final goal is a df_output that looks something like this:
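| | Animal_ID | weight | responder | Project | Exp_type | researcher | events_d1 | events_d2 | events_d3 | events_d4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 50 | n | p1 | Acute | alex | 0 | 0 | 0 | 4 |
| 1 | a2 | 52 | y | p2 | chronic | mat | 0 | 1 | 1 | 5 |
| 2 | a3 | 75 | y | p1 | Acute | alex | 1 | | 2 | |
| 3 | a4 | 53 | n | p2 | chronic | mat | 0 | | | 0 |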
I tried to solve the issue using query, or by combining str.contains([1-9]) with filter, but it did not work. I am curious about the approach you would use.
Thank you in advance for your help, Pietro
Solution
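You can use .filter with a regex that excludes events_d4:
import numpy as np

# True for rows with any non-zero value in the selected columns (NaN is skipped)
m = df.filter(regex="events_d[^4]").any(axis=1)
df['responder'] = np.where(m,'y','n')
df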
| | Animal_ID | weight | Project | Exp_type | researcher | events_d1 | events_d2 | events_d3 | events_d4 | responder |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a1 | 50 | p1 | Acute | alex | 0 | 0 | 0 | 4 | n |
| 1 | a2 | 52 | p2 | chronic | mat | 0 | 1 | 1 | 5 | y |
| 2 | a3 | 75 | p1 | Acute | alex | 1 | NaN | 2 | NaN | y |
| 3 | a4 | 53 | p2 | chronic | mat | 0 | NaN | NaN | 0 | n |
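The desired output in the question places responder right after weight, whereas the answer above appends it as the last column. Below is a minimal sketch of one way to get that placement, assuming the sample data from the question with its blank cells read as NaN. The names event_cols, has_event and pos are illustrative and not part of the original answer. The sketch also swaps in an anchored pattern, ^events_d[1-3]$, because events_d[^4] would also match a label such as events_d10 if more days were added, and it tests > 0 explicitly to mirror the wording of the question.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; NaN marks the blank cells.
df = pd.DataFrame({
    "Animal_ID": ["a1", "a2", "a3", "a4"],
    "weight": [50, 52, 75, 53],
    "Project": ["p1", "p2", "p1", "p2"],
    "Exp_type": ["Acute", "chronic", "Acute", "chronic"],
    "researcher": ["alex", "mat", "alex", "mat"],
    "events_d1": [0, 0, 1, 0],
    "events_d2": [0, 1, np.nan, np.nan],
    "events_d3": [0, 1, 2, np.nan],
    "events_d4": [4, 5, np.nan, 0],
})

# Anchored pattern: keeps events_d1, events_d2 and events_d3 only.
event_cols = df.filter(regex=r"^events_d[1-3]$")

# NaN > 0 evaluates to False, so missing days never count as an event.
has_event = event_cols.gt(0).any(axis=1)

# Insert the new column immediately after "weight" (modifies df in place).
pos = df.columns.get_loc("weight") + 1
df.insert(pos, "responder", np.where(has_event, "y", "n"))

print(df)
```

If the set of days to include changes, only the character class in the pattern needs adjusting; the rest of the sketch stays the same.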
Answered By - TheMaster