Sunday, January 7, 2024

[FIXED] Pandas: create a new column and give each row different values based on the presence of values of a given range over a specific range of columns

Labels: pandas, python

Issue

I'm new to Stack Overflow and to data analysis with Python. Right now I'm stuck in the preliminary phase of cleaning a DataFrame with the pandas library.

I'm working on a df like this (missing values shown as NaN):

   Animal_ID  weight  Project  Exp_type  researcher  events_d1  events_d2  events_d3  events_d4
0         a1      50       p1     Acute        alex          0          0          0          4
1         a2      52       p2   chronic         mat          0          1          1          5
2         a3      75       p1     Acute        alex          1        NaN          2        NaN
3         a4      53       p2   chronic         mat          0        NaN        NaN          0

I would like to insert a column named "responder" and populate each row with a y/n value according to whether at least one event (i.e. at least one value > 0) is present in the columns events_d1 to events_d3, not counting events_d4. It is important to me to select as many events_d* columns as needed without listing them all, using something similar to df.filter(like='events_d', axis=1). My final goal is for df_output to look something like this:

   Animal_ID  weight  responder  Project  Exp_type  researcher  events_d1  events_d2  events_d3  events_d4
0         a1      50          n       p1     Acute        alex          0          0          0          4
1         a2      52          y       p2   chronic         mat          0          1          1          5
2         a3      75          y       p1     Acute        alex          1        NaN          2        NaN
3         a4      53          n       p2   chronic         mat          0        NaN        NaN          0

I tried to solve the issue using query, or by combining str.contains([1-9]) with filter, but it did not work. I am curious about the approach you would use.

Thank you in advance for your help, Pietro

Solution

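You can use .filter with a regex to exclude events_d4:

import numpy as np

# Select every events_d* column except events_d4, then flag rows with any non-zero value (NaN is skipped)
m = df.filter(regex="events_d[^4]").any(axis=1)
df['responder'] = np.where(m,'y','n')
df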

	Animal_ID	weight	Project	Exp_type	researcher	events_d1	events_d2	events_d3	events_d4	responder
0	a1	50	p1	Acute	alex	0	0	0	4	n
1	a2	52	p2	chronic	mat	0	1	1	5	y
2	a3	75	p1	Acute	alex	1	nan	2	nan	y
3	a4	53	p2	chronic	mat	0	nan	nan	0	n
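
For anyone who wants to reproduce this end to end, here is a minimal sketch under a few assumptions that are not part of the answer above: the sample frame is rebuilt by hand with blank cells as NaN, the regex is anchored so that only the exact column events_d4 is excluded, the "value > 0" rule is spelled out with .gt(0), and the column is placed right after weight with df.insert, as in the output the question asks for. The names event_cols and mask are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question; blank cells become NaN.
df = pd.DataFrame({
    "Animal_ID": ["a1", "a2", "a3", "a4"],
    "weight": [50, 52, 75, 53],
    "Project": ["p1", "p2", "p1", "p2"],
    "Exp_type": ["Acute", "chronic", "Acute", "chronic"],
    "researcher": ["alex", "mat", "alex", "mat"],
    "events_d1": [0, 0, 1, 0],
    "events_d2": [0, 1, np.nan, np.nan],
    "events_d3": [0, 1, 2, np.nan],
    "events_d4": [4, 5, np.nan, 0],
})

# Anchored pattern: "events_d" followed by digits, excluding exactly "events_d4".
event_cols = df.filter(regex=r"^events_d(?!4$)\d+$")

# gt(0) makes the "value > 0" rule explicit; a NaN comparison evaluates to False.
mask = event_cols.gt(0).any(axis=1)

# insert() works in place and puts the new column at position 2, right after "weight".
df.insert(2, "responder", np.where(mask, "y", "n"))
print(df[["Animal_ID", "weight", "responder"]])
```

The anchored pattern is a design choice rather than a requirement: the shorter "events_d[^4]" works for the four columns shown, but it would also skip a hypothetical events_d40, whereas the lookahead version excludes only events_d4.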

Answered By - TheMaster

This answer was collected from Stack Overflow and tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
