Issue
import pandas as pd
columns = ['S1', 'S2', 'S3', 'S4', 'S5']
df = pd.DataFrame({'Patient':['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p8', 'p10'],
'S1':[0.7, 0.3, 0.5, 0.8, 0.9, 0.1, 0.9, 0.2, 0.6, 0.3],
'S2':[0.2, 0.3, 0.5, 0.4, 0.9, 0.1, 0.9, 0.7, 0.4, 0.3],
'S3':[0.6, 0.3, 0.5, 0.8, 0.9, 0.8, 0.9, 0.3, 0.6, 0.3],
'S4':[0.2, 0.3, 0.7, 0.8, 0.9, 0.1, 0.9, 0.7, 0.3, 0.3 ],
'S5':[0.9, 0.8, 0.5, 0.8, 0.9, 0.7, 0.2, 0.7, 0.6, 0.3 ]})
# vectorized operations on the data frame
# count the cells that are >= 0.5 in each column
arr1 = df[columns].ge(0.5).sum().to_numpy()
# sum the cells that are >= 0.5 in each column
arr2 = df[df[columns]>=0.5][columns].sum().to_numpy()
print(arr1)
print(arr2)
How do I get, for each column in the df, the list (or set) of patients whose value is >= 0.5, like below?
[('p1', 'p3', 'p4', 'p5', 'p7', 'p9'),
('p3', 'p5', 'p7', 'p8'),
('p1', 'p3', 'p4', 'p5', 'p6', 'p7', 'p9'),
(...),
(...)]
Solution
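The result is not in a tabular format, so you can just use a list comprehension here. For each column, select the patients whose value is >= 0.5. Note that the sample data lists 'p8' twice and has no 'p9', so the output below shows 'p8' where the question expects 'p9':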
[df.Patient[df[col] >= 0.5].to_list() for col in columns]
#[['p1', 'p3', 'p4', 'p5', 'p7', 'p8'],
# ['p3', 'p5', 'p7', 'p8'],
# ['p1', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8'],
# ['p3', 'p4', 'p5', 'p7', 'p8'],
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p8', 'p8']]
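The question asks for tuples (or sets) rather than lists, so here is a minimal sketch of how the same comprehension could be adapted. The names `threshold`, `patients_as_tuples` and `patients_by_column` are assumptions introduced for illustration; the data is copied unchanged from the question.

```python
import pandas as pd

columns = ['S1', 'S2', 'S3', 'S4', 'S5']
# Sample data copied unchanged from the question.
df = pd.DataFrame({'Patient': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p8', 'p10'],
                   'S1': [0.7, 0.3, 0.5, 0.8, 0.9, 0.1, 0.9, 0.2, 0.6, 0.3],
                   'S2': [0.2, 0.3, 0.5, 0.4, 0.9, 0.1, 0.9, 0.7, 0.4, 0.3],
                   'S3': [0.6, 0.3, 0.5, 0.8, 0.9, 0.8, 0.9, 0.3, 0.6, 0.3],
                   'S4': [0.2, 0.3, 0.7, 0.8, 0.9, 0.1, 0.9, 0.7, 0.3, 0.3],
                   'S5': [0.9, 0.8, 0.5, 0.8, 0.9, 0.7, 0.2, 0.7, 0.6, 0.3]})

threshold = 0.5  # hypothetical name for the cutoff used in the question

# One tuple per column, in column order, matching the shape the question asks for.
patients_as_tuples = [tuple(df.loc[df[col] >= threshold, 'Patient']) for col in columns]

# One set per column, keyed by column name; duplicate patient labels collapse.
patients_by_column = {col: set(df.loc[df[col] >= threshold, 'Patient']) for col in columns}

print(patients_as_tuples[1])             # ('p3', 'p5', 'p7', 'p8')
print(sorted(patients_by_column['S2']))  # ['p3', 'p5', 'p7', 'p8']
```

The tuple version preserves row order and repeated labels, while the set version drops duplicates and has no guaranteed order, so which one fits depends on whether repeated patient labels matter.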
Answered By - Psidom