Issue
In my data set, I have students data with some attempts - First,Second,Third,Four and Five, I have to find out each students for how many they taken attempt for taking >=50 marks, If one person complete in first attempt, then my output showing - "Column title" and sum of marks.
Here is my dummy data set
Person First Second Third Four Five
0 A 21 21 15 14 0
1 B 12 15 19 5 19
2 C 50 0 18 7 0
3 D 50 6 0 15 35
4 E 0 25 25 10 24
Output:
I am trying to find out but I am stuck in getting the correct solution.
Here is my work
import pandas as pd
import numpy as np
df
df['Total'] = df.sum(axis=1)
df['Output'] = df.apply(lambda row: (row['First']+row['Second']+row['Third'])+ row['Four']+row['Five'] if row['A'] > 50)
Output = np.array(df[['First','Second','Third','Four','Five']].values.tolist())
df[['First','Second','Third','Four','Five']] = np.where(Output > 50, 50, Output).tolist()
df.groupby('Person')[['First','Second','Third','Four','Five']>50].sum()
df['Output'] = np.where(df.sum(axis=1)) >50
Please help.
Solution
Try this:
cmax = df.select_dtypes(include="number").cumsum(axis=1)
df['Output'] = cmax.ge(50).idxmax(axis=1)
df['Sum'] = cmax.stack().loc[zip(df.index, df['Output'])].to_numpy()
df
Output:
Person First Second Third Four Five Output Sum
0 A 21 21 15 14 0 Third 57
1 B 12 15 19 5 19 Four 51
2 C 50 0 18 7 0 First 50
3 D 50 6 0 15 35 First 50
4 E 0 25 25 10 24 Third 50
Details:
Let's use select_dtypes to only get those attempts columns with numbers, then do a cumulative sum of the rows with axis=1.
Then, use that dataframe and find all values equal or greater than 50 to create a boolean dataframe. Using idxmax along the row, we can find the first True value, ie the first time the cumulative sum reaches 50. And, idxmax will return that column headers.
Lastly, we can use stack the cumulative sum dataframe and use loc to get values from cumulative sum that matches, the index with the output column.
Answered By - Scott Boston
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.