Issue
I'm trying to produce statistics for several "bins": namely, how many students got a grade of 0, how many got a grade greater than 0 and less than 60, and so on.
I'm not sure these really count as bins, since they are not equally sized:
grade == 0
0 < grade < 60
60 <= grade < 70
...
Here is my code:
import pandas as pd

grade_list = [87.5, 87.5, 65.0, 90.0, 72.5, 65.0, 0.0, 65.0, 72.5, 65.0, 72.5, 65.0, 90.0, 90.0, 87.5, 87.5, 87.5, 65.0, 87.5, 65.0, 65.0, 90.0, 99.0, 65.0, 87.5, 65.0, 87.5, 90.0, 87.5, 90.0, 90.0, 0.0, 90.0, 99.0, 65.0, 87.5, 72.5, 72.5, 90.0, 0.0, 65.0, 72.5, 90.0, 90.0, 65.0, 90.0, 90.0, 65.0, 65.0, 0.0, 90.0, 90.0, 100.0, 99.0, 65.0, 90.0, 90.0, 0.0, 99.0, 90.0, 100.0, 87.5, 65.0, 99.0, 0.0, 90.0, 65.0, 90.0, 65.0, 99.0, 90.0, 65.0, 100.0, 65.0, 90.0, 99.0]
df = pd.DataFrame({'grade': grade_list})

print(len(df[df['grade'] == 0]))
print(len(df[(df['grade'] > 0) & (df['grade'] < 60)]))
print(len(df[(df['grade'] >= 60) & (df['grade'] < 70)]))
print(len(df[(df['grade'] >= 70) & (df['grade'] < 80)]))
print(len(df[(df['grade'] >= 80) & (df['grade'] < 90)]))
print(len(df[df['grade'] >= 90]))
This gives me what I want, but the code seems ugly. Is there a better way to do this?
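One way to keep the question's six conditions exactly as written, but without six separate filter-and-print lines, is to evaluate them together with numpy's np.select and then count the resulting labels. This is a minimal sketch, not the approach used in the answer below; the label strings and the 'other' default are arbitrary names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question
grade_list = [87.5, 87.5, 65.0, 90.0, 72.5, 65.0, 0.0, 65.0, 72.5, 65.0, 72.5, 65.0, 90.0, 90.0, 87.5, 87.5, 87.5, 65.0, 87.5, 65.0, 65.0, 90.0, 99.0, 65.0, 87.5, 65.0, 87.5, 90.0, 87.5, 90.0, 90.0, 0.0, 90.0, 99.0, 65.0, 87.5, 72.5, 72.5, 90.0, 0.0, 65.0, 72.5, 90.0, 90.0, 65.0, 90.0, 90.0, 65.0, 65.0, 0.0, 90.0, 90.0, 100.0, 99.0, 65.0, 90.0, 90.0, 0.0, 99.0, 90.0, 100.0, 87.5, 65.0, 99.0, 0.0, 90.0, 65.0, 90.0, 65.0, 99.0, 90.0, 65.0, 100.0, 65.0, 90.0, 99.0]
df = pd.DataFrame({'grade': grade_list})

g = df['grade']
# One boolean mask per bin, in the same order as the question's conditions
conditions = [
    g == 0,
    (g > 0) & (g < 60),
    (g >= 60) & (g < 70),
    (g >= 70) & (g < 80),
    (g >= 80) & (g < 90),
    g >= 90,
]
# Hypothetical display labels, one per condition
labels = ['0', '(0, 60)', '[60, 70)', '[70, 80)', '[80, 90)', '>=90']

# For each row, np.select takes the label of the first True condition;
# rows matching none of them (e.g. negative grades) get 'other'
df['bin'] = np.select(conditions, labels, default='other')

# Count rows per bin; reindex keeps the bins in order and reports empty bins as 0
counts = df['bin'].value_counts().reindex(labels, fill_value=0)
print(counts)
```

Because each bin is an explicit condition, mixed boundaries such as the single value 0 next to the open interval (0, 60) need no special handling, and the empty (0, 60) bin is reported as 0 instead of disappearing.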
Solution
Try this:
import numpy as np

# Category is floor(grade / 10), e.g. 87.5 -> 8, 65.0 -> 6, 100.0 -> 10
df['category'] = (df['grade'] / 10).astype(int)
# Categories 1-5 all mean 0 < grade < 60, so collapse them into 1.
# The categories are now 0, 1, 6, 7, 8, 9 and 10.
df['category'] = np.where((df['category'] > 0) & (df['category'] < 6), 1, df['category'])
for i in range(df['category'].max() + 1):
    if len(df[df['category'] == i]) > 0:
        print(i, len(df[df['category'] == i]))
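This assigns each grade a category corresponding to the bins you want and prints the number of rows in each category. The if statement only skips categories with no rows: 2 to 5 (which were merged into 1) and, for this data, 1 itself, since no grade lies between 0 and 60. You can remove it, but then those categories are printed with a count of 0. Note that a grade of 100 ends up in its own category 10, so add the counts for 9 and 10 to get your grade >= 90 figure.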
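The loop in the answer filters the DataFrame twice for every category. The same counts can be obtained in one pass with value_counts. Below is a minimal sketch that reuses the answer's categorization unchanged and only swaps the loop for value_counts().sort_index(); the data is the question's grade_list.

```python
import numpy as np
import pandas as pd

grade_list = [87.5, 87.5, 65.0, 90.0, 72.5, 65.0, 0.0, 65.0, 72.5, 65.0, 72.5, 65.0, 90.0, 90.0, 87.5, 87.5, 87.5, 65.0, 87.5, 65.0, 65.0, 90.0, 99.0, 65.0, 87.5, 65.0, 87.5, 90.0, 87.5, 90.0, 90.0, 0.0, 90.0, 99.0, 65.0, 87.5, 72.5, 72.5, 90.0, 0.0, 65.0, 72.5, 90.0, 90.0, 65.0, 90.0, 90.0, 65.0, 65.0, 0.0, 90.0, 90.0, 100.0, 99.0, 65.0, 90.0, 90.0, 0.0, 99.0, 90.0, 100.0, 87.5, 65.0, 99.0, 0.0, 90.0, 65.0, 90.0, 65.0, 99.0, 90.0, 65.0, 100.0, 65.0, 90.0, 99.0]
df = pd.DataFrame({'grade': grade_list})

# Same categorization as the answer: floor(grade / 10), then merge 1-5 into 1
df['category'] = (df['grade'] / 10).astype(int)
df['category'] = np.where((df['category'] > 0) & (df['category'] < 6), 1, df['category'])

# value_counts counts every category at once; sort_index orders them 0, 6, 7, ...
# Categories with no rows simply do not appear, much like the answer's if statement
sizes = df['category'].value_counts().sort_index()
print(sizes)
```

Unlike the loop, this does not depend on range(max + 1), so it also works if categories are later relabeled with non-consecutive or non-integer values.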
Output:
The dataframe:
    grade  category
0    87.5         8
1    87.5         8
2    65.0         6
3    90.0         9
4    72.5         7
..    ...       ...
71   65.0         6
72  100.0        10
73   65.0         6
74   90.0         9
75   99.0         9
Sizes of each bin (category, number of rows):
0 6
6 21
7 6
8 11
9 29
10 3
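In the output above, category 10 holds only the three grades of 100, whereas the question's last bin is grade >= 90. If the bins should match the question exactly, one option is to cap the category at 9. This is a minimal sketch of that variant, an adaptation rather than part of the original answer; it uses floor division, Series.clip and Series.mask in place of the answer's np.where.

```python
import pandas as pd

grade_list = [87.5, 87.5, 65.0, 90.0, 72.5, 65.0, 0.0, 65.0, 72.5, 65.0, 72.5, 65.0, 90.0, 90.0, 87.5, 87.5, 87.5, 65.0, 87.5, 65.0, 65.0, 90.0, 99.0, 65.0, 87.5, 65.0, 87.5, 90.0, 87.5, 90.0, 90.0, 0.0, 90.0, 99.0, 65.0, 87.5, 72.5, 72.5, 90.0, 0.0, 65.0, 72.5, 90.0, 90.0, 65.0, 90.0, 90.0, 65.0, 65.0, 0.0, 90.0, 90.0, 100.0, 99.0, 65.0, 90.0, 90.0, 0.0, 99.0, 90.0, 100.0, 87.5, 65.0, 99.0, 0.0, 90.0, 65.0, 90.0, 65.0, 99.0, 90.0, 65.0, 100.0, 65.0, 90.0, 99.0]
df = pd.DataFrame({'grade': grade_list})

# floor(grade / 10), capped at 9 so that 100 joins the 90s (the "grade >= 90" bin)
category = (df['grade'] // 10).astype(int).clip(upper=9)
# Categories 1 to 5 all mean 0 < grade < 60; replace them with 1
category = category.mask(category.between(1, 5), 1)

sizes = category.value_counts().sort_index()
print(sizes)
```

With this data, category 9 then contains the 22 grades of 90, the 7 grades of 99 and the 3 grades of 100, which is the same number the question's df['grade'] >= 90 filter counts.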
Answered By - Kartikey Mehrotra