Issue
I am trying to calculate the mean for a new column.
data['english_combined'] = data['english'] + data['intake_english'] + data['language test scores formatted']
So the english_combined column is the sum of the other columns. I want to take the mean based on which grades are entered. For example, if only 'english' and 'intake_english' have a grade, I want the mean of those two; if all three tests are taken, I want the mean of all three.
I tried something like this, with no success:
[np.mean(i,j,k) for i,j,k in zip(data['english'], data['intake_english'], data['language test scores formatted'])]
Any suggestions that would work?
Solution
df.mean(axis='columns')
does what you want. By default (skipna=True), it ignores NaNs: missing values are left out of both the sum and the count when computing the average.
A simple example:
>>> df = pd.DataFrame({'a': [7, 8.5, pd.NA, 6],
...                    'b': [5, 6, 6, 7],
...                    'c': [7, pd.NA, pd.NA, 5]})
>>> df
      a  b     c
0     7  5     7
1   8.5  6  <NA>
2  <NA>  6  <NA>
3     6  7     5
>>> df.mean(axis='columns')
0 6.333333
1 7.250000
2 6.000000
3 6.000000
dtype: float64
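Note how row 2 has 6 as its mean, not 2 (that is, 6 / 3): the two missing values are not counted. The same holds for row 1, where only the two present values are averaged: (8.5 + 6) / 2 = 7.25.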
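To make the default visible, here is a small sketch that compares it with skipna=False and with a manual sum divided by the count of non-missing values. It assumes the missing grades are stored as np.nan in float columns rather than pd.NA (an assumption, chosen so the columns are plainly numeric); the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the example above, but with np.nan marking a missing grade
# (assumption: float columns with NaN instead of pd.NA)
df = pd.DataFrame({'a': [7, 8.5, np.nan, 6],
                   'b': [5, 6, 6, 7],
                   'c': [7, np.nan, np.nan, 5]})

# Default behaviour: NaNs are skipped (skipna=True)
skipped = df.mean(axis='columns')

# With skipna=False, any row containing a NaN gets NaN as its mean
strict = df.mean(axis='columns', skipna=False)

# The default is the same as: sum of present values / number of present values
manual = df.sum(axis='columns') / df.count(axis='columns')

print(skipped)
print(strict)
print(manual)
```

Here skipped and manual agree row by row, while strict is NaN for rows 1 and 2 because they contain missing values.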
For your case, it would be something like
data['english_combined'] = data[
    ['english', 'intake_english',
     'language test scores formatted']].mean(axis='columns')
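As a self-contained sketch of the same idea, the snippet below uses the column names from the question with made-up grades (the values, the cols list, and the tests_taken column are hypothetical and only there for illustration; np.nan stands for a test that was not taken). It also shows np.nanmean as a NumPy-level alternative to the list comprehension in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical grades; np.nan marks a test that was not taken
data = pd.DataFrame({
    'english': [7.0, 8.0, np.nan],
    'intake_english': [5.0, 6.0, 6.0],
    'language test scores formatted': [6.0, np.nan, np.nan],
})

cols = ['english', 'intake_english', 'language test scores formatted']

# Row-wise mean over whichever grades are present
data['english_combined'] = data[cols].mean(axis='columns')

# Optional: how many of the three tests each row actually has
data['tests_taken'] = data[cols].count(axis='columns')

# NumPy alternative: nanmean also ignores NaNs when averaging along a row
alt = np.nanmean(data[cols].to_numpy(dtype=float), axis=1)

print(data)
print(alt)
```

The first row averages all three grades, the second averages two, and the third just returns its single grade.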
Answered By - 9769953