Issue
Apparently I have the opposite of everyone else's problem... I would like to take the mean of a pandas dataframe, and I would like to have the result return NaN if there are ANY NaNs in the frame. However, it seems like neither np.mean nor np.nanmean do this. Example code:
import math
import numpy as np
import pandas as pd
b = pd.DataFrame([[1, 2], [math.nan, 4]])
print(b)
print(np.mean(b))
print(np.nanmean(b))
Result:
Expected Result:
Solution
It can be done using the pandas DataFrame.mean() function:
b.mean(axis=None, skipna=False)
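As a quick check of that call, here is a minimal sketch contrasting the two skipna settings on the frame from the question. It assumes pandas 2.0 or later, where axis=None on DataFrame.mean() reduces over both axes; the names default_mean and strict_mean are only illustrative.

```python
import math
import pandas as pd

b = pd.DataFrame([[1, 2], [math.nan, 4]])

# Default skipna=True: the NaN is dropped and the remaining
# values (1, 2, 4) are averaged.
default_mean = b.mean(axis=None)

# skipna=False: a single NaN anywhere in the frame makes the result NaN.
# Assumption: pandas >= 2.0, so axis=None reduces over both axes.
strict_mean = b.mean(axis=None, skipna=False)

print(default_mean)
print(strict_mean)
```

On older pandas versions axis=None may not reduce over both axes, in which case converting to a NumPy array first is the safer route.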
According to NumPy's implementation of mean(), the reason you don't get nan from np.mean is that when the input is not a multiarray.ndarray, NumPy tries to use the mean method of that data type if it has one:
if type(a) is not mu.ndarray:
    try:
        mean = a.mean
    except AttributeError:
        pass
    else:
        return mean(axis=axis, dtype=dtype, out=out, **kwargs)
In your case, it is like calling the pandas mean() function with axis=None, which is like running the following code:
b.mean(axis=None) # 2.333333
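To make the delegation visible without pandas, here is a small sketch using a made-up class. Delegating is hypothetical and exists only for this illustration; it records the arguments np.mean passes along and returns a marker string instead of a number.

```python
import numpy as np


class Delegating:
    """Hypothetical stand-in for any non-ndarray object with its own mean()."""

    def __init__(self):
        self.calls = []

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        # Record what np.mean forwarded to us.
        self.calls.append({"axis": axis, "dtype": dtype, "out": out})
        return "from Delegating.mean"


obj = Delegating()

# obj is not an ndarray and has a mean attribute, so np.mean
# hands the work to obj.mean instead of computing anything itself.
result = np.mean(obj)

print(result)
print(obj.calls)
```

A DataFrame takes the same path: np.mean(b) ends up in DataFrame.mean, whose default skipna=True is why the NaN is ignored.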
If you insist on using numpy.mean() to calculate the mean, you can convert your DataFrame to a NumPy array before calling np.mean().
array = b.to_numpy()
np.mean(array) # nan
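For comparison, the sketch below runs both np.mean and np.nanmean on the converted array, which shows that the two only differ once the input is a plain ndarray. The variable names propagating and ignoring are illustrative.

```python
import math
import numpy as np
import pandas as pd

b = pd.DataFrame([[1, 2], [math.nan, 4]])

# A plain ndarray, so np.mean no longer delegates to pandas.
array = b.to_numpy()

propagating = np.mean(array)  # the NaN propagates through the sum
ignoring = np.nanmean(array)  # the NaN is skipped: (1 + 2 + 4) / 3

print(type(array).__name__, propagating, ignoring)
```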
Answered By - Mohsen_Fatemi