Issue
There may be a smarter way to do this in pandas, but the following example looks like it should work and doesn't:
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[1, 0], [1, 2], [2, 0]], columns=['a', 'b'])
df2 = df1.copy()
df3 = df1.copy()
idx = pd.date_range("2010-01-01", freq='H', periods=3)
s = pd.Series([df1, df2, df3], index=idx)
# This causes an error
s.mean()
I won't post the whole traceback, but the main error message is interesting:
TypeError: Could not convert melt T_s
0 6 12
1 0 6
2 6 10 to numeric
It looks like the DataFrames were successfully summed, but the result was not divided by the length of the Series.
However, we can take the sum of the dataframes in the series:
s.sum()
... returns:
a b
0 6 12
1 0 6
2 6 10
Why wouldn't mean() work when sum() does? Is this a bug or a missing feature? This does work:
(df1 + df2 + df3)/3.0
... and so does this:
s.sum()/3.0
a b
0 2 4.000000
1 0 2.000000
2 2 3.333333
But this of course is not ideal.
Solution
When you define s with
s = pd.Series([df1, df2, df3], index=idx)
you get a Series with DataFrames as items:
In [77]: s
Out[77]:
2010-01-01 00:00:00 a b
0 1 0
1 1 2
2 2 0
2010-01-01 01:00:00 a b
0 1 0
1 1 2
2 2 0
2010-01-01 02:00:00 a b
0 1 0
1 1 2
2 2 0
Freq: H, dtype: object
The sum of the items is a DataFrame:
In [78]: s.sum()
Out[78]:
a b
0 3 0
1 3 6
2 6 0
but when you take the mean, nanops.nanmean is called:
def nanmean(values, axis=None, skipna=True):
    values, mask, dtype, dtype_max = _get_values(values, skipna, 0)
    the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_max))
    ...
Notice that _ensure_numeric is called on the resulting sum. An error is raised because a DataFrame is not numeric.
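Since only the numeric check on the intermediate sum fails, one way to sidestep Series.mean entirely is to average the frames in plain Python. This is a minimal sketch, assuming every frame shares the same index and columns (otherwise alignment would introduce NaN); the names frames and elementwise_mean are just illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 0], [1, 2], [2, 0]], columns=["a", "b"])
frames = [df1, df1.copy(), df1.copy()]

# Built-in sum() starts from 0 and adds each frame element-wise,
# so the result is an ordinary DataFrame that can be divided.
elementwise_mean = sum(frames) / len(frames)
print(elementwise_mean)
```

This is the same idea as (df1 + df2 + df3)/3.0 from the question, just without hard-coding the number of frames.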
Here is a workaround. Instead of making a Series with DataFrames as items, you can concatenate the DataFrames into a new DataFrame with a hierarchical index:
In [79]: s = pd.concat([df1, df2, df3], keys=idx)
In [80]: s
Out[80]:
                       a  b
2010-01-01 00:00:00 0  1  0
                    1  1  2
                    2  2  0
2010-01-01 01:00:00 0  1  0
                    1  1  2
                    2  2  0
2010-01-01 02:00:00 0  1  0
                    1  1  2
                    2  2  0
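Nothing is lost by concatenating: each original frame can still be pulled back out through the outer (timestamp) level. A small sketch, assuming the same three frames; the variable names are illustrative, and the frequency is passed as a Timedelta only to avoid depending on how a particular pandas version spells the hourly alias:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 0], [1, 2], [2, 0]], columns=["a", "b"])
idx = pd.date_range("2010-01-01", periods=3, freq=pd.Timedelta(hours=1))
stacked = pd.concat([df1, df1.copy(), df1.copy()], keys=idx)

# Select every row stored under the first timestamp; the outer level
# is dropped, leaving a frame indexed 0..2 like the original df1.
first_frame = stacked.xs(idx[0], level=0)
print(first_frame)
```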
Now you can take the sum and the mean:
In [82]: s.sum(level=1)
Out[82]:
a b
0 3 0
1 3 6
2 6 0
In [84]: s.mean(level=1)
Out[84]:
a b
0 1 0
1 1 2
2 2 0
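One caveat for newer installations: as far as I know, the level argument of sum and mean was deprecated in pandas 1.3 and removed in 2.0. Grouping on the inner index level gives the same reduction. A sketch under that assumption, with illustrative variable names:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 0], [1, 2], [2, 0]], columns=["a", "b"])
# Timedelta frequency chosen only to stay independent of alias spelling.
idx = pd.date_range("2010-01-01", periods=3, freq=pd.Timedelta(hours=1))
stacked = pd.concat([df1, df1.copy(), df1.copy()], keys=idx)

# Group rows by the inner level (the original row labels 0, 1, 2)
# and reduce across the timestamps.
sum_by_row = stacked.groupby(level=1).sum()
mean_by_row = stacked.groupby(level=1).mean()
print(sum_by_row)
print(mean_by_row)
```

On versions that still accept level=, the two spellings should agree; the groupby form is simply the one that keeps working after the removal.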
Answered By - unutbu