Issue
I am using pandas 1.2.3 (quite old, I know, but versions are frozen for a certain project). In this project, I am using the following line of code: df.rolling(n).mean(skipna=False)
With pandas 1.5+ this gives the following warning: FutureWarning: Passing additional kwargs to Rolling.mean has no impact on the result and is deprecated. This will raise a TypeError in a future version of pandas.
However, I cannot find any recommendation on the web on how the old behavior can be replicated one-to-one with newer versions.
Thank you for your help!
Solution
If you are using Pandas 1.2.3, the skipna argument does nothing in the context of a rolling window mean.
This might be good news, because all you have to do is delete skipna=False, and your code will be compatible with Pandas 1.5. It might also be bad news if your code depends on skipna, because code that expects skipna to do something here has a bug. (This is not really a problem for skipna=False, since not skipping NA values is what the rolling mean does anyway, but if you have skipna=False, you may also have skipna=True somewhere else.)
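If you are unsure whether other calls in the project still pass extra keyword arguments, one possible way to find them is to promote the FutureWarning to an error while running your tests. The sketch below is only an illustration: the helper name passes_cleanly is made up here, and it assumes the warning category is FutureWarning, as in the message quoted in the question.

```python
import warnings

import numpy as np
import pandas as pd


def passes_cleanly(series, window, **kwargs):
    """Hypothetical helper: True if rolling(window).mean(**kwargs) runs
    without emitting a FutureWarning or raising a TypeError."""
    with warnings.catch_warnings():
        # Turn the deprecation warning into an exception so it cannot be missed.
        warnings.simplefilter("error", FutureWarning)
        try:
            series.rolling(window).mean(**kwargs)
        except (FutureWarning, TypeError):
            return False
    return True


s = pd.Series([1, 2, 3, np.nan, 5, 6, 7], name="a")
print(passes_cleanly(s, 3))                # the call without extra kwargs
print(passes_cleanly(s, 3, skipna=False))  # depends on the installed pandas version
```

The second call is expected to differ between pandas versions, which is the point: it flags the call sites that need the keyword removed.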
I have three pieces of evidence for skipna doing nothing in the context of a rolling window mean.
Test program
First, the following test program tries to take the mean of a series with a missing value, with skipna set to both True and False. I ran this test program using Pandas 1.2.3.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, 3, np.nan, 5, 6, 7]})
print(df['a'].rolling(3).mean(skipna=True))
print(df['a'].rolling(3).mean(skipna=False))
Output:
0 NaN
1 NaN
2 2.0
3 NaN
4 NaN
5 NaN
6 6.0
Name: a, dtype: float64
0 NaN
1 NaN
2 2.0
3 NaN
4 NaN
5 NaN
6 6.0
Name: a, dtype: float64
The outputs for both are the same. If skipna=True were skipping NA values while computing the mean, every value in positions 2 through 6 would be non-NA. Effectively, this means that this rolling mean treats NA values the same way whether or not skipna is set.
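For contrast, here is a sketch of what the output looks like when windows containing a missing value are allowed to produce a result. It relies on min_periods, an argument of rolling() itself that is not discussed in this answer; treating it as the way to "skip NA" is my assumption for illustration, not something skipna ever did.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, 6, 7], name="a")

# Default: min_periods equals the window size, so any window
# containing a NaN yields NaN, as in the test program.
strict = s.rolling(3).mean()

# Assumption for illustration: accept windows with at least one
# non-NaN value and average only the values that are present.
lenient = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

If the project really needs NA-skipping behavior, this is the kind of setting to look at, rather than a keyword passed to mean().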
Documentation
Second, I read what the documentation for Pandas 1.2 says about skipna. Here, we need to be careful that we are reading the right page. What we want is not DataFrame.mean(), but pandas.core.window.rolling.Rolling.mean().
This page does not document a skipna parameter. Presumably, if somebody went through the trouble of writing code to implement skipna, they would have documented it. On the other hand, it does have a catch-all **kwargs parameter, which is unhelpfully documented as "under review."
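To see why the two pages are easy to confuse, here is a small sketch of the non-rolling mean, where skipna is documented and does change the result. It uses Series.mean() rather than DataFrame.mean() only to keep the example short.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print(s.mean(skipna=True))   # NaN is ignored: mean of 1.0 and 3.0
print(s.mean(skipna=False))  # NaN propagates into the result
```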
Might skipna have an effect, just one that does not show up in my test program, and one which is not documented?
Reading the Pandas code
The third approach I used to investigate this was to run the code under a debugger.
By running the test program under a debugger, I found the code within Pandas which is responsible for computing the rolling mean. This code does not implement any kind of skipna functionality.
Here is how rolling mean is implemented.
df.rolling(3).mean() is called, which calls
pandas/core/window/rolling.py:Rolling.mean(), which calls
pandas/core/window/rolling.py:RollingAndExpandingMixin.mean(), which calls
pandas/core/window/rolling.py:BaseWindow._apply(), which calls
pandas/_libs/window/aggregations.pyx:roll_mean(), which computes the rolling mean.
There are two things in this chain that make it impossible for skipna to do anything.
First, skipna is carried in the kwargs variable up until BaseWindow._apply() is called. But in that function, the kwargs parameter is unused. Even if roll_mean() implemented skipna, there is no way for the skipna parameter to get to roll_mean().
Second, if you read the source code of roll_mean(), it takes no skipna argument and has no code path that depends on one.
In order for skipna to do something, there would need to be code here to implement it, and there isn't.
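A quicker check that does not need a debugger is to look at the signature of Rolling.mean() in whatever pandas version is installed. This is only a sketch; the exact parameter list differs between versions, so the printed list is not shown here, but I would expect skipna to be absent from it.

```python
import inspect

from pandas.core.window.rolling import Rolling

# Collect the named parameters of Rolling.mean for the installed pandas.
params = inspect.signature(Rolling.mean).parameters

print(list(params))
print("skipna" in params)
```

In versions where the signature still has a catch-all **kwargs, skipna can be passed but is not a named parameter, which matches what the call chain above shows.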
Conclusion
The skipna parameter to df.rolling(3).mean(skipna=False) used to do nothing. It still does nothing, but now it complains about it.
Answered By - Nick ODell