Wednesday, December 20, 2023

[FIXED] pandas - additional kwargs in rolling mean deprecated


Issue

I am using pandas 1.2.3 (quite old, I know, but versions are frozen for a certain project). In this project, I use the following line of code: df.rolling(n).mean(skipna=False)

With pandas 1.5+, this gives the following warning: FutureWarning: Passing additional kwargs to Rolling.mean has no impact on the result and is deprecated. This will raise a TypeError in a future version of pandas.

However, I cannot find any recommendation on the web for how to replicate the old behavior one-to-one in newer versions.

Thank you for your help!

Solution

If you are using Pandas 1.2.3, the skipna argument does nothing in the context of a rolling window mean.

This might be good news, because all you have to do is delete skipna=False, and your code will be compatible with Pandas 1.5. It might also be bad news if your code depends on skipna, because code that expects skipna to do something has a bug. (This is not really a problem for skipna=False, as that matches how the rolling mean actually behaves, but if you have skipna=False, you may also have skipna=True somewhere else.)
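As a minimal sketch of that fix, assuming a small made-up DataFrame and window size (neither comes from the question), the call without the keyword looks like this. Turning FutureWarning into an error is one way to confirm that the cleaned-up call no longer triggers the deprecation.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical data and window size, chosen only for illustration.
df = pd.DataFrame({"a": [1, 2, 3, np.nan, 5, 6, 7]})
n = 3

with warnings.catch_warnings():
    # Escalate FutureWarning to an exception so a leftover kwarg would fail loudly.
    warnings.simplefilter("error", FutureWarning)
    result = df.rolling(n).mean()  # no skipna argument

print(result)
```

Windows that contain the missing value still come out as NaN, which is the same result the old call with skipna=False produced.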

I have three pieces of evidence for skipna doing nothing in the context of a rolling window mean.

Test program

First, the following test program tries to take the mean of a series with a missing value, with skipna set to both True and False. I ran this test program using Pandas 1.2.3.

import pandas as pd
import numpy as np


df = pd.DataFrame({'a': [1, 2, 3, np.nan, 5, 6, 7]})
print(df['a'].rolling(3).mean(skipna=True))
print(df['a'].rolling(3).mean(skipna=False))

Output:

0    NaN
1    NaN
2    2.0
3    NaN
4    NaN
5    NaN
6    6.0
Name: a, dtype: float64
0    NaN
1    NaN
2    2.0
3    NaN
4    NaN
5    NaN
6    6.0
Name: a, dtype: float64

The outputs for both are the same. If skipna=True were skipping NA values while computing the mean, then none of the values in positions 2 through 6 would be NA. In effect, a rolling mean does not skip NA values whether or not skipna is set.
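For contrast, here is a sketch of what NA-skipping output would look like on the same series. It relies on the min_periods argument of rolling(), which neither the question nor this answer uses, so treat it as an illustration rather than a drop-in replacement: it also changes the first rows of the result.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, 6, 7], name="a")

# Default: min_periods equals the window size, so any window with fewer
# than 3 non-NaN values yields NaN.
strict = s.rolling(3).mean()

# min_periods=1: a window needs only one non-NaN value, so NaNs are
# effectively skipped and the mean is taken over the values that remain.
lenient = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

The strict column matches the test program's output, while the lenient column has no NaN at all.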

Documentation

Second, I read what the documentation for Pandas 1.2 says about skipna. Here, we need to be careful that we are reading the right page. What we want is not DataFrame.mean(), but pandas.core.window.rolling.Rolling.mean().

This page does not document a skipna parameter. Presumably, if somebody went through the trouble of writing code to implement skipna, they would have documented it. On the other hand, it does have a catch-all **kwargs parameter, which is unhelpfully documented as "under review."
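The same check can be made against whatever version is installed by inspecting the method signature. This is a small sketch using the standard-library inspect module; the exact parameter list it prints will differ between pandas versions.

```python
import inspect

from pandas.core.window.rolling import Rolling

sig = inspect.signature(Rolling.mean)
print(sig)

# In the versions discussed here, skipna is not a named parameter; on
# older versions it can only slip in through the catch-all **kwargs.
has_skipna = "skipna" in sig.parameters
accepts_kwargs = any(
    p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
)
print("skipna named:", has_skipna, "| accepts **kwargs:", accepts_kwargs)
```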

Might skipna have an effect, just one that does not show up in my test program, and one which is not documented?

Reading the Pandas code

The third approach I used to investigate this was to run the code under a debugger.

By running the test program under a debugger, I found the code within Pandas which is responsible for computing the rolling mean. This code does not implement any kind of skipna functionality.

Here is how rolling mean is implemented.

df.rolling(3).mean() calls
pandas/core/window/rolling.py:Rolling.mean(), which calls
pandas/core/window/rolling.py:RollingAndExpandingMixin.mean(), which calls
pandas/core/window/rolling.py:BaseWindow._apply(), which calls
pandas/_libs/window/aggregations.pyx:roll_mean(), which computes the rolling mean.

There are two things in this chain that make it impossible for skipna to do anything.

First, skipna is being carried in the kwargs variable up until BaseWindow._apply() is called. But in that function, the kwargs parameter is unused. Even if roll_mean() implemented skipna, there is no way for the skipna parameter to get to roll_mean().

Second, if you read the source code of roll_mean(), it takes no skipna argument and has no branch that changes how NA values are handled based on one.

In order for skipna to do something, there would need to be code here to implement it, and there isn't.
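The first point can be illustrated with a toy model. The functions below are hypothetical stand-ins written for this example, not the pandas source; they only mimic the shape of the call chain, where **kwargs is accepted at the top and never forwarded to the routine that does the arithmetic.

```python
def roll_mean(values, window):
    """Stand-in for the low-level routine: it has no skipna parameter."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough observations yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def _apply(func, values, window, **kwargs):
    """Stand-in for the dispatch layer: kwargs is accepted but unused."""
    return func(values, window)


def mean(values, window, **kwargs):
    """Stand-in for the public method: passes kwargs down one level only."""
    return _apply(roll_mean, values, window, **kwargs)


data = [1.0, 2.0, 3.0, 4.0]
with_flag = mean(data, 2, skipna=True)
without_flag = mean(data, 2)
print(with_flag == without_flag)
```

Because the keyword is dropped inside _apply(), both calls take exactly the same path through roll_mean().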

Conclusion

The skipna parameter to df.rolling(3).mean(skipna=False) used to do nothing. It still does nothing, but now it complains about it.

Answered By - Nick ODell

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
