Issue
I would like to interpolate a pandas DataFrame, but I could not find a simple solution to my problem. Here is my DataFrame:
import numpy
import pandas

df = pandas.DataFrame(numpy.array([numpy.nan, 1, 1, numpy.nan,
    numpy.nan, 1, numpy.nan, numpy.nan, numpy.nan, numpy.nan, 1]), columns=['a'])
a
0 NaN
1 1.0
2 1.0
3 NaN
4 NaN
5 1.0
6 NaN
7 NaN
8 NaN
9 NaN
10 1.0
I want the following result:
a
0 NaN
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 NaN
7 NaN
8 NaN
9 NaN
10 1.0
To do that, I use the interpolate function, but it still fills the first values of a gap that is longer than the limit:
df2 = df.interpolate(limit=2, limit_area='inside', method='linear')
a
0 NaN
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 NaN
9 NaN
10 1.0
Is it possible to interpolate ONLY if there is a non-NaN value within the range of the limit parameter (i.e. the values in rows 6 and 7 should stay NaN because row 8 is also NaN)? I have the impression that limit_area only applies to the ends of the df.
Thanks
Solution
To achieve what you want, first create a Series that counts the number of consecutive NaN values in each group of NaN and broadcasts that count back to each row in the group. Then interpolate the entire Series and use mask to set back to NaN everything that shouldn't have been interpolated.
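# Flag non-NaN rows, then give each run of consecutive equal flags its own id
s = df['a'].notnull()
s = s.ne(s.shift()).cumsum()
# Size of the run each row belongs to, kept only on the NaN rows
m = df.groupby([s, df['a'].isnull()])['a'].transform('size').where(df['a'].isnull())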
#0 1.0
#1 NaN
#2 NaN
#3 2.0
#4 2.0
#5 NaN
#6 4.0
#7 4.0
#8 4.0
#9 4.0
#10 NaN
df.interpolate(limit_area='inside', method='linear').mask(m>2)
a
0 NaN
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 NaN
7 NaN
8 NaN
9 NaN
10 1.0
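The same steps can be wrapped in a small helper so the maximum gap length is a parameter. This is only a sketch: the function name interpolate_short_gaps and its max_gap argument are made up for illustration and are not part of pandas. It works on a single Series rather than on the whole DataFrame.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(series, max_gap=2, method="linear"):
    """Interpolate only interior NaN runs of at most `max_gap` values.

    Hypothetical helper, not a pandas API.
    """
    is_nan = series.isna()
    # Give each run of consecutive equal flags (NaN vs. non-NaN) its own id.
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    # Length of the run each row belongs to, kept only on the NaN rows.
    gap_len = is_nan.groupby(run_id).transform("size").where(is_nan)
    # Fill every interior gap, then put NaN back where the gap was too long.
    filled = series.interpolate(method=method, limit_area="inside")
    return filled.mask(gap_len > max_gap)


df = pd.DataFrame({"a": [np.nan, 1, 1, np.nan, np.nan, 1,
                         np.nan, np.nan, np.nan, np.nan, 1]})
result = interpolate_short_gaps(df["a"], max_gap=2)
print(result)
```

With max_gap=2 the two-row gap at rows 3 and 4 is filled, while the four-row gap at rows 6 to 9 and the leading NaN at row 0 are left untouched.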
Answered By - ALollz