Issue
I want to interpolate (linear interpolation) my data, but there are no NA values to fill because the missing rows are absent entirely.
Here is my data, with many missing rows:
timestamp | id | strength |
---|---|---|
1383260400000 | 1 | -0.3803901328171995 |
1383261000000 | 1 | -0.42196042219455937 |
1383265200000 | 1 | -0.460714706261982 |
My expected output:
timestamp | id | strength |
---|---|---|
1383260400000 | 1 | -0.3803901328171995 |
1383261000000 | 1 | -0.42196042219455937 |
1383261600000 | 1 | Linear interpolated data |
1383262200000 | 1 | Linear interpolated data |
1383262800000 | 1 | Linear interpolated data |
1383263400000 | 1 | Linear interpolated data |
1383264000000 | 1 | Linear interpolated data |
1383264600000 | 1 | Linear interpolated data |
1383265200000 | 1 | -0.460714706261982 |
The timestamps start at 1383260400000 and end at 1383343800000, and the other ids (the id column runs from 1 to 2025) have the same issue.
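For reference, consecutive expected timestamps differ by 1383261000000 − 1383260400000 = 600,000 ms, i.e. 10 minutes, so the data should sit on a 10-minute grid. A quick sketch (the variable name is only illustrative) counting how many rows a full grid over the stated range contains:

import pandas as pd

# full 10-minute grid from the first to the last timestamp in the question
full_range = pd.date_range(pd.to_datetime(1383260400000, unit='ms'),
                           pd.to_datetime(1383343800000, unit='ms'),
                           freq='10Min')
print(len(full_range))   # 140 timestamps per id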
Solution
The idea is to convert the timestamps to datetimes, set them as a DatetimeIndex, and then, per id, add the missing datetimes with Series.asfreq and fill them with interpolate inside a lambda function:
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
# per id: add the missing 10-minute rows with asfreq, then fill them with interpolate
f = lambda x: x.asfreq('10Min').interpolate()
df = df.set_index('timestamp').groupby('id')['strength'].apply(f).reset_index()
print (df)
id timestamp strength
0 1 2013-10-31 23:00:00 -0.380390
1 1 2013-10-31 23:10:00 -0.421960
2 1 2013-10-31 23:20:00 -0.427497
3 1 2013-10-31 23:30:00 -0.433033
4 1 2013-10-31 23:40:00 -0.438569
5 1 2013-10-31 23:50:00 -0.444106
6 1 2013-11-01 00:00:00 -0.449642
7 1 2013-11-01 00:10:00 -0.455178
8 1 2013-11-01 00:20:00 -0.460715
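To see what asfreq contributes before interpolate runs, here is a small sketch using only the three rows from the question (sample is an illustrative name): asfreq inserts the missing 10-minute rows as NaN, and interpolate then fills them linearly.

import pandas as pd

sample = pd.DataFrame({'timestamp': [1383260400000, 1383261000000, 1383265200000],
                       'id': [1, 1, 1],
                       'strength': [-0.3803901328171995, -0.42196042219455937, -0.460714706261982]})
sample['timestamp'] = pd.to_datetime(sample['timestamp'], unit='ms')
s = sample.set_index('timestamp')['strength']

print(s.asfreq('10Min'))                 # missing rows appear as NaN
print(s.asfreq('10Min').interpolate())   # NaN rows filled linearly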
Finally, if you need the original timestamp format (milliseconds since the epoch):

import numpy as np

# datetimes are stored as nanoseconds, so integer division by 10**6 gives milliseconds back
df['timestamp'] = df['timestamp'].astype(np.int64) // 1000000
print (df)
id timestamp strength
0 1 1383260400000 -0.380390
1 1 1383261000000 -0.421960
2 1 1383261600000 -0.427497
3 1 1383262200000 -0.433033
4 1 1383262800000 -0.438569
5 1 1383263400000 -0.444106
6 1 1383264000000 -0.449642
7 1 1383264600000 -0.455178
8 1 1383265200000 -0.460715
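The integer division works because pandas stores datetimes as nanoseconds since the epoch, so // 1000000 converts nanoseconds back to milliseconds. A quick check on the first value (a sketch, not part of the original answer):

import pandas as pd

ts = pd.to_datetime(1383260400000, unit='ms')
print(ts.value)              # 1383260400000000000 (nanoseconds since the epoch)
print(ts.value // 1000000)   # 1383260400000 (milliseconds again)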
EDIT:
# data from the question
df = pd.DataFrame({'timestamp': [1383260400000, 1383261000000, 1383265200000],
                   'id': [1, 1, 1],
                   'strength': [-0.3803901328171995, -0.42196042219455937, -0.460714706261982]})
print (df)
timestamp id strength
0 1383260400000 1 -0.380390
1 1383261000000 1 -0.421960
2 1383265200000 1 -0.460715
The solution creates all datetimes with date_range, adds the missing rows for each id with DataFrame.reindex and a MultiIndex, and finally interpolates per id:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

# full 10-minute range covering the whole period
r = pd.date_range(pd.to_datetime(1383260400000, unit='ms'),
                  pd.to_datetime(1383343800000, unit='ms'),
                  freq='10Min')

# cartesian product of all datetimes and all ids
ids = df['id'].unique()
mux = pd.MultiIndex.from_product([r, ids], names=['timestamp', 'id'])

# reindex adds the missing rows as NaN, interpolate then fills them per id
f = lambda x: x.interpolate()
df = (df.set_index(['timestamp', 'id'])
        .reindex(mux)
        .groupby('id')['strength']
        .transform(f)
        .reset_index())
print (df)
timestamp id strength
0 2013-10-31 23:00:00 1 -0.380390
1 2013-10-31 23:10:00 1 -0.421960
2 2013-10-31 23:20:00 1 -0.427497
3 2013-10-31 23:30:00 1 -0.433033
4 2013-10-31 23:40:00 1 -0.438569
.. ... .. ...
135 2013-11-01 21:30:00 1 -0.460715
136 2013-11-01 21:40:00 1 -0.460715
137 2013-11-01 21:50:00 1 -0.460715
138 2013-11-01 22:00:00 1 -0.460715
139 2013-11-01 22:10:00 1 -0.460715
[140 rows x 3 columns]
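The same MultiIndex approach handles all ids at once. Below is a minimal sketch with a hypothetical second id and made-up strength values, only to illustrate that every id gets its own 140-row grid and is interpolated independently:

import pandas as pd

sample = pd.DataFrame({'timestamp': [1383260400000, 1383261000000, 1383265200000,
                                     1383260400000, 1383265200000],
                       'id': [1, 1, 1, 2, 2],
                       'strength': [-0.38, -0.42, -0.46, 0.10, 0.80]})
sample['timestamp'] = pd.to_datetime(sample['timestamp'], unit='ms')

r = pd.date_range(pd.to_datetime(1383260400000, unit='ms'),
                  pd.to_datetime(1383343800000, unit='ms'),
                  freq='10Min')
mux = pd.MultiIndex.from_product([r, sample['id'].unique()],
                                 names=['timestamp', 'id'])

out = (sample.set_index(['timestamp', 'id'])
             .reindex(mux)
             .groupby('id')['strength']
             .transform(lambda x: x.interpolate())
             .reset_index())
print(out.groupby('id').size())   # 140 rows for each id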
Answered By - jezrael