Issue
I have a dataframe like:
import pandas as pd
days1 = pd.date_range('2020-01-01 01:00:00', '2020-01-01 01:19:00', freq='60s')
DF = pd.DataFrame({'Time': days1,
                   'TimeSeries1': [10, 10, 10, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20],
                   'TimeSeries2': [11, 12, 13, 12, 11, 14, 15, 16, 21, 20, 20, 23, 15, 15, 15, 15, 15, 15, 15, 15]})
And I would like to get the following:
- For each of the TimeSeries columns (TimeSeries1 and TimeSeries2), I would like to create a corresponding "_Filtered" column, defined as: TimeSeries1_Filtered[i] = (1-A)*TimeSeries1_Filtered[i-1] + A*TimeSeries1[i]
where "A" is a filter factor between 0 and 1.
Each column needs its own "A" factor, for example A1=0.5 for TimeSeries1 and A2=0.8 for TimeSeries2.
I have more than 100 "TimeSeriesN" columns, so it would be good to pass the "A#" parameters as a tuple or a list.
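A plain-Python sketch of the recurrence can make the definition concrete. It assumes the first filtered value is seeded with the first raw value (y[0] = x[0]), which is what the example table below implies. The helper name `exp_filter` and the `factors` list are hypothetical, chosen to show the A values passed as a list in the same order as the column names.

```python
import pandas as pd

days1 = pd.date_range('2020-01-01 01:00:00', '2020-01-01 01:19:00', freq='60s')
DF = pd.DataFrame({'Time': days1,
                   'TimeSeries1': [10, 10, 10, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20],
                   'TimeSeries2': [11, 12, 13, 12, 11, 14, 15, 16, 21, 20, 20, 23, 15, 15, 15, 15, 15, 15, 15, 15]})


def exp_filter(values, a):
    """Hypothetical helper: y[i] = (1 - a) * y[i-1] + a * x[i], seeded with y[0] = x[0]."""
    out = []
    prev = None
    for x in values:
        # Seed with the first raw value (assumption taken from the example table).
        prev = float(x) if prev is None else (1 - a) * prev + a * x
        out.append(prev)
    return out


columns = ['TimeSeries1', 'TimeSeries2']
factors = [0.5, 0.8]  # one A per column, same order as `columns`
for col, a in zip(columns, factors):
    DF[f"{col}_Filtered"] = exp_filter(DF[col].tolist(), a)
```

With 100+ columns, `columns` could be built with something like `[c for c in DF.columns if c.startswith('TimeSeries')]`, as long as `factors` follows the same order. A Python loop is fine for checking values but will be slow on long series, which is where a vectorized filter helps.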
Example with A1=0.5
    Time                 TimeSeries1  TimeSeries1_Filtered
0   2020-01-01 01:00:00  10           10
1   2020-01-01 01:01:00  10           10
2   2020-01-01 01:02:00  10           10
3   2020-01-01 01:03:00  20           15
4   2020-01-01 01:04:00  20           17.5
5   2020-01-01 01:05:00  20           18.75
6   2020-01-01 01:06:00  20           19.375
7   2020-01-01 01:07:00  20           19.6875
8   2020-01-01 01:08:00  20           19.84375
9   2020-01-01 01:09:00  20           19.92188
10  2020-01-01 01:10:00  20           19.96094
11  ...                  ...          ...
thanks!
EDIT: correction on the filter notation and equation. Thanks @not_speshal for the heads-up.
Solution
Why not use a time series filtering package such as scipy.signal?
This is how I would do filtering with scipy.signal.lfilter:
(Thanks @not_speshal for pointing out the mistake in the OP's difference equation)
from scipy.signal import lfilter
coeffs = {'TimeSeries1': 0.5, 'TimeSeries2': 0.8}
for label, a in coeffs.items():
    # numerator [a], denominator [1, a-1]  ->  y[i] = a*x[i] + (1-a)*y[i-1]
    DF[f"{label}_Filtered"] = lfilter([a], [1, a-1], DF[label])
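To see how the `lfilter` coefficients map onto the recurrence, the sketch below compares `lfilter` against a hand-written loop on the first six TimeSeries1 values with a=0.5. Because `lfilter` starts from a zero internal state by default, its first output is a*x[0] rather than x[0], which is why the code above does not reproduce the first rows of the expected table.

```python
import numpy as np
from scipy.signal import lfilter

x = np.array([10, 10, 10, 20, 20, 20], dtype=float)
a = 0.5

# lfilter(b, den, x) solves den[0]*y[n] + den[1]*y[n-1] = b[0]*x[n].
# With b = [a] and den = [1, a - 1] this is y[n] = a*x[n] + (1 - a)*y[n-1].
y_scipy = lfilter([a], [1, a - 1], x)

# The same recurrence by hand, starting from y[-1] = 0 like lfilter's default state.
y_loop = np.empty_like(x)
prev = 0.0
for n, xn in enumerate(x):
    prev = a * xn + (1 - a) * prev
    y_loop[n] = prev

print(y_scipy)  # first value is a*x[0] = 5.0, not 10.0
```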
However, it looks as though you are assuming an initial condition based on each filter being at steady state at time i=0. This solution produces the results you wanted:
from scipy.signal import lfilter, lfiltic
coeffs = {'TimeSeries1': 0.5, 'TimeSeries2': 0.8}
for label, a in coeffs.items():
    y_prev = DF[label].iloc[0]  # assumed filtered value just before the first sample
    zi = lfiltic([a], [1, a-1], [y_prev])  # initial filter state built from that past output
    # lfilter returns (y, zf) when zi is given; [0] keeps the filtered output
    DF[f"{label}_Filtered"] = lfilter([a], [1, a-1], DF[label], zi=zi)[0]
print(DF)
Output:
   Time                 TimeSeries1  TimeSeries2  TimeSeries1_Filtered  TimeSeries2_Filtered
0  2020-01-01 01:00:00  10           11           10.000000             11.000000
1  2020-01-01 01:01:00  10           12           10.000000             11.800000
2  2020-01-01 01:02:00  10           13           10.000000             12.760000
3  2020-01-01 01:03:00  20           12           15.000000             12.152000
4  2020-01-01 01:04:00  20           11           17.500000             11.230400
5  2020-01-01 01:05:00  20           14           18.750000             13.446080
...
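Two other routes should, as far as I can tell, give the same numbers. `scipy.signal.lfilter_zi(b, a)` returns the internal state for a unit-step steady state, so scaling it by x[0] starts each filter as if it had been fed x[0] forever. pandas' `Series.ewm(alpha=a, adjust=False).mean()` implements the recursion y[0] = x[0], y[t] = (1 - alpha)*y[t-1] + alpha*x[t], which is the OP's filter. The `_zi` and `_ewm` column suffixes are hypothetical names used only for this comparison.

```python
import numpy as np
import pandas as pd
from scipy.signal import lfilter, lfilter_zi

DF = pd.DataFrame({
    'TimeSeries1': [10, 10, 10, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20],
    'TimeSeries2': [11, 12, 13, 12, 11, 14, 15, 16, 21, 20, 20, 23, 15, 15, 15, 15, 15, 15, 15, 15],
})
coeffs = {'TimeSeries1': 0.5, 'TimeSeries2': 0.8}

for label, a in coeffs.items():
    b, den = [a], [1, a - 1]
    x = DF[label].to_numpy(dtype=float)

    # Step-response steady-state state, scaled by the first sample.
    zi = lfilter_zi(b, den) * x[0]
    DF[f"{label}_zi"] = lfilter(b, den, x, zi=zi)[0]

    # Recursive (non-adjusted) exponential weighting in pandas.
    DF[f"{label}_ewm"] = DF[label].ewm(alpha=a, adjust=False).mean()

print(DF)
```

The `ewm` version needs no SciPy and no explicit initial condition, which may be convenient with 100+ columns; note that pandas requires 0 < alpha <= 1.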
Answered By - Bill