Issue
I am working with non-stationary time series data. I applied .diff(periods=n) to difference the data and remove trend and seasonality. With .diff(periods=1), the observation from the previous time step (t-1) is subtracted from the current observation (t).
Now I want to invert the differenced data back to its original scale, but I am having issues with that. You can find the code here.
My code for differencing:
data_diff = df.diff(periods=1)
data_diff.head(5)
My code for inverting the differenced data back to its original scale:
cols = df.columns
x = []
for col in cols:
    diff_results = df[col] + data_diff[col].shift(-1)
    x.append(diff_results)
diff_df_inverted = pd.concat(x, axis=1)
diff_df_inverted
As the last output in the code shows, I have inverted my data back to its original scale. However, I do not get the inverted data for row 1: the values come out shifted up by one row. My question is, why? What am I missing?
Thank you!
Solution
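In this line:
diff_results = df[col] + data_diff[col].shift(-1)
the first row of data_diff is NaN, so its real values start from the second row. .shift(-1) moves every difference up one row, so row t becomes x(t) + (x(t+1) - x(t)) = x(t+1). That is why the result is the original data shifted up by one row, with NaN in the last row.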
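To make the shift visible, here is a minimal sketch on a made-up single-column frame. The toy values and the names attempt and next_values are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's df
df = pd.DataFrame({"a": [10.0, 12.0, 15.0, 19.0]})
data_diff = df.diff(periods=1)  # first row is NaN

# The asker's formula: shift(-1) pulls each difference up one row,
# so row t holds x[t] + (x[t+1] - x[t]) = x[t+1]
attempt = df["a"] + data_diff["a"].shift(-1)

# The same thing, written directly: the original column shifted up by one
next_values = df["a"].shift(-1)

print(attempt.equals(next_values))
```

The two series match, including the NaN in the last row, which confirms that the formula returns the next row's value rather than reconstructing the current one.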
An easy solution is df.cumsum(), which undoes df.diff() with periods=1 once the first row is restored.
The only thing you have to do is replace the NaN values in the first row of data_diff with the first row of df. You need this because it is the starting row that every later difference is added to. After that, call data_diff.cumsum() and you have the original data back.
Here is the detailed code.
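# Restore the first row, which diff() left as NaN
data_diff.iloc[0] = df.iloc[0]
# The cumulative sum of the differences rebuilds the original values
a = data_diff.cumsum()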
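As a self-contained check of this approach, the sketch below runs the round trip on a made-up two-column frame. The toy values and the names restored_input and restored are assumptions for illustration; it works on a copy so that data_diff itself is left unchanged.

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's df
df = pd.DataFrame({"a": [10.0, 12.0, 15.0, 19.0],
                   "b": [1.0, 0.5, 2.5, 2.0]})
data_diff = df.diff(periods=1)

# Work on a copy so the differenced frame is not modified in place
restored_input = data_diff.copy()
restored_input.iloc[0] = df.iloc[0]  # seed with the original first row

# The cumulative sum adds each difference onto the running total
restored = restored_input.cumsum()

# Raises AssertionError if the round trip does not reproduce df
pd.testing.assert_frame_equal(restored, df)
```

This only covers periods=1. With a larger lag, each value depends on the value n rows earlier, so a single cumsum over the whole column would not invert it.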
Answered By - H. pap