Issue
This is a mini version of the result of my time series prediction of multiple indicators. For the sub-dataset corresponding to each indicator_name
, I hope to take the last 3 periods for the actual_value
part (fitted_value
is a fit to the actual_value
, so they are of equal length), and the next 2 periods for the predicted_value
part. How to achieve this?
indicator_name date actual_value fitted_value predicted_value
0 CPI 2023-07-31 -7.7 -9.280851 NaN
1 CPI 2023-08-31 72.5 5.181248 NaN
2 CPI 2023-09-30 126.3 75.801026 NaN
3 CPI 2023-10-31 147.9 123.940712 NaN
4 CPI 2023-11-30 NaN NaN 113.341466
5 CPI 2023-12-31 NaN NaN 103.223588
6 CPI 2024-01-31 NaN NaN 117.865400
7 PPI 2022-09-30 25.8 12.700000 NaN
8 PPI 2022-12-31 6.0 12.800000 NaN
9 PPI 2023-03-31 6.1 8.800000 NaN
10 PPI 2023-06-30 7.9 9.700000 NaN
11 PPI 2023-09-30 8.1 11.000000 NaN
12 PPI 2023-12-31 NaN NaN 11.500000
13 PPI 2024-03-31 NaN NaN 13.700000
14 PPI 2024-06-30 NaN NaN 12.800000
15 PPI 2024-09-30 NaN NaN 15.300000
The expected results are as follows:
indicator_name date actual_value fitted_value predicted_value
0 CPI 2023-08-31 72.5 5.181248 NaN
1 CPI 2023-09-30 126.3 75.801026 NaN
2 CPI 2023-10-31 147.9 123.940712 NaN
3 CPI 2023-11-30 NaN NaN 113.341466
4 CPI 2023-12-31 NaN NaN 103.223588
5 PPI 2023-03-31 6.1 8.800000 NaN
6 PPI 2023-06-30 7.9 9.700000 NaN
7 PPI 2023-09-30 8.1 11.000000 NaN
8 PPI 2023-12-31 NaN NaN 11.500000
9 PPI 2024-03-31 NaN NaN 13.700000
Code:
def slice_act_dataframe(df, by_cols, n: int=3):
df = df.sort_values('date')
df_sliced = df.groupby(by_cols).tail(n)
return df_sliced
def slice_pred_dataframe(df, by_cols, n: int=2):
df = df.sort_values('date')
df_sliced = df.groupby(by_cols).head(n)
return df_sliced
Solution
One possible solution:
def slice_dataframe(df, by_cols, n_act: int = 3, n_pred: int = 2):
df = df.sort_values('date')
# Slice actual values
df_act = df[df['actual_value'].notna()]
df_act_sliced = df_act.groupby(by_cols).tail(n_act)
# Slice predicted values
df_pred = df[df['predicted_value'].notna()]
df_pred_sliced = df_pred.groupby(by_cols).head(n_pred)
# Concatenate sliced dataframes
df_sliced = pd.concat([df_act_sliced, df_pred_sliced])
return df_sliced
df_sliced = slice_dataframe(df, by_cols=['indicator_name']).sort_values(['indicator_name', 'date'])
print(df_sliced)
Out:
indicator_name date actual_value fitted_value predicted_value
1 CPI 2023-08-31 72.5 5.181248 NaN
2 CPI 2023-09-30 126.3 75.801026 NaN
3 CPI 2023-10-31 147.9 123.940712 NaN
4 CPI 2023-11-30 NaN NaN 113.341466
5 CPI 2023-12-31 NaN NaN 103.223588
9 PPI 2023-03-31 6.1 8.800000 NaN
10 PPI 2023-06-30 7.9 9.700000 NaN
11 PPI 2023-09-30 8.1 11.000000 NaN
12 PPI 2023-12-31 NaN NaN 11.500000
13 PPI 2024-03-31 NaN NaN 13.700000
Answered By - ah bon
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.