Issue
Assume the dataframe df is as follows:
date actual_value fitted_value predicted_value code
0 2023/8/31 NaN NaN 520.994413 LX0301
1 2023/9/30 NaN NaN 580.967973 LX0301
2 2023/10/31 NaN NaN 650.392867 LX0301
3 2023/8/31 471.459992 520.027310 NaN LX0301
4 2023/9/30 NaN NaN 531.199547 LX0301
5 2023/10/31 NaN NaN 600.053484 LX0301
6 2023/8/31 471.459992 511.902229 NaN LX0301
7 2023/9/30 480.400211 518.202630 NaN LX0301
8 2023/10/31 NaN NaN 537.890792 LX0301
9 2023/8/31 NaN NaN 99.216818 LX0101
10 2023/9/30 NaN NaN 98.624779 LX0101
11 2023/10/31 NaN NaN 98.690596 LX0101
12 2023/8/31 99.400000 99.221767 NaN LX0101
13 2023/9/30 NaN NaN 98.822977 LX0101
14 2023/10/31 NaN NaN 98.875062 LX0101
15 2023/8/31 99.400000 99.220693 NaN LX0101
16 2023/9/30 98.700000 98.802146 NaN LX0101
17 2023/10/31 NaN NaN 98.738091 LX0101
For each group of code (in this example, LX0301 and LX0101), I need to fill the NaNs in actual_value with the actual_value that exists for the same date elsewhere in the group. Taking LX0301 as an example, actual_value is missing in rows 0, 1 and 4 (dates 2023/8/31 and 2023/9/30), but values for those dates exist in rows 6 (or 3) and 7, so rows 0, 1 and 4 should be filled from rows 6 (or 3) and 7. Note: within each group in this example, the actual values on the same date are identical (for example, rows 3 and 6, and rows 12 and 15). How can I achieve this? Thanks.
The required result is as follows:
date actual_value fitted_value predicted_value code
0 2023/8/31 471.459992 NaN 520.994413 LX0301
1 2023/9/30 480.400211 NaN 580.967973 LX0301
2 2023/10/31 NaN NaN 650.392867 LX0301
3 2023/8/31 471.459992 520.027310 NaN LX0301
4 2023/9/30 480.400211 NaN 531.199547 LX0301
5 2023/10/31 NaN NaN 600.053484 LX0301
6 2023/8/31 471.459992 511.902229 NaN LX0301
7 2023/9/30 480.400211 518.202630 NaN LX0301
8 2023/10/31 NaN NaN 537.890792 LX0301
9 2023/8/31 99.400000 NaN 99.216818 LX0101
10 2023/9/30 98.700000 NaN 98.624779 LX0101
11 2023/10/31 NaN NaN 98.690596 LX0101
12 2023/8/31 99.400000 99.221767 NaN LX0101
13 2023/9/30 98.700000 NaN 98.822977 LX0101
14 2023/10/31 NaN NaN 98.875062 LX0101
15 2023/8/31 99.400000 99.220693 NaN LX0101
16 2023/9/30 98.700000 98.802146 NaN LX0101
17 2023/10/31 NaN NaN 98.738091 LX0101
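The note in the question assumes that, within a code, all non-missing actual values on a given date are identical. A minimal sketch of how that assumption could be checked before filling, using a reduced sample modeled on the data above (the names distinct_counts and is_consistent are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# Reduced sample modeled on the LX0301 rows of the question.
df = pd.DataFrame({
    "date": ["2023/8/31", "2023/8/31", "2023/8/31", "2023/9/30", "2023/9/30"],
    "actual_value": [np.nan, 471.459992, 471.459992, np.nan, 480.400211],
    "code": ["LX0301"] * 5,
})

# Count distinct non-NaN actual values per (code, date); NaN is ignored by default.
distinct_counts = df.groupby(["code", "date"])["actual_value"].nunique()

# True when every (code, date) pair has at most one distinct actual value.
is_consistent = bool((distinct_counts <= 1).all())
print(is_consistent)
```

If the check fails for real data, a group-wise fill would have to pick one of several conflicting values, so the rule for choosing would need to be decided first.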
Solution
Try:
df['actual_value_new'] = df.groupby(['code','date'])['actual_value'].transform('first')
Output (actual_value_new is used only to show the old and new columns side by side):
date actual_value fitted_value predicted_value code actual_value_new
0 2023/8/31 NaN NaN 520.994413 LX0301 471.459992
1 2023/9/30 NaN NaN 580.967973 LX0301 480.400211
2 2023/10/31 NaN NaN 650.392867 LX0301 NaN
3 2023/8/31 471.459992 520.027310 NaN LX0301 471.459992
4 2023/9/30 NaN NaN 531.199547 LX0301 480.400211
5 2023/10/31 NaN NaN 600.053484 LX0301 NaN
6 2023/8/31 471.459992 511.902229 NaN LX0301 471.459992
7 2023/9/30 480.400211 518.202630 NaN LX0301 480.400211
8 2023/10/31 NaN NaN 537.890792 LX0301 NaN
9 2023/8/31 NaN NaN 99.216818 LX0101 99.400000
10 2023/9/30 NaN NaN 98.624779 LX0101 98.700000
11 2023/10/31 NaN NaN 98.690596 LX0101 NaN
12 2023/8/31 99.400000 99.221767 NaN LX0101 99.400000
13 2023/9/30 NaN NaN 98.822977 LX0101 98.700000
14 2023/10/31 NaN NaN 98.875062 LX0101 NaN
15 2023/8/31 99.400000 99.220693 NaN LX0101 99.400000
16 2023/9/30 98.700000 98.802146 NaN LX0101 98.700000
17 2023/10/31 NaN NaN 98.738091 LX0101 NaN
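This works because the groupby "first" aggregation returns the first non-null value in each (code, date) group, and transform broadcasts it back to every row of that group. Groups with no actual value at all (here, 2023/10/31) stay NaN. To get the required result in the original column rather than a new one, something like the following sketch may work. It uses a reduced sample modeled on the question's data, and the intermediate name filled is illustrative:

```python
import numpy as np
import pandas as pd

# Reduced sample modeled on the question's data.
df = pd.DataFrame({
    "date": ["2023/8/31", "2023/9/30", "2023/10/31",
             "2023/8/31", "2023/9/30", "2023/8/31"],
    "actual_value": [np.nan, np.nan, np.nan, 471.459992, 480.400211, 99.4],
    "code": ["LX0301", "LX0301", "LX0301", "LX0301", "LX0301", "LX0101"],
})

# First non-null actual_value per (code, date), aligned to the original rows.
filled = df.groupby(["code", "date"])["actual_value"].transform("first")

# Only fill the gaps; existing values are left untouched.
df["actual_value"] = df["actual_value"].fillna(filled)
print(df)
```

Using fillna instead of overwriting the column keeps existing values as they are, which only matters if the values within a group could ever differ.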
Answered By - Scott Boston