Sunday, November 19, 2023

[FIXED] Group by one column and, where the dates are the same, fill in the missing values in a numerical column

Tags: dataframe, pandas, python-3.x

Issue

Assume the dataframe df is as follows:

          date  actual_value  fitted_value  predicted_value    code
0    2023/8/31           NaN           NaN       520.994413  LX0301
1    2023/9/30           NaN           NaN       580.967973  LX0301
2   2023/10/31           NaN           NaN       650.392867  LX0301
3    2023/8/31    471.459992    520.027310              NaN  LX0301
4    2023/9/30           NaN           NaN       531.199547  LX0301
5   2023/10/31           NaN           NaN       600.053484  LX0301
6    2023/8/31    471.459992    511.902229              NaN  LX0301
7    2023/9/30    480.400211    518.202630              NaN  LX0301
8   2023/10/31           NaN           NaN       537.890792  LX0301
9    2023/8/31           NaN           NaN        99.216818  LX0101
10   2023/9/30           NaN           NaN        98.624779  LX0101
11  2023/10/31           NaN           NaN        98.690596  LX0101
12   2023/8/31     99.400000     99.221767              NaN  LX0101
13   2023/9/30           NaN           NaN        98.822977  LX0101
14  2023/10/31           NaN           NaN        98.875062  LX0101
15   2023/8/31     99.400000     99.220693              NaN  LX0101
16   2023/9/30     98.700000     98.802146              NaN  LX0101
17  2023/10/31           NaN           NaN        98.738091  LX0101
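To experiment with the question, the dataframe above can be rebuilt with a sketch like the following. The values are copied from the table; keeping the dates as the strings shown is an assumption about how the original data is stored (grouping works the same way on datetime values).

```python
import numpy as np
import pandas as pd

# Example data copied from the table above; dates kept as the strings shown
nan = np.nan
df = pd.DataFrame({
    'date': ['2023/8/31', '2023/9/30', '2023/10/31'] * 6,
    'actual_value': [nan, nan, nan,
                     471.459992, nan, nan,
                     471.459992, 480.400211, nan,
                     nan, nan, nan,
                     99.4, nan, nan,
                     99.4, 98.7, nan],
    'fitted_value': [nan, nan, nan,
                     520.027310, nan, nan,
                     511.902229, 518.202630, nan,
                     nan, nan, nan,
                     99.221767, nan, nan,
                     99.220693, 98.802146, nan],
    'predicted_value': [520.994413, 580.967973, 650.392867,
                        nan, 531.199547, 600.053484,
                        nan, nan, 537.890792,
                        99.216818, 98.624779, 98.690596,
                        nan, 98.822977, 98.875062,
                        nan, nan, 98.738091],
    'code': ['LX0301'] * 9 + ['LX0101'] * 9,
})
print(df)
```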

For each code group (in this example, LX0301 and LX0101), I need to fill the NaNs in actual_value with the actual_value that already exists for the same date. Taking LX0301 as an example, actual_value is missing in rows 0, 1 and 4 (row 0 is dated 2023/8/31; rows 1 and 4 are dated 2023/9/30), but values for those dates appear in row 6 (or 3) and row 7, so those existing values should be used to fill rows 0, 1 and 4. Rows dated 2023/10/31 have no actual_value anywhere in the group, so they stay NaN. Note: within each code in this example, the actual values on a given date are identical (for example, rows 3 and 6 agree, as do rows 12 and 15). How can I achieve this? Thanks.
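Filling from "the value that exists on that date" only has one right answer if each (code, date) pair holds at most one distinct actual value, which is what the note in the question states. A quick way to check that assumption is to count distinct non-null values per group. This is a sketch on a trimmed copy of the LX0301 rows (only code, date and actual_value kept, a simplification for brevity):

```python
import numpy as np
import pandas as pd

# Trimmed copy of rows 0-8 (LX0301) from the table above
df = pd.DataFrame({
    'code': ['LX0301'] * 9,
    'date': ['2023/8/31', '2023/9/30', '2023/10/31'] * 3,
    'actual_value': [np.nan, np.nan, np.nan,
                     471.459992, np.nan, np.nan,
                     471.459992, 480.400211, np.nan],
})

# nunique() ignores NaN by default, so each (code, date) group should report
# at most one distinct actual_value if the note in the question holds
distinct = df.groupby(['code', 'date'])['actual_value'].nunique()
print(distinct)

consistent = bool((distinct <= 1).all())
print('one actual value per (code, date):', consistent)
```

A group reporting 0 (here 2023/10/31) simply has nothing to fill from; a group reporting 2 or more would mean the data conflicts and the choice of fill value matters.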

The required result is as follows:

          date  actual_value  fitted_value  predicted_value    code
0    2023/8/31    471.459992           NaN       520.994413  LX0301
1    2023/9/30    480.400211           NaN       580.967973  LX0301
2   2023/10/31           NaN           NaN       650.392867  LX0301
3    2023/8/31    471.459992    520.027310              NaN  LX0301
4    2023/9/30    480.400211           NaN       531.199547  LX0301
5   2023/10/31           NaN           NaN       600.053484  LX0301
6    2023/8/31    471.459992    511.902229              NaN  LX0301
7    2023/9/30    480.400211    518.202630              NaN  LX0301
8   2023/10/31           NaN           NaN       537.890792  LX0301
9    2023/8/31     99.400000           NaN        99.216818  LX0101
10   2023/9/30     98.700000           NaN        98.624779  LX0101
11  2023/10/31           NaN           NaN        98.690596  LX0101
12   2023/8/31     99.400000     99.221767              NaN  LX0101
13   2023/9/30     98.700000           NaN        98.822977  LX0101
14  2023/10/31           NaN           NaN        98.875062  LX0101
15   2023/8/31     99.400000     99.220693              NaN  LX0101
16   2023/9/30     98.700000     98.802146              NaN  LX0101
17  2023/10/31           NaN           NaN        98.738091  LX0101

Solution

Try:

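# transform('first') takes the first non-null actual_value in each (code, date) group and broadcasts it to every row of that group
df['actual_value_new'] = df.groupby(['code','date'])['actual_value'].transform('first')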
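If the goal is to fill actual_value itself, as the question asks, one option is to pass the group-wise values to fillna so that only the missing cells change. Under the note's assumption, assigning the transform result directly would give the same numbers; fillna just guarantees that values already present are never overwritten. A minimal sketch on a trimmed copy of the LX0101 rows (only code, date and actual_value kept, a simplification for brevity):

```python
import numpy as np
import pandas as pd

# Trimmed copy of rows 9-17 (LX0101), re-indexed from 0
df = pd.DataFrame({
    'code': ['LX0101'] * 9,
    'date': ['2023/8/31', '2023/9/30', '2023/10/31'] * 3,
    'actual_value': [np.nan, np.nan, np.nan,
                     99.4, np.nan, np.nan,
                     99.4, 98.7, np.nan],
})

# For every row: the first non-null actual_value in that row's (code, date)
# group, or NaN if the group has none
group_first = df.groupby(['code', 'date'])['actual_value'].transform('first')

# fillna aligns on the index and only replaces missing cells
df['actual_value'] = df['actual_value'].fillna(group_first)
print(df)
```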

Output (the result is written to a new column, actual_value_new, only so the old and new columns can be compared side by side):

          date  actual_value  fitted_value  predicted_value    code  actual_value_new
0    2023/8/31           NaN           NaN       520.994413  LX0301        471.459992
1    2023/9/30           NaN           NaN       580.967973  LX0301        480.400211
2   2023/10/31           NaN           NaN       650.392867  LX0301               NaN
3    2023/8/31    471.459992    520.027310              NaN  LX0301        471.459992
4    2023/9/30           NaN           NaN       531.199547  LX0301        480.400211
5   2023/10/31           NaN           NaN       600.053484  LX0301               NaN
6    2023/8/31    471.459992    511.902229              NaN  LX0301        471.459992
7    2023/9/30    480.400211    518.202630              NaN  LX0301        480.400211
8   2023/10/31           NaN           NaN       537.890792  LX0301               NaN
9    2023/8/31           NaN           NaN        99.216818  LX0101         99.400000
10   2023/9/30           NaN           NaN        98.624779  LX0101         98.700000
11  2023/10/31           NaN           NaN        98.690596  LX0101               NaN
12   2023/8/31     99.400000     99.221767              NaN  LX0101         99.400000
13   2023/9/30           NaN           NaN        98.822977  LX0101         98.700000
14  2023/10/31           NaN           NaN        98.875062  LX0101               NaN
15   2023/8/31     99.400000     99.220693              NaN  LX0101         99.400000
16   2023/9/30     98.700000     98.802146              NaN  LX0101         98.700000
17  2023/10/31           NaN           NaN        98.738091  LX0101               NaN
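To confirm that actual_value_new reproduces the actual_value column of the required result, the two can be compared with pandas' testing helper. This sketch uses only the LX0301 rows, trimmed to the columns involved (a simplification to keep it short); the expected values are copied from rows 0-8 of the required result.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Trimmed input: rows 0-8 (LX0301) of the original dataframe
df = pd.DataFrame({
    'code': ['LX0301'] * 9,
    'date': ['2023/8/31', '2023/9/30', '2023/10/31'] * 3,
    'actual_value': [nan, nan, nan,
                     471.459992, nan, nan,
                     471.459992, 480.400211, nan],
})

# actual_value for the same rows, copied from the required result
expected = pd.Series([471.459992, 480.400211, nan] * 3, name='actual_value')

df['actual_value_new'] = df.groupby(['code', 'date'])['actual_value'].transform('first')

# assert_series_equal treats NaN in matching positions as equal;
# check_names=False because the column names differ
pd.testing.assert_series_equal(df['actual_value_new'], expected, check_names=False)
print('matches required result')
```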

Answered By - Scott Boston

This answer was collected from Stack Overflow and tested by the PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
