Issue
I have a dataframe that only has data for weekdays. Below is a sample dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'BAS_DT': ['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-05', '2023-01-05', '2023-01-06', '2023-01-07'],
                   'CUS_NO': [np.nan, np.nan, '900816636', '900816636', '900816946', '900816931', np.nan, np.nan],
                   'VALUE': [np.nan, np.nan, 10, 10, 7, 8, np.nan, np.nan],
                   'BR': [np.nan, np.nan, 100, 100, 200, 300, np.nan, np.nan]})
df
BAS_DT CUS_NO VALUE BR
0 2023-01-02 NaN NaN NaN
1 2023-01-03 NaN NaN NaN
2 2023-01-04 900816636 10.0 100.0
3 2023-01-05 900816636 10.0 100.0
4 2023-01-05 900816946 7.0 200.0
5 2023-01-05 900816931 8.0 300.0
6 2023-01-06 NaN NaN NaN
7 2023-01-07 NaN NaN NaN
I want to fill 2023-01-06 and 2023-01-07 with the same rows as 2023-01-05. I tried ffill, but it only fills each NaN from the single closest preceding row (a sketch of what that gives is shown after the desired output below). Below is my desired output:
BAS_DT CUS_NO VALUE BR
0 2023-01-02 NaN NaN NaN
1 2023-01-03 NaN NaN NaN
2 2023-01-04 900816636 10.0 100.0
3 2023-01-05 900816636 10.0 100.0
4 2023-01-05 900816946 7.0 200.0
5 2023-01-05 900816931 8.0 300.0
6 2023-01-06 900816636 10.0 100.0
7 2023-01-06 900816946 7.0 200.0
8 2023-01-06 900816931 8.0 300.0
9 2023-01-07 900816636 10.0 100.0
10 2023-01-07 900816946 7.0 200.0
11 2023-01-07 900816931 8.0 300.0
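For reference, this is roughly what plain ffill gives me: each NaN is filled from the closest preceding non-null value in its column, so both 2023-01-06 and 2023-01-07 only pick up the last 2023-01-05 row.
df.ffill()
BAS_DT CUS_NO VALUE BR
0 2023-01-02 NaN NaN NaN
1 2023-01-03 NaN NaN NaN
2 2023-01-04 900816636 10.0 100.0
3 2023-01-05 900816636 10.0 100.0
4 2023-01-05 900816946 7.0 200.0
5 2023-01-05 900816931 8.0 300.0
6 2023-01-06 900816931 8.0 300.0
7 2023-01-07 900816931 8.0 300.0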
Thank you.
Solution
You could group by BAS_DT, aggregating CUS_NO into a list where there is more than one value in the group, then ffill and explode:
out = (df
       .groupby('BAS_DT')
       # collect a date's values into a list when it has more than one row
       .agg(lambda g: list(g) if len(g) > 1 else g)
       # forward-fill the dates that have no data with the previous date's value(s)
       .ffill()
       # expand the list cells back into one row per value
       .explode('CUS_NO')
       .reset_index()
)
Output:
BAS_DT CUS_NO
0 2023-01-02 NaN
1 2023-01-03 NaN
2 2023-01-04 900816636
3 2023-01-05 900816636
4 2023-01-05 900816946
5 2023-01-05 900816931
6 2023-01-06 900816636
7 2023-01-06 900816946
8 2023-01-06 900816931
9 2023-01-07 900816636
10 2023-01-07 900816946
11 2023-01-07 900816931
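To see why this works, you can look at the intermediate frame after the groupby/agg step but before ffill and explode: the 2023-01-05 row holds list cells, while the dates with no data hold plain NaN, so ffill copies the whole 2023-01-05 lists into 2023-01-06 and 2023-01-07, and explode then expands them back into one row per value. A quick sketch to inspect it:
intermediate = df.groupby('BAS_DT').agg(lambda g: list(g) if len(g) > 1 else g)
print(intermediate)          # list cells only for 2023-01-05; NaN for the empty dates
print(intermediate.ffill())  # the 2023-01-05 lists are now copied into 2023-01-06/07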
Note that for the updated question (which adds the VALUE and BR columns), you will need to use .explode(['CUS_NO', 'VALUE', 'BR']), but otherwise the code works unchanged as long as your pandas version supports multi-column explode (added in pandas 1.3).
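Putting that together for the dataframe in the question, a sketch of the full version (assuming a pandas version with multi-column explode) would be:
out = (df
       .groupby('BAS_DT')
       .agg(lambda g: list(g) if len(g) > 1 else g)
       .ffill()
       .explode(['CUS_NO', 'VALUE', 'BR'])  # all list columns must be exploded together
       .reset_index()
)
# out should now match the desired output shown in the question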
Answered By - Nick