Issue
Is it possible to fill dataframe from another dataframe data based on specified condition DATE RANGE? Many Thanks
df1 = (condition DATE RANGE)
company start date end date
0 a 2023-01-02 2023-01-06
1 b 2023-01-05 2023-01-12
2 c 2023-01-04 2023-01-13
3 d 2023-01-03 2023-01-10
df2 = data
DATE a b c d
0 2023-01-02 1 10 30 40
1 2023-01-03 2 11 31 41
2 2023-01-04 3 12 32 42
3 2023-01-05 4 13 33 43
4 2023-01-06 5 14 34 44
5 2023-01-09 6 15 35 45
6 2023-01-10 7 16 36 46
7 2023-01-11 8 17 37 47
8 2023-01-12 9 18 38 48
9 2023-01-13 10 19 39 49
df3 = desired output
DATE a b c d
0 2023-01-02 1.0 NaN NaN NaN
1 2023-01-03 2.0 NaN NaN 41.0
2 2023-01-04 3.0 NaN 32.0 42.0
3 2023-01-05 4.0 13.0 33.0 43.0
4 2023-01-06 5.0 14.0 34.0 44.0
5 2023-01-09 NaN 15.0 35.0 45.0
6 2023-01-10 NaN 16.0 36.0 46.0
7 2023-01-11 NaN 17.0 37.0 NaN
8 2023-01-12 NaN 18.0 38.0 NaN
9 2023-01-13 NaN NaN 39.0 NaN
Solution
thanks to @pandaskim for providing a workable data. one option is to get the positions where df1 matches df2; we already know that the company
column is in the same order as the columns in df2 (a,b,c,d
), we take advantage of that to create the new dataframe, with some help from pyjanitor's conditional_join and numpy:
# pip install pyjanitor
import pandas as pd
import numpy as np
import janitor
df1['start date'] = pd.to_datetime(df1['start date'])
df1['end date'] = pd.to_datetime(df1['end date'])
df2['DATE'] = pd.to_datetime(df2['DATE'])
# get positions where df1 matches df2, based on the dates
# essentially what are the positions where start date <= DATE <= end date
outcome=(df2
.assign(pos=df2.index)
.conditional_join(
df1.assign(posn=df1.index),
('DATE','start date','>='),
('DATE','end date','<='),
df_columns='pos',
right_columns='posn')
)
left = outcome.pos
right=outcome.posn
shape=df2.drop(columns='DATE').shape
repl=df2.drop(columns='DATE').to_numpy()
# fancy indexing with numpy
arr=np.empty(shape,dtype=float)
arr[:]=np.nan
arr[left,right]=repl[left,right]
#build new dataframe
df2.assign(**pd.DataFrame(arr, columns=['a','b','c','d']))
DATE a b c d
0 2023-01-02 1.0 NaN NaN NaN
1 2023-01-03 2.0 NaN NaN 41.0
2 2023-01-04 3.0 NaN 32.0 42.0
3 2023-01-05 4.0 13.0 33.0 43.0
4 2023-01-06 5.0 14.0 34.0 44.0
5 2023-01-09 NaN 15.0 35.0 45.0
6 2023-01-10 NaN 16.0 36.0 46.0
7 2023-01-11 NaN 17.0 37.0 NaN
8 2023-01-12 NaN 18.0 38.0 NaN
9 2023-01-13 NaN NaN 39.0 NaN
Answered By - sammywemmy
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.