Issue
I have a dataset where, whenever a date value is input, specific date columns will shift.
Data
location type mig1 de mig2 re
ny aa 8/1/2021 10/1/2021 1/1/2022 2/1/2022
ny aa 8/1/2021 10/1/2021 1/1/2022 2/1/2022
ca aa 8/1/2021 10/1/2021 1/1/2022 2/1/2022
tx bb 9/1/2021 11/1/2021 2/1/2022 3/1/2022
Desired
#Date 8/1/2022 is input, which yields:
location type mig1 de mig2 re
ny aa 03/1/2022 05/1/2022 8/1/2022 7/1/2022
ny aa 03/1/2022 05/1/2022 8/1/2022 7/1/2022
ca aa 03/1/2022 05/1/2022 8/1/2022 7/1/2022
tx bb 03/1/2021 11/1/2021 08/1/2022 3/1/2022
Input prompt will ask user which date value they wish to input. User inputs the date '8/1/2021', which updates the remaining column date values. based on rules above.
Doing
datevalue = pd.to_datetime(input("Enter shift: "))
if(df[type] == 'aa'):
d = {
'mig1': pd.DateOffset(months=5),
'de': pd.DateOffset(months=3),
're': pd.DateOffset(months=1),
}
s = pd.Series(d).rsub(datevalue)
df.assign(**{**s, 'mig2': datevalue})
else:
e = {
'mig1': pd.DateOffset(months=5),
}
s1 = pd.Series(e).rsub(datevalue)
df.assign(**{**s1, 'mig2': datevalue})
Logic:
Date value in the mig2
column is entered into the input() prompt, mig1, de and re values shift according to these rules:
mig1 is 5 months from the date entered in the input() prompt
de is 3 months from the date entered in the input() prompt
re is 1 month from the date entered in the input() prompt
Where [type] column values == 'aa'
The only data that is changing are the dates, which are essentially getting shifted based upon the user input.
Any suggestion is appreciated.
Solution
IIUC:
input = datetime(2022, 8, 1)
conditions = {'mig1':5, 'de':3, 're':1}
def apply_this(x):
if x.name == 'mig2':
return [input]*len(x)
else:
return [input - pd.DateOffset(months=conditions[x.name])]*len(x)
date_cols =['mig1', 'de', 're', 'mig2']
df.loc[df['type'] == 'aa', date_cols] = df[date_cols].apply(lambda x: apply_this(x))
print(df)
OUTPUT
location type mig1 de mig2 re
0 ny aa 2022-03-01 2022-05-01 2022-08-01 2022-07-01
1 ny aa 2022-03-01 2022-05-01 2022-08-01 2022-07-01
2 ca aa 2022-03-01 2022-05-01 2022-08-01 2022-07-01
3 tx bb 2021-09-01 2021-11-01 2022-02-01 2022-03-01
SETUP
data = {'location': ['ny', 'ny', 'ca', 'tx'],
'type': ['aa', 'aa', 'aa', 'bb'],
'mig1': ['2021-08-01 00:00:00', '2021-08-01 00:00:00', '2021-08-01 00:00:00', '2021-09-01 00:00:00'],
'de': ['2021-10-01 00:00:00', '2021-10-01 00:00:00', '2021-10-01 00:00:00', '2021-11-01 00:00:00'],
'mig2': ['2022-01-01 00:00:00', '2022-01-01 00:00:00', '2022-01-01 00:00:00', '2022-02-01 00:00:00'],
're': ['2022-02-01 00:00:00', '2022-02-01 00:00:00', '2022-02-01 00:00:00', '2022-03-01 00:00:00']}
df = pd.DataFrame(data)
for col in ['mig1', 'de', 'mig2', 're']:
df[col] = pd.to_datetime(df[col])
EDIT according to discussion in comments:
conditions = {'de':3, 're':1}
def apply_this(x):
return [input - pd.DateOffset(months=conditions[x.name])]*len(x)
date_cols =['de', 're']
df.loc[df['type'] == 'aa', date_cols] = df[date_cols].apply(lambda x: apply_this(x))
df['mig1'] = input - pd.DateOffset(months=5)
df['mig2'] = input
Answered By - Muhammad Hassan
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.