Issue
I have three columns, "Date", "Name" and "Value" in my dataframe df.
"Date" and "Name" combinations are unique, i.e., no two rows have the same date-name combination.
I want to create a new column ("Yesterday Value") which is populated by the "Value" for a particular "Name" from the day before the row in question (please assume the dataframe isn't ordered in any way).
As a start, I have managed to create a column "Yesterday Date" using:
import pandas as pd
from datetime import timedelta
delta = timedelta(days=-1)
df["Date"] = pd.to_datetime(df["Date"])
df["Yesterday Date"] = df.apply(lambda x: x["Date"] + delta, axis=1)
Beyond this, I am pretty much stuck. I have tried using the apply method but haven't been able to figure out how to use it. I suspect the correct solution will require some kind of ordering followed by calling the shift method but I am getting confused because the "Date" column isn't full of just unique values (as it is the date and name combination that is unique). Perhaps some sort of group function is needed here but I am very much out of my depth there.
Just to clarify, for a given row, the solution should read the date and name, calculate yesterday's date and fill in the new column with the value corresponding to the name and yesterday's date.
Solution
Suppose you have a DataFrame that is sorted by date within each name:
import datetime
import pandas as pd
df = pd.DataFrame({'date':[datetime.date(2022, 10, 10), datetime.date(2022, 10, 11)]*2,
'name':['a','a','b','b'],
'value':[1.,2.,3.,4.]})
         date name  value
0  2022-10-10    a    1.0
1  2022-10-11    a    2.0
2  2022-10-10    b    3.0
3  2022-10-11    b    4.0
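You could group by name and shift. Note that shift(1) returns the previous row within each name, so it matches yesterday's value only when the rows are sorted by date (for example with df.sort_values(['name', 'date'])) and no dates are missing: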
df['last_value'] = df.groupby(['name'])['value'].shift(1)
         date name  value  last_value
0  2022-10-10    a    1.0         NaN
1  2022-10-11    a    2.0         1.0
2  2022-10-10    b    3.0         NaN
3  2022-10-11    b    4.0         3.0
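If the rows are not sorted, or some dates are missing for a name, one possible alternative is to look up yesterday's value with a self-merge instead of relying on row order. The following is only a sketch under assumptions: it uses the column names from the question ("Date", "Name", "Value"), the sample data is made up for illustration, and the helper name lookup is hypothetical.

```python
import pandas as pd

# Made-up, deliberately unsorted sample data; name "a" has no row for 2022-10-12.
df = pd.DataFrame({
    "Date": ["2022-10-11", "2022-10-10", "2022-10-13", "2022-10-10", "2022-10-11"],
    "Name": ["a", "b", "a", "a", "b"],
    "Value": [2.0, 3.0, 5.0, 1.0, 4.0],
})
df["Date"] = pd.to_datetime(df["Date"])

# Build a lookup table in which every value is re-dated to the following day,
# i.e. to the date on which it counts as "yesterday's" value.
lookup = df[["Date", "Name", "Value"]].copy()
lookup["Date"] = lookup["Date"] + pd.Timedelta(days=1)
lookup = lookup.rename(columns={"Value": "Yesterday Value"})

# A left merge keeps every original row; rows with no match for the
# previous day get NaN in "Yesterday Value".
result = df.merge(lookup, on=["Date", "Name"], how="left")
print(result)
```

Because each date-name combination is unique, the merge cannot duplicate rows, and a name with a gap in its dates gets NaN rather than the value from an earlier day.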
Answered By - Z Li