Issue
I am trying to group a time-series dataset by ID so that I can find the most frequent day of week, week of month, etc.
My dataset looks something like this:
ID Date
1 2020-01-02
1 2020-01-09
1 2020-01-08
My output dataset should look something like this:
ID Pref_Day_Of_Week_A Pref_Week_Of_Month_A
1 4 2
(Here Thursday is the mode day of week, and the 2nd week is the mode week of month for the given dates.)
Essentially, I am trying to find the mode (most frequent) day of week and the mode week of month for each ID. Any idea how to achieve this in Python?
The dataset contains multiple such IDs with similar timestamp data; this is just an example for one ID.
Solution
Use a custom lambda function that selects the first mode with Series.mode and Series.iat, in a named aggregation with GroupBy.agg:
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 2],
                   "Date": ["2020-01-02", "2020-01-09", "2020-01-08"] * 2})

# https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.

    Parameters:
        dates (pd.Series): Series of dates.
    Returns:
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit='d')
    return (dates.dt.day - 1 + firstday_in_month.dt.weekday) // 7 + 1

df['Date'] = pd.to_datetime(df['Date'])
# dayofweek is 0-based (Monday=0, Sunday=6), so Thursday is 3; add 1 for 1-based numbering
df['dayofweek'] = df['Date'].dt.dayofweek
df['week'] = weekinmonth(df['Date'])

# mode() can return several values on a tie; iat[0] keeps the first (smallest) one
f = lambda x: x.mode().iat[0]
df1 = (df.groupby('ID', as_index=False)
         .agg(Pref_Day_Of_Week_A=('dayofweek', f),
              Pref_Week_Of_Month_A=('week', f)))
print(df1)
   ID  Pref_Day_Of_Week_A  Pref_Week_Of_Month_A
0   1                   3                     2
1   2                   3                     2
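On the tie-breaking in x.mode().iat[0]: Series.mode returns every most-frequent value, sorted in ascending order, so taking the first element keeps the smallest one when several values tie. A minimal sketch with made-up data (the first_mode helper name and the sample values are only for illustration, not part of the original answer):

```python
import pandas as pd

def first_mode(values: pd.Series):
    """Return the smallest of the most frequent values (hypothetical helper)."""
    return values.mode().iat[0]

# 3 and 5 both occur twice, so mode() returns both of them, sorted ascending.
tied = pd.Series([5, 3, 5, 3, 1])
modes = tied.mode().tolist()
picked = first_mode(tied)
print(modes, picked)
```

If a different rule is preferred for ties (for example the latest day of week), the lambda could take .iat[-1] instead.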
Answered By - jezrael