Issue
Taking the Weekly Economic Index (Lewis-Mertens-Stock) data in the link as an example (click the download button on that page to download it), I want to calculate year-on-year changes for weekly data, but a year does not always have exactly 52 weeks.
The main problem is that most years have 52 weeks, while some, for example 2015 and 2020, have 53. I want to compare the 53rd week of 2015 and 2020 with the 52nd week of 2014 and 2019, respectively, to deal with the mismatch in week counts.
How can I write a weekly year-on-year function that compares the 53rd week of such a year with the 52nd week of the previous year? Suggestions for other ways to calculate weekly year-over-year changes are also welcome. Thanks.
My code:
import pandas as pd
df = pd.read_excel('../raw_data/WEI.xls', skiprows=10)
df['observation_date'] = pd.to_datetime(df['observation_date'])
df
def add_date_parts(df, date_column):
    # df['year_week_ori'] = df[date_column].dt.strftime('%Y-%U')
    df['year_week_ori'] = df[date_column].dt.strftime('%Y-%V')
    # ISO year and ISO week number (weeks 1-53)
    df['year'] = df[date_column].dt.isocalendar().year
    df['week'] = df[date_column].dt.isocalendar().week
    # df['year_week_ori'] = df['year'].astype(str) + '-' + df['week'].astype(str).str.zfill(2)
    # Cap week 53 at 52 so that it lines up with week 52 of the previous year
    df.loc[df['week'] > 52, 'week'] = 52
    df['year_week_modified'] = df['year'].astype(str) + '-' + df['week'].astype(str).str.zfill(2)
    # Note: this overwrites the ISO year with the calendar year
    df['year'] = df[date_column].dt.year
    df['month'] = df[date_column].dt.month
    return df
df = add_date_parts(df, 'observation_date')
df
Out:
observation_date WEI year_week_ori year week year_week_modified month
0 2008-01-05 1.80 2008-01 2008 1 2008-01 1
1 2008-01-12 1.78 2008-02 2008 2 2008-02 1
2 2008-01-19 1.75 2008-03 2008 3 2008-03 1
3 2008-01-26 1.28 2008-04 2008 4 2008-04 1
4 2008-02-02 0.99 2008-05 2008 5 2008-05 2
.. ... ... ... ... ... ... ...
817 2023-09-02 1.72 2023-35 2023 35 2023-35 9
818 2023-09-09 1.72 2023-36 2023 36 2023-36 9
819 2023-09-16 2.17 2023-37 2023 37 2023-37 9
820 2023-09-23 1.62 2023-38 2023 38 2023-38 9
821 2023-09-30 1.83 2023-39 2023 39 2023-39 9
[822 rows x 7 columns]
Counting the observations per year (the year column here is the calendar year) shows that 2011, 2016, and 2022 each have 53 weeks of data:
df.groupby('year')['WEI'].count()
Out:
year
2008 52
2009 52
2010 52
2011 53
2012 52
2013 52
2014 52
2015 52
2016 53
2017 52
2018 52
2019 52
2020 52
2021 52
2022 53
2023 39
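The 53-week years in the counts above (2011, 2016, 2022) differ from the ones named in the question (2015, 2020) because the counts group by calendar year, whereas the week numbers come from the ISO calendar. The following is a minimal sketch of that difference. It assumes the observations fall on every Saturday from 2008-01-05 to 2023-09-30, as the sample output suggests, and uses synthetic dates rather than the real file; the variable names are illustrative only.

```python
import pandas as pd

# Synthetic stand-in for the WEI dates (assumption: one row per Saturday).
dates = pd.Series(pd.date_range("2008-01-05", "2023-09-30", freq="W-SAT"))

# Count observations per calendar year and per ISO year.
iso = dates.dt.isocalendar()
calendar_counts = dates.groupby(dates.dt.year).size()
iso_counts = dates.groupby(iso["year"].astype(int)).size()

# Years that contain 53 observations under each convention.
calendar_53 = calendar_counts[calendar_counts == 53].index.tolist()
iso_53 = iso_counts[iso_counts == 53].index.tolist()

print("53 Saturdays by calendar year:", calendar_53)
print("53 weeks by ISO year:", iso_53)
```

Whichever convention is used, the year and week columns should come from the same one; mixing the calendar year with the ISO week makes the first Saturdays of some Januaries look like week 52 or 53 of the new year.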
My function (it does not yet handle consecutive years that have different numbers of weeks):
def calculate_week_on_week(df, value_column, date_column):
    df = df.sort_values(date_column)
    df['value_last_year'] = df[value_column].shift(52)
    df['week_on_week'] = (df[value_column] - df['value_last_year'])/df['value_last_year']
    return df
Solution
Example
To solve the problem, we need a minimal, reproducible example. Starting from your example, I removed the unnecessary columns and created a toy calendar in which a year normally has only 2 weeks.
import pandas as pd
data = {'date': ['2008-01-05', '2008-01-12', '2009-01-03', '2009-01-10', '2009-01-17'],
        'WEI': [10, 20, 18, 14, 16],
        'year': [2008, 2008, 2009, 2009, 2009], 'week': [1, 2, 1, 2, 3]}
df = pd.DataFrame(data)
df
date WEI year week
0 2008-01-05 10 2008 1
1 2008-01-12 20 2008 2
2 2009-01-03 18 2009 1
3 2009-01-10 14 2009 2
4 2009-01-17 16 2009 3
Code
Most years have 2 weeks (like 2008), but some have 3 (like 2009). The code below returns the year-over-year rate of change, but compares the 3rd week of 2009 with the 2nd week of 2008.
n = 3
s1 = df.groupby('week')['WEI'].pct_change()
cond = df['week'] == n
s1.mask(cond, df['WEI'].pct_change(n))
output:
0 NaN
1 NaN
2 0.8
3 -0.3
4 -0.2
For the first and second weeks, the data is grouped by week, so each value is compared with the same week of the previous year. For the third week, the rate of change is calculated against the value three rows above, which is the last (2nd) week of the previous year.
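The row-offset logic described here relies on the rows being sorted with no missing weeks. As a cross-check, the same pairing can be written out explicitly with a merge on (year, week) keys. This is only a sketch on the toy data, not part of the original answer; the helper columns prev_year, prev_week and WEI_prev are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2008-01-05", "2008-01-12", "2009-01-03", "2009-01-10", "2009-01-17"],
    "WEI": [10, 20, 18, 14, 16],
    "year": [2008, 2008, 2009, 2009, 2009],
    "week": [1, 2, 1, 2, 3],
})

n = 3  # the "extra" week number in this toy calendar

# For each row, name the (year, week) it should be compared with:
# the same week one year earlier, or week n - 1 when the row is week n.
keys = df.assign(
    prev_year=df["year"] - 1,
    prev_week=df["week"].where(df["week"] < n, n - 1),
)

# Lookup table of values, keyed by the same column names.
previous = df[["year", "week", "WEI"]].rename(
    columns={"year": "prev_year", "week": "prev_week", "WEI": "WEI_prev"}
)

# A left merge keeps every row; rows without a previous year get NaN.
merged = keys.merge(previous, on=["prev_year", "prev_week"], how="left")
merged["change"] = merged["WEI"] / merged["WEI_prev"] - 1
print(merged[["date", "WEI", "WEI_prev", "change"]])
```

The merge is more verbose than mask plus pct_change, but it does not depend on row positions, so a missing week produces a NaN instead of a silently misaligned comparison.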
Now assign the output to a change column:
df.assign(change=s1.mask(cond, df['WEI'].pct_change(n)))
output:
date WEI year week change
0 2008-01-05 10 2008 1 NaN
1 2008-01-12 20 2008 2 NaN
2 2009-01-03 18 2009 1 0.8 <- compare with 10
3 2009-01-10 14 2009 2 -0.3 <- compare with 20
4 2009-01-17 16 2009 3 -0.2 <- compare with 20
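When applying this to your dataset, set n to 53. For this to work, the week column must hold the uncapped ISO week number (do not set week 53 to 52 first), and the rows must be sorted by date with no missing weeks.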
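A sketch of how the same idea might be wrapped in a function for a full weekly series follows. It assumes one observation per week with no gaps, derives the week number from the ISO calendar, and is demonstrated on synthetic values rather than the real WEI file. The function name weekly_yoy and the columns iso_year, iso_week and yoy are hypothetical names chosen for this example.

```python
import pandas as pd

def weekly_yoy(df, value_column, date_column):
    """Year-on-year change per ISO week; week 53 is compared with
    week 52 of the previous year. Assumes no missing weeks."""
    out = df.sort_values(date_column).reset_index(drop=True)
    iso = out[date_column].dt.isocalendar()
    out["iso_year"] = iso["year"].astype(int)
    out["iso_week"] = iso["week"].astype(int)

    # Same ISO week, previous year (previous row within each week group).
    same_week = out.groupby("iso_week")[value_column].pct_change()

    # Week 53: the row 53 positions earlier is week 52 of the previous year.
    is_week_53 = out["iso_week"] == 53
    out["yoy"] = same_week.mask(is_week_53, out[value_column].pct_change(53))
    return out

# Synthetic demo data (assumption: one row per Saturday, made-up values).
dates = pd.date_range("2008-01-05", "2023-09-30", freq="W-SAT")
demo = pd.DataFrame({
    "observation_date": dates,
    "WEI": [100.0 + i for i in range(len(dates))],
})

result = weekly_yoy(demo, "WEI", "observation_date")
print(result[result["iso_week"] == 53])
```

In this demo the week-53 rows are the Saturdays 2010-01-02, 2016-01-02 and 2021-01-02, which belong to ISO years 2009, 2015 and 2020. The rows of the first year have no earlier year to compare with, so their yoy values are NaN.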
Answered By - Panda Kim