Saturday, December 30, 2023

[FIXED] How to calculate year-on-year data for weekly data?

December 30, 2023 · dataframe, numpy, pandas, python-3.x

Issue

Taking the Weekly Economic Index (Lewis-Mertens-Stock) data as an example (it can be downloaded with the download button on the linked page), I want to calculate year-on-year changes for weekly data, but a year does not always contain exactly 52 weeks.

The main problem is that most years have 52 weeks, while some, for example 2015 and 2020, have 53. I would like to compare the 53rd week of 2015 and 2020 with the 52nd week of 2014 and 2019, respectively, to deal with the mismatch in the number of weeks between these years.

How can I write a function that computes weekly year-on-year changes and, for a year with a 53rd week, compares that week with week 52 of the previous year? Suggestions for other ways to compute weekly year-over-year changes are also welcome. Thanks.

My code:

import pandas as pd

df = pd.read_excel('../raw_data/WEI.xls', skiprows=10)
df['observation_date'] = pd.to_datetime(df['observation_date'])
df

def add_date_parts(df, date_column):
    # df['year_week_ori'] = df[date_column].dt.strftime('%Y-%U')
    # Note: %Y is the calendar year; '%G-%V' would pair the ISO year with the ISO week
    df['year_week_ori'] = df[date_column].dt.strftime('%Y-%V')
    # ISO year and ISO week number
    df['year'] = df[date_column].dt.isocalendar().year
    df['week'] = df[date_column].dt.isocalendar().week
    # df['year_week_ori'] = df['year'].astype(str) + '-' + df['week'].astype(str).str.zfill(2)
    # Cap ISO week 53 at 52
    df.loc[df['week'] > 52, 'week'] = 52
    df['year_week_modified'] = df['year'].astype(str) + '-' + df['week'].astype(str).str.zfill(2)
    # This overwrites the ISO year with the calendar year
    df['year'] = df[date_column].dt.year
    df['month'] = df[date_column].dt.month
    return df

df = add_date_parts(df, 'observation_date')
df

Out:

    observation_date   WEI year_week_ori  year  week year_week_modified  month
0         2008-01-05  1.80       2008-01  2008     1            2008-01      1
1         2008-01-12  1.78       2008-02  2008     2            2008-02      1
2         2008-01-19  1.75       2008-03  2008     3            2008-03      1
3         2008-01-26  1.28       2008-04  2008     4            2008-04      1
4         2008-02-02  0.99       2008-05  2008     5            2008-05      2
..               ...   ...           ...   ...   ...                ...    ...
817       2023-09-02  1.72       2023-35  2023    35            2023-35      9
818       2023-09-09  1.72       2023-36  2023    36            2023-36      9
819       2023-09-16  2.17       2023-37  2023    37            2023-37      9
820       2023-09-23  1.62       2023-38  2023    38            2023-38      9
821       2023-09-30  1.83       2023-39  2023    39            2023-39      9
[822 rows x 7 columns]
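Capping week 53 at 52 has a side effect worth checking: the 53rd and the 52nd week of the same ISO year end up with the same year_week_modified key. A small sketch on three Saturdays around the end of ISO year 2015 (dates chosen for illustration), applying the same capping as add_date_parts:

```python
import pandas as pd

# Three Saturdays around the end of ISO year 2015, which has 53 weeks
dates = pd.Series(pd.to_datetime(["2015-12-26", "2016-01-02", "2016-01-09"]))
iso = dates.dt.isocalendar()

# Same capping as in add_date_parts: ISO week 53 becomes 52
week = iso["week"].astype(int).clip(upper=52)
keys = iso["year"].astype(int).astype(str) + "-" + week.astype(str).str.zfill(2)

print(keys.tolist())  # ['2015-52', '2015-52', '2016-01']
```

So the 53rd week is not dropped but merged into the key of week 52, which makes year_week_modified unusable as a unique key for matching weeks across years.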

Counting the observations per year shows that 2011, 2016, and 2022 contain 53 weeks of data (here year is the calendar year, because add_date_parts overwrites the ISO year at the end):

df.groupby('year')['WEI'].count()
Out[61]: 
year
2008    52
2009    52
2010    52
2011    53
2012    52
2013    52
2014    52
2015    52
2016    53
2017    52
2018    52
2019    52
2020    52
2021    52
2022    53
2023    39
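The difference between calendar years and ISO years can be checked without the file. This sketch regenerates the observation dates, assuming one observation every Saturday from 2008-01-05 to 2023-09-30 with no gaps (which gives the same 822 rows as above), and lists the years with 53 rows under each definition:

```python
import pandas as pd

# Assumption: one observation every Saturday, no gaps (822 dates)
dates = pd.Series(pd.date_range("2008-01-05", "2023-09-30", freq="W-SAT"))
iso = dates.dt.isocalendar()

# Rows per calendar year vs. rows per ISO year
calendar_counts = dates.groupby(dates.dt.year).size()
iso_counts = dates.groupby(iso["year"].astype(int)).size()

calendar_53 = calendar_counts[calendar_counts == 53].index.tolist()
iso_53 = iso_counts[iso_counts == 53].index.tolist()
print(calendar_53)  # [2011, 2016, 2022]
print(iso_53)       # [2009, 2015, 2020]
```

Calendar years get 53 rows when they contain 53 Saturdays, while ISO years 2009, 2015 and 2020 have a week 53 (for 2009, that is the Saturday 2010-01-02).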

My function (it does not yet handle years with different numbers of weeks, so weeks in consecutive years are not aligned):

def calculate_week_on_week(df, value_column, date_column):
    df = df.sort_values(date_column)
    # shift(52) assumes every year has exactly 52 rows;
    # around a 53-week year the pairing is off by one week
    df['value_last_year'] = df[value_column].shift(52)
    df['week_on_week'] = (df[value_column] - df['value_last_year'])/df['value_last_year']
    return df
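To see where the positional shift(52) goes wrong, here is a small sketch on generated Saturday dates covering ISO years 2015 (53 weeks) and 2016, assuming one row per week with no gaps. It records which ISO week each row would be compared with:

```python
import pandas as pd

# Assumption: one row per Saturday, starting at ISO week 1 of 2015
dates = pd.Series(pd.date_range("2015-01-03", "2016-12-31", freq="W-SAT"))
week = dates.dt.isocalendar()["week"].astype(int)

pairs = pd.DataFrame({
    "date": dates,
    "week": week,
    # week number of the row that shift(52) treats as "last year"
    "compared_with_week": week.shift(52),
})
print(pairs.iloc[52:55])
```

Row 52 (2016-01-02) is week 53 of ISO year 2015 but is compared with week 1 of the same year, and every week of 2016 is then compared with the following week of 2015 (week 1 with week 2, ..., week 52 with week 53).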

Solution

Example

To solve the problem, we need a minimal, reproducible example. Based on your data, I removed the unnecessary columns and created a toy world in which a year normally has only 2 weeks.

import pandas as pd
data = {'date': ['2008-01-05', '2008-01-12', '2009-01-03', '2009-01-10', '2009-01-17'], 
        'WEI': [10, 20, 18, 14, 16], 
        'year': [2008, 2008, 2009, 2009, 2009], 'week': [1, 2, 1, 2, 3]}
df = pd.DataFrame(data)

    date        WEI year    week
0   2008-01-05  10  2008    1
1   2008-01-12  20  2008    2
2   2009-01-03  18  2009    1
3   2009-01-10  14  2009    2
4   2009-01-17  16  2009    3

Code

In this toy world, most years end after 2 weeks (like 2008), but some years end after 3 weeks (like 2009), just as three years in your data have 53 weeks. We want the year-over-year change rate, with the 3rd week of 2009 compared with the 2nd week of 2008.

n = 3
s1 = df.groupby('week')['WEI'].pct_change()
cond = df['week'] == n
s1.mask(cond, df['WEI'].pct_change(n))

output:

0    NaN
1    NaN
2    0.8
3   -0.3
4   -0.2

For weeks 1 and 2, the data is grouped by week number, so each value is compared with the same week of the previous year. Week 3 has no counterpart in 2008, so mask replaces its result with pct_change(n), which compares the value with the one three rows above (week 2 of 2008).
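The same logic can also be written with explicit shift calls, which keeps the comparison value visible as its own column. This is a sketch on the toy data above; the column name base is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2008-01-05", "2008-01-12", "2009-01-03", "2009-01-10", "2009-01-17"],
    "WEI": [10, 20, 18, 14, 16],
    "year": [2008, 2008, 2009, 2009, 2009],
    "week": [1, 2, 1, 2, 3],
})
n = 3  # the "extra" week number in this toy world (53 in the real data)

# Value of the same week in the previous year (previous row in the same week group)
same_week_last_year = df.groupby("week")["WEI"].shift(1)
# For the extra week, fall back to the value n rows earlier
base = same_week_last_year.mask(df["week"] == n, df["WEI"].shift(n))

df = df.assign(base=base, change=df["WEI"] / base - 1)
print(df)
```

The base column shows 10, 20 and 20 for the three 2009 rows, matching the comparisons annotated in the output below.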

Now store the result in a new column named change:

df.assign(change=s1.mask(cond, df['WEI'].pct_change(n)))

output:

    date        WEI year    week    change
0   2008-01-05  10  2008    1       NaN
1   2008-01-12  20  2008    2       NaN
2   2009-01-03  18  2009    1       0.8    <- compare with 10
3   2009-01-10  14  2009    2       -0.3   <- compare with 20
4   2009-01-17  16  2009    3       -0.2   <- compare with 20

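To apply this to your dataset, set n = 53. This relies on two things: the week column must still contain 53 for the extra ISO week (add_date_parts above caps weeks at 52, so that line would need to be removed), and the rows must be sorted by date with no missing weeks, because pct_change(n) looks back by row position.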
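A different approach, not part of the answer above, is to match rows by (ISO year, ISO week) keys instead of row positions, so the result does not depend on every week being present. The sketch below assumes one row per ISO week; the function name yoy_by_iso_week is hypothetical. As requested in the question, week 53 is compared with week 52 of the previous ISO year:

```python
import pandas as pd

def yoy_by_iso_week(df, value_column, date_column):
    """Hypothetical helper: year-on-year change matched on ISO (year, week).

    Week 53 has no counterpart in the previous ISO year, so it is compared
    with week 52 of that year. Assumes one row per ISO week.
    """
    out = df.sort_values(date_column).copy()
    iso = out[date_column].dt.isocalendar()
    out["iso_year"] = iso["year"].astype(int)
    out["iso_week"] = iso["week"].astype(int)

    # Value for every (ISO year, ISO week) key
    lookup = out.set_index(["iso_year", "iso_week"])[value_column]

    # Key to compare against: same week one ISO year earlier, week 53 -> 52
    prev_keys = pd.MultiIndex.from_arrays(
        [out["iso_year"] - 1, out["iso_week"].clip(upper=52)]
    )
    # Keys with no match (e.g. the first year of data) become NaN
    out["value_last_year"] = lookup.reindex(prev_keys).to_numpy()
    out["yoy"] = out[value_column] / out["value_last_year"] - 1
    return out
```

With the question's data this would be called as yoy_by_iso_week(df, 'WEI', 'observation_date'). The 53rd week of ISO year 2015 (2016-01-02), for example, is then compared with week 52 of ISO year 2014.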

Answered By - Panda Kim

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
