Thursday, June 9, 2022

[FIXED] How to calculate the mean value of a month but store it hourly in pandas?


Issue

I have weekly data for several years, with the start date and end date in datetime format. For each year of data I want to create a new column in which the mean value of each month is calculated and stored for every hour of that year. All years should have the same format, so leap years should be ignored. To summarize, I have the following data:

input_data:

datetime             | A | B | C | D | ... | Z |
---------------------|---|---|---|---| --- |---|
2015-01-01 00:00:00  |123| 23| 67|189| ... | 78|
...................  |...|...|...|...| ... |...|
2021-06-01 00:00:00  |345| 87|456| 89| ... | 23|

where 2015-01-01 00:00:00 is the start date and 2021-06-01 08:00:00 is the end date. I would like to get something like:

output:

datetime        | 2015      | 2016      | 2017      | 2018      | ... | 2021      |
----------------|-----------|-----------|-----------|-----------|-----|-----------|
01-01 00:00:00  | mean(A:Z) | mean(A:Z) | mean(A:Z) | mean(A:Z) | ... | mean(A:Z) |
............... | ......... | ......... | ......... | ......... | ... | ......... |
12-31 23:00:00  | mean(A:Z) | mean(A:Z) | mean(A:Z) | mean(A:Z) | ... | mean(A:Z) |

where mean(A:Z) is the mean value for each month of the columns A to Z. I would like to avoid iterating over each hour of each year. How can I best achieve this? Sorry if the question is too simple, but I am currently stuck.

Solution

IIUC, you can use:

# Update
out = (df.assign(datetime=df['datetime'].dt.strftime('%m-%d %H:%M:%S'),
                 year=df['datetime'].dt.year.values)
         .set_index(['datetime', 'year']).mean(axis=1)
         .unstack('year'))
print(out)

# Alternative
# out = (df.set_index('datetime').mean(axis=1).to_frame('mean')
#          .assign(datetime=df['datetime'].dt.strftime('%m-%d %H:%M:%S').values, 
#                  year=df['datetime'].dt.year.values)
#          .pivot(index='datetime', columns='year', values='mean'))  # keyword arguments are required in pandas 2.0+

# Output
year                  2015        2016        2017
datetime                                          
01-01 00:00:00  259.000000  420.000000  263.333333
01-01 01:00:00  263.000000  205.333333  169.000000
01-01 02:00:00  342.000000  268.000000  302.000000
01-01 03:00:00   63.000000  243.000000  220.000000
01-01 04:00:00  299.333333  282.666667  421.666667
...                    ...         ...         ...
12-31 19:00:00   82.666667  215.000000   84.333333
12-31 20:00:00  316.000000  367.000000  237.666667
12-31 21:00:00  319.666667  170.666667  275.666667
12-31 22:00:00  119.666667  263.666667  325.333333
12-31 23:00:00  252.666667  300.000000   94.666667

[8784 rows x 3 columns]
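
The 8784 rows above include the 24 hours of 29 February, which exist only in the leap year 2016, so the other year columns hold NaN on those rows. Since the question asks for every year to share the same format, one option is to drop 29 February before reshaping. This is a sketch under that assumption and not part of the original answer. It rebuilds the sample data from the Setup section so that it runs on its own, and the names `is_leap_day` and `no_leap` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the Setup section ('h' is the hourly frequency alias).
np.random.seed(2022)
dti = pd.date_range('2015-01-01', '2017-12-31 23:00:00', freq='h', name='datetime')
df = pd.DataFrame(np.random.randint(1, 500, (len(dti), 3)),
                  index=dti, columns=list('ABC')).reset_index()

# Flag the rows that fall on 29 February and keep everything else.
is_leap_day = (df['datetime'].dt.month == 2) & (df['datetime'].dt.day == 29)
no_leap = df[~is_leap_day]

# Same reshaping as in the answer, applied to the filtered frame.
out = (no_leap.assign(datetime=no_leap['datetime'].dt.strftime('%m-%d %H:%M:%S'),
                      year=no_leap['datetime'].dt.year.values)
              .set_index(['datetime', 'year']).mean(axis=1)
              .unstack('year'))
print(out.shape)  # (8760, 3)
```

With the leap day removed, every year contributes the same 365 * 24 = 8760 hourly labels, so the reshaped frame has no missing cells.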

Setup:

import pandas as pd
import numpy as np

np.random.seed(2022)
# 'h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2
dti = pd.date_range('2015-01-01', '2017-12-31 23:00:00', freq='h', name='datetime')
df = pd.DataFrame(np.random.randint(1, 500, (len(dti), 3)),
                  index=dti, columns=list('ABC')).reset_index()
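
The answer takes the mean across the columns for each hourly row. If the question is instead read literally, as one mean per month repeated on every hourly row of that month, `groupby(...).transform('mean')` can broadcast the monthly value back onto the hourly rows. The following is only a sketch under that reading and is not part of the original answer. The names `row_mean`, `monthly`, `key` and `wide` are made up for illustration, and the Setup data is rebuilt so the block runs alone.

```python
import numpy as np
import pandas as pd

# Same sample data as in the Setup section.
np.random.seed(2022)
dti = pd.date_range('2015-01-01', '2017-12-31 23:00:00', freq='h', name='datetime')
df = pd.DataFrame(np.random.randint(1, 500, (len(dti), 3)),
                  index=dti, columns=list('ABC')).reset_index()

ts = df['datetime']

# Mean across the value columns for each hourly row.
row_mean = df.set_index('datetime').mean(axis=1)

# Mean of those hourly values per (year, month), repeated on every row of
# that month. With no missing values this equals the mean over all cells
# of the month.
monthly = row_mean.groupby([ts.dt.year.values, ts.dt.month.values]).transform('mean')

# Reshape to one row per "month-day hour" label and one column per year.
wide = (monthly.to_frame('mean')
               .assign(key=ts.dt.strftime('%m-%d %H:%M:%S').values,
                       year=ts.dt.year.values)
               .pivot(index='key', columns='year', values='mean'))
print(wide.head())
```

Within one year column, all rows of the same month now hold the same value. The 29 February rows are still present here and are NaN for the non-leap years.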

Answered By - Corralien

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
