Issue
I have hourly data for several years, with the start date and end date in datetime format. I now want to make a new column for each year in the data, where the mean value across the columns is calculated and stored for each hour of that year. All years should have the same format, so the leap day should be ignored. To summarize, I have the following data:
input_data:
datetime | A | B | C | D | ... | Z |
---------------------|---|---|---|---| --- |---|
2015-01-01 00:00:00 |123| 23| 67|189| ... | 78|
................... |...|...|...|...| ... |...|
2021-06-01 00:00:00 |345| 87|456| 89| ... | 23|
where 2015-01-01 00:00:00 is the start date and 2021-06-01 08:00:00 is the end date. I would like to get something like:
output:
datetime        | 2015      | 2016      | 2017      | 2018      | ... | 2021      |
----------------|-----------|-----------|-----------|-----------|-----|-----------|
01-01 00:00:00  | mean(A:Z) | mean(A:Z) | mean(A:Z) | mean(A:Z) | ... | mean(A:Z) |
............... | ......... | ......... | ......... | ......... | ... | ......... |
12-31 23:00:00  | mean(A:Z) | mean(A:Z) | mean(A:Z) | mean(A:Z) | ... | mean(A:Z) |
where mean(A:Z) is the mean value of the columns A to Z for that hour. I would like to avoid iterating over each hour for each year. How can I best achieve this? Sorry if the question is too simple, but I am currently stuck.
Solution
IIUC, you can use:
# Update
out = (df.assign(datetime=df['datetime'].dt.strftime('%m-%d %H:%M:%S'),
                 year=df['datetime'].dt.year.values)
         .set_index(['datetime', 'year']).mean(axis=1)
         .unstack('year'))
print(out)
# Alternative
# out = (df.set_index('datetime').mean(axis=1).to_frame('mean')
#          .assign(datetime=df['datetime'].dt.strftime('%m-%d %H:%M:%S').values,
#                  year=df['datetime'].dt.year.values)
#          .pivot(index='datetime', columns='year', values='mean'))
# Output
year                  2015        2016        2017
datetime
01-01 00:00:00  259.000000  420.000000  263.333333
01-01 01:00:00  263.000000  205.333333  169.000000
01-01 02:00:00  342.000000  268.000000  302.000000
01-01 03:00:00   63.000000  243.000000  220.000000
01-01 04:00:00  299.333333  282.666667  421.666667
...                    ...         ...         ...
12-31 19:00:00   82.666667  215.000000   84.333333
12-31 20:00:00  316.000000  367.000000  237.666667
12-31 21:00:00  319.666667  170.666667  275.666667
12-31 22:00:00  119.666667  263.666667  325.333333
12-31 23:00:00  252.666667  300.000000   94.666667

[8784 rows x 3 columns]
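The output above has 8784 rows because 2016 contains 29 February, while the question asks for every year to share the same format. One possible way to handle this, sketched here under the assumption that simply discarding the leap day is acceptable, is to filter those rows out before reshaping. The names `is_leap_day`, `no_leap` and `out_no_leap` are illustrative only, and the random data is regenerated for this sketch, so the values will not match the output shown above.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for 2015-2017, similar in shape to the answer's setup.
# A Timedelta is used for the frequency to avoid version-specific alias strings.
dti = pd.date_range('2015-01-01', '2017-12-31 23:00:00',
                    freq=pd.Timedelta(hours=1), name='datetime')
rng = np.random.default_rng(2022)
df = pd.DataFrame(rng.integers(1, 500, (len(dti), 3)),
                  index=dti, columns=list('ABC')).reset_index()

# Assumption: "ignoring the leap year" means dropping every 29 February row.
is_leap_day = (df['datetime'].dt.month == 2) & (df['datetime'].dt.day == 29)
no_leap = df.loc[~is_leap_day]

# Same reshape as in the answer, applied to the filtered frame.
out_no_leap = (no_leap.assign(datetime=no_leap['datetime'].dt.strftime('%m-%d %H:%M:%S'),
                              year=no_leap['datetime'].dt.year)
                      .set_index(['datetime', 'year'])
                      .mean(axis=1)
                      .unstack('year'))

print(out_no_leap.shape)  # 365 days * 24 hours = 8760 rows, one column per year
```

Without this filter, the 29 February rows would hold NaN for the non-leap years, since those timestamps exist only in 2016.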
Setup:
import pandas as pd
import numpy as np
np.random.seed(2022)
# Note: pandas >= 2.2 prefers the lowercase alias freq='h'
dti = pd.date_range('2015-01-01', '2017-12-31 23:00:00', freq='H', name='datetime')
df = pd.DataFrame(np.random.randint(1, 500, (len(dti), 3)),
                  index=dti, columns=list('ABC')).reset_index()
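To see what the reshape does on values that can be checked by hand, here is a minimal sketch on a four-row frame. It is not part of the original answer. The frame `small`, the intermediate `tidy`, and the column name `key` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Two hours in each of two years, with two value columns.
small = pd.DataFrame({
    'datetime': pd.to_datetime(['2015-01-01 00:00', '2015-01-01 01:00',
                                '2016-01-01 00:00', '2016-01-01 01:00']),
    'A': [1, 2, 3, 4],
    'B': [3, 4, 5, 6],
})

# Step 1: row-wise mean across the value columns -> 2.0, 3.0, 4.0, 5.0
row_mean = small[['A', 'B']].mean(axis=1)

# Step 2: split each timestamp into a year-independent key and a year label.
tidy = pd.DataFrame({
    'key': small['datetime'].dt.strftime('%m-%d %H:%M:%S'),
    'year': small['datetime'].dt.year,
    'mean': row_mean,
})

# Step 3: spread the years into columns, one row per month-day-hour key.
wide = tidy.pivot(index='key', columns='year', values='mean')
print(wide)
```

Each row of `wide` is one hour of the year, and each column holds that hour's mean for a single year, which is the layout requested in the question.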
Answered By - Corralien