Issue
I have a data set (CSV file) with monthly benzine prices in Norway from 1986 to 2022. I want to make a new data frame with an average benzine price per year. Furthermore, I managed to do it, but is there a more efficient way to do it? A link to the CSV file is given. Here is what I did,
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
bensin=pd.read_csv("C:/Users/.../BensinPris.csv",sep=',', delimiter=';')
bensinC=bensin.copy()
#Replace comma with dot in bensin price
bensinC.apply(lambda x: x.str.replace(',','.'))
bensinC= bensinC.stack().str.replace(',','.').unstack()
#Convert prices to floats from strings
bensinC['Priser (kr per liter)']=[float(x) for x in bensinC['Priser (kr per liter)']]
# Make a list of average bensin price each year
average=[]
yrs=np.arange(1986,2023,dtype='int')
for i in yrs:
av=bensinC[[x[0:4]==str(i) for x in bensinC['ÅrMåned']]]['Priser (kr per liter)'].mean()
average.append(np.round(av,2))
# Make a data frame for average bensin price
zipped = list(zip(yrs,average))
benisin_average=pd.DataFrame(zipped,columns=['Year','Average bensin price'])
benisin_average
# ploting
X=benisin_average['Year']
y=benisin_average['Average bensin price']
plt.plot(X[:],y[:])
plt.xlabel('Time(Year)')
plt.ylabel('Average bensin price (Norwegian krone/Liter)')
plt.show()
Solution
I renamed your columns to period
and price
respectively. It's a little easier to understand for those who don't speak Norwegian.
# Modify the period column to an actual datetime
df['period'] = pd.to_datetime(df.period.str.replace('M','',regex=False) + '01',
yearfirst=True)
# Convert the price to a float
df['price'] = df.price.str.replace(',','.').astype('float')
# Resample the dataframe to end of year, apply the mean as aggregate
df_agg = df.resample('Y', on='period').mean().round(2).reset_index()
Resulting df_agg
(sample):
period | price |
---|---|
1986-12-31 | 4.43 |
1987-12-31 | 4.65 |
1988-12-31 | 4.79 |
1989-12-31 | 5.13 |
1990-12-31 | 5.56 |
1991-12-31 | 6.4 |
1992-12-31 | 6.76 |
1993-12-31 | 7.12 |
1994-12-31 | 7.19 |
Note: resample
allows you to use frequency strings, which makes it possible to not only group by year, as we did in this example, but also e.g. by quater.
Answered By - DocZerø
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.