Issue
I am looking to plot several values of a feature against time, and I am using the *args argument to do so. When I pass two or more arguments to my function I get several separate plots, and I understand why this happens. However, I cannot figure out how to put all of them in the same plot. Here is my code to illustrate what I am trying to do.
Sample data:
year  city    population
2013  Ankara  xxxx
2013  London  xxxx
2013  Paris   xxx
....  .....   xxx
2014  Ankara  xxxx
2014  London  xxx
2014  Paris   xxxx
...   ....    ....
2015  Ankara  xxxx
....  ....    ....
When I do df[df['city']=='Ankara'] I get a df with the population of Ankara and unique years. What I am trying to do now is select two or three cities from this and plot them on the same plot.
def city_over_time(*args):
    global df
    for city in args:
        df=df[df['city']==city]
        plt.plot(df.year, df.population)
    plt.tight_layout()
So when I do the following:
city_over_time('Manchester', 'Liverpool')
I get one plot for Manchester and another one for Liverpool below it. But I want both in the same figure, just as if I were plotting the following:
plt.plot(df[df.city=='Manchester']['year'], df[df.city=='Manchester']['population'])
plt.plot(df[df.city=='Liverpool']['year'], df[df.city=='Liverpool']['population'])
Solution
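Use pivot to get the data into a more usable format, select the cities as columns, then plot. Note that the sample data below names the city column 'cities'; for the frame in the question, pass columns='city' instead: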
def city_over_time(frame, *cities):
    plot_df = frame.pivot(index='year',
                          columns='cities',
                          values='population')[list(cities)]
    plot_df.plot(xticks=plot_df.index,
                 ylabel='Population')
    plt.tight_layout()
    plt.show()
Complete Working Example With Sample Data:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

np.random.seed(5)
df = pd.DataFrame({
    'year': np.repeat(np.arange(2013, 2016), 3),
    'cities': ['Ankara', 'London', 'Paris'] * 3,
    'population': np.random.randint(100_000, 200_000, size=9)
})
print(df)


def city_over_time(frame, *cities):
    plot_df = frame.pivot(index='year',
                          columns='cities',
                          values='population')[list(cities)]
    plot_df.plot(xticks=plot_df.index,
                 ylabel='Population')
    plt.tight_layout()
    plt.show()


city_over_time(df, 'London', 'Paris')
df:
   year  cities  population
0  2013  Ankara      135683
1  2013  London      118638
2  2013   Paris      120463
3  2014  Ankara      105520
4  2014  London      159465
5  2014   Paris      133800
6  2015  Ankara      133508
7  2015  London      181639
8  2015   Paris      134750
plot_df (df after pivot and filter on cities):
cities  London   Paris
year
2013    118638  120463
2014    159465  133800
2015    181639  134750
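If you would rather keep the loop from the question, the following is a minimal sketch of an alternative to the pivot approach, not part of the answer above. It assumes the same 'year', 'cities' and 'population' columns as the sample data; the helper name city_over_time_loop and its city_col parameter are hypothetical. Each city is filtered into a local variable, so the original frame is never overwritten, and every line is drawn on one shared Axes:

```python
import pandas as pd
from matplotlib import pyplot as plt


def city_over_time_loop(frame, *cities, city_col="cities"):
    """Plot one line per city on a single shared Axes (hypothetical helper)."""
    fig, ax = plt.subplots()
    for city in cities:
        # Filter into a local variable; the caller's frame is left untouched.
        subset = frame[frame[city_col] == city]
        ax.plot(subset["year"], subset["population"], label=city)
    # One tick per year present in the data, as in the pivot version.
    ax.set_xticks(sorted(frame["year"].unique()))
    ax.set_xlabel("year")
    ax.set_ylabel("Population")
    ax.legend()
    fig.tight_layout()
    return ax


# Same values as the sample df printed above, written out literally.
df = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014, 2014, 2015, 2015, 2015],
    "cities": ["Ankara", "London", "Paris"] * 3,
    "population": [135683, 118638, 120463, 105520, 159465,
                   133800, 133508, 181639, 134750],
})

ax = city_over_time_loop(df, "London", "Paris")
```

Call plt.show() afterwards to display the figure. For the frame in the question, pass city_col="city".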
Answered By - Henry Ecker