Issue
I used this code to compute the average life expectancy grouped by year and continent:
avg_lifeExp_by_cont_yr = df.groupby(['year','continent'])['lifeExp'].mean()
The result is a Series indexed by (year, continent).
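To make the shape of that result concrete, here is a minimal sketch on a tiny hypothetical frame. The column names follow the question, but the values are made up and only stand in for the real gapminder data:

```python
import pandas as pd

# Hypothetical toy data standing in for the gapminder file (values are made up)
df = pd.DataFrame({
    "year":      [1952, 1952, 1952, 1957, 1957, 1957],
    "continent": ["Africa", "Africa", "Asia", "Africa", "Asia", "Asia"],
    "lifeExp":   [40.0, 38.0, 46.0, 41.0, 49.0, 50.0],
})

avg_lifeExp_by_cont_yr = df.groupby(["year", "continent"])["lifeExp"].mean()

# The result is a Series with a two-level MultiIndex: (year, continent)
print(type(avg_lifeExp_by_cont_yr).__name__)       # Series
print(list(avg_lifeExp_by_cont_yr.index.names))    # ['year', 'continent']
print(avg_lifeExp_by_cont_yr)
```

Because both grouping keys end up in the row index, the data is in "long" form, which is why it has to be reshaped before each continent can become its own line.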
I want to create a line chart with the year on the x-axis, the average life expectancy on the y-axis, and the continent used as the legend (so one line per continent).
Solution
You can call .unstack('continent') on the grouped result to move continent into the columns. The result is a 2D table whose index (year) supplies the x values and whose columns (one per continent) supply the y values. You can then call the DataFrame plot method directly, or draw the lines yourself with raw matplotlib calls for finer control.
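A minimal sketch of that reshaping on a tiny hypothetical Series (made-up values, same index names as in the question) shows where the x and y values end up:

```python
import pandas as pd

# Hypothetical grouped result, shaped like the groupby output in the question
idx = pd.MultiIndex.from_tuples(
    [(1952, "Africa"), (1952, "Asia"), (1957, "Africa"), (1957, "Asia")],
    names=["year", "continent"],
)
avg = pd.Series([39.0, 46.0, 41.0, 49.5], index=idx, name="lifeExp")

# Move the 'continent' level from the row index into the columns
wide = avg.unstack("continent")

# Rows are indexed by year (the x values); each column is one continent (a y series)
print(wide)
print(list(wide.index))    # [1952, 1957]
print(list(wide.columns))  # ['Africa', 'Asia']
```

The level removed from the index keeps its name as the columns' name ("continent"), which is the name DataFrame.plot later shows as the legend title.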
Thanks for sharing the data. Here is a complete code sample:
# imports
import pandas as pd
import matplotlib.pyplot as plt
# prepare dataframe
df = pd.read_csv('gapminder.tsv', sep='\t')
df = df.groupby(['year','continent']).lifeExp.mean()
# unstack the `continent` index, to place it as columns
df = df.unstack(level='continent')
# The columns' name becomes the legend title
# when plotting with DataFrame.plot
df.columns.name = 'Life Expectancy'
# Now we have a 2D table: the index (year) is X
# and each column (one per continent) is a Y series
# In [14]: df.head()
# Out[14]:
# Life Expectancy     Africa  Americas       Asia     Europe  Oceania
# year
# 1952             39.135500  53.27984  46.314394  64.408500   69.255
# 1957             41.266346  55.96028  49.318544  66.703067   70.295
# 1962             43.319442  58.39876  51.563223  68.539233   71.085
# 1967             45.334538  60.41092  54.663640  69.737600   71.310
# 1972             47.450942  62.39492  57.319269  70.775033   71.910
# matplotlib operations
# Here we use the DataFrame plot method; for finer control you could
# instead call raw matplotlib plot once per column.
# Polish the figure further with more configuration as needed.
fig, ax = plt.subplots(figsize=(6, 4.5))
# Pass ax so pandas draws on the axes created above (the tweaks below use ax)
df.plot(ax=ax)
The data-processing steps have a few subtleties; see the comments in the code. This produces a rough first version of the plot.
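The answer notes that you can also draw each column yourself with raw matplotlib instead of DataFrame.plot. Here is a minimal sketch of that approach, assuming a small hypothetical table with the same shape as df after unstack (the values are made up, and the Agg backend line is only there so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide table, shaped like df after unstack (values are made up)
wide = pd.DataFrame(
    {"Africa": [39.1, 41.3, 43.3], "Asia": [46.3, 49.3, 51.6]},
    index=pd.Index([1952, 1957, 1962], name="year"),
)

fig, ax = plt.subplots(figsize=(6, 4.5))
# Draw one line per continent: the index supplies x, the column supplies y
for continent in wide.columns:
    ax.plot(wide.index, wide[continent], label=continent)

ax.set_xlabel("Year")
ax.set_ylabel("Life Expectancy")
ax.legend(title="Continent")
fig.tight_layout()
# plt.show()  # display the figure in an interactive session
```

Plotting column by column gives you one place to set per-line options such as label, color, or marker, at the cost of a few more lines than df.plot(ax=ax).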
You can then polish the figure with further matplotlib calls, for example:
- Set the y-axis label
- If the legend is too tall, split it into two columns to shrink it
- Change the line colors or line styles
- Add markers to the lines
Here are some of these tweaks:
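# set axis labels
ax.set_xlabel('Year')
ax.set_ylabel('Life Expectancy')
# set a different marker for each line
markers = ['o', 's', 'd', '^', 'v']
for i, line in enumerate(ax.get_lines()):
    line.set_marker(markers[i])
# rebuild the legend so it shows the markers, in two columns,
# keeping the columns' name as the legend title
ax.legend(ax.get_lines(), df.columns, loc='best', ncol=2, title=df.columns.name)
plt.tight_layout()
plt.show()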
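The tweaks above cover axis labels, markers and a two-column legend; the list also mentions line colors and styles. Here is a minimal sketch of that, on a small hypothetical table standing in for the unstacked gapminder data. The specific colors and line styles are arbitrary choices, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide table, shaped like df after unstack (values are made up)
wide = pd.DataFrame(
    {"Africa": [39.1, 41.3, 43.3],
     "Americas": [53.3, 56.0, 58.4],
     "Asia": [46.3, 49.3, 51.6]},
    index=pd.Index([1952, 1957, 1962], name="year"),
)

fig, ax = plt.subplots(figsize=(6, 4.5))
wide.plot(ax=ax)

# Arbitrary style choices: one color and one line style per continent
colors = ["tab:blue", "tab:orange", "tab:green"]
linestyles = ["-", "--", ":"]
for line, color, ls in zip(ax.get_lines(), colors, linestyles):
    line.set_color(color)
    line.set_linestyle(ls)

# Rebuild the legend: its handles are created when the legend is drawn,
# so they do not pick up style changes made afterwards
ax.legend(ax.get_lines(), wide.columns, loc="best", ncol=2)
fig.tight_layout()
```

If you know the styles up front, passing them when plotting (for example through the color argument of DataFrame.plot) avoids restyling the lines and rebuilding the legend afterwards.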
Answered By - rojeeer