Issue
I have the following dataframe:
                      Country variable      value
0                 Afghanistan     Area  38.232510
1                 Afghanistan    Yield  70.081666
2                   Argentina     Area  96.776730
3                   Argentina     Area  60.047651
4                   Argentina    Yield  66.811117
..                        ...      ...        ...
133  United States Of America    Yield  53.536069
134  United States Of America     Area  76.975885
135  United States Of America    Yield  19.987656
136                    Zambia    Yield  39.493612
137                    Zambia    Yield  35.384809
I want to use it to construct a lollipop graphic (e.g. https://python-graph-gallery.com/184-lollipop-plot-with-2-groups/). However, the example dataframe is different from mine in that it has two values for each group while I want to plot the minimum and maximum value for the two groups for each country, with the group being differentiated by hue. How can I do it by modifying the code from that example?
Solution
This should make a good starting point to refine from:
import matplotlib.pyplot as plt

# df is the long-format dataframe from the question.
# String names avoid the FutureWarning newer pandas raises for the builtins min/max.
df_agg = df.groupby(['Country', 'variable']).agg(['min', 'max']).droplevel(level=0, axis=1).reset_index()

colours = {
    'Area':  {'line': 'pink',    'min': 'crimson', 'max': 'red'},
    'Yield': {'line': 'skyblue', 'min': 'navy',    'max': 'blue'},
}

variables = df_agg['variable'].unique()
for var in variables:
    df_plt = df_agg[df_agg['variable'] == var]
    # One row per (Country, variable) pair; the row index is the y position
    my_range = list(df_plt.index)
    plt.hlines(y=my_range, xmin=df_plt['min'], xmax=df_plt['max'], color=colours[var]['line'], alpha=0.4)
    plt.scatter(df_plt['min'], my_range, color=colours[var]['min'], alpha=1, label=f'{var} min')
    plt.scatter(df_plt['max'], my_range, color=colours[var]['max'], alpha=1, label=f'{var} max')

# Add legend, title and axis names
plt.legend()
plt.yticks(df_agg.index, df_agg['Country'])
plt.title("Min and Max per Country", loc='left')
plt.xlabel('Values')
plt.ylabel('Country')

# Show the graph
plt.show()
For the data in your question, this draws one lollipop per country/variable pair, each on its own row labelled with its country.
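To check what the aggregation step produces before plotting, here is a minimal sketch on a few rows. The sample values are made up for illustration (rounded from the question's first rows), and selecting the `value` column before `.agg` is a small variation on the code above that yields flat `min`/`max` columns without `droplevel`.

```python
import pandas as pd

# Hypothetical sample in the same long format as the question's dataframe
df = pd.DataFrame({
    'Country':  ['Afghanistan', 'Afghanistan', 'Argentina', 'Argentina', 'Argentina'],
    'variable': ['Area', 'Yield', 'Area', 'Area', 'Yield'],
    'value':    [38.2, 70.1, 96.8, 60.0, 66.8],
})

# Min and max of 'value' for every (Country, variable) pair
df_agg = (
    df.groupby(['Country', 'variable'])['value']
      .agg(['min', 'max'])
      .reset_index()
)

print(df_agg)
```

A pair with a single observation, such as Afghanistan/Area here, gets identical `min` and `max`, so its lollipop collapses to a single point.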
You can also "group" the country values by making the my_range value float around an integer value based on Country, then only putting y ticks at those integer values:
import matplotlib.pyplot as plt
import numpy as np

# df is the long-format dataframe from the question.
# String names avoid the FutureWarning newer pandas raises for the builtins min/max.
df_agg = df.groupby(['Country', 'variable']).agg(['min', 'max']).droplevel(level=0, axis=1).reset_index()

colours = {
    'Area':  {'line': 'pink',    'min': 'crimson', 'max': 'red'},
    'Yield': {'line': 'skyblue', 'min': 'navy',    'max': 'blue'},
}

countries = list(df_agg['Country'].unique())
variables = df_agg['variable'].unique()

# figure out y positions for each lollipop
# make them go from y-0.2 to y+0.2
plot_y = {var: pt for var, pt in zip(variables, np.linspace(-0.2, 0.2, num=len(variables)))}

for var in variables:
    df_plt = df_agg[df_agg['variable'] == var]
    # Integer position of the country plus the per-variable offset
    my_range = list(df_plt['Country'].apply(countries.index) + plot_y[var])
    plt.hlines(y=my_range, xmin=df_plt['min'], xmax=df_plt['max'], color=colours[var]['line'], alpha=0.4)
    plt.scatter(df_plt['min'], my_range, color=colours[var]['min'], alpha=1, label=f'{var} min')
    plt.scatter(df_plt['max'], my_range, color=colours[var]['max'], alpha=1, label=f'{var} max')

# Add legend, title and axis names
plt.legend()
plt.yticks(range(len(countries)), countries)
plt.title("Min and Max per Country", loc='left')
plt.xlabel('Values')
plt.ylabel('Country')

# Show the graph
plt.show()
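The y-position arithmetic in the grouped version can be checked on its own, without plotting. This is a small sketch under assumed inputs: the two-country list and the helper `y_position` are hypothetical names used only for illustration.

```python
import numpy as np

countries = ['Afghanistan', 'Argentina']   # hypothetical subset of the data
variables = ['Area', 'Yield']

# Evenly spaced offsets around each integer tick, from -0.2 to +0.2
offsets = dict(zip(variables, np.linspace(-0.2, 0.2, num=len(variables))))

def y_position(country, variable):
    """Integer slot of the country plus the offset of the variable."""
    return countries.index(country) + offsets[variable]

for country in countries:
    for variable in variables:
        print(country, variable, y_position(country, variable))
```

With two variables the lollipops sit 0.2 below and 0.2 above each country's tick. Note that `np.linspace(-0.2, 0.2, num=1)` returns only the start value, so with a single variable the lollipop would sit at -0.2 rather than on the tick.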
Answered By - Nick