Issue
I have a DataFrame like this:
CITY | BANK | DEPOSIT |
---|---|---|
NewYork | ABC | 10 |
Seattle | ABC | 30 |
NewYork | LMN | 99 |
NewYork | PQR | 100 |
Seattle | PQR | 50 |
Seattle | LMN | 43 |
The CITY column contains multiple cities. A bank can be present in many cities, and a city can have many banks (a many-to-many relation). Within a city, every bank has a certain deposit (the 'DEPOSIT' column).
I want to draw a bar graph of the top 5 cities by total deposits (across all the banks in that city) and, for each city, show the 3 banks with the largest deposits in that city, so 15 bars on the x-axis in total.
The x-axis should show the city names (5 in total), and each city should have 3 bars for the banks with the largest deposits (figure on the top).
What I have tried:
I tried a couple of things. First I grouped by city and then by bank, then reset the index to get a DataFrame to plot, but I couldn't figure out how to proceed with the plotting part.
Here's the code (which doesn't work, but I tried):
grp_city = df.groupby(['CITY','BANK'])['DEPOSIT'].sum().reset_index()
grp_city.sort_values(by = grp_city.groupby('CITY')['DEPOSIT'].sum().reset_index()['DEPOSIT'])
I tried several other things, but they don't seem to work either. Any help is appreciated.
Solution
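To get the top 3 bank deposits per city, we can group the data twice: first by city and bank, and then by city only. Separately, to find the top 5 cities by total deposit, group by city and take the 5 largest deposit sums. Then filter the data down to those cities, which keeps the plotting step simple. Note that the first grouping below uses max(), so it keeps the largest single deposit for each city and bank pair; if a bank's rows within a city should be added up instead, use sum() there.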
import pandas as pd
import numpy as np
# Example data
np.random.seed(100)
df = pd.DataFrame({'CITY': np.random.choice(['NewYork', 'Seattle', 'Boston', 'LosAngeles', 'Chicago', 'Washington', 'Denver'], size=50, p=[1/7 for i in range(7)]),
'BANK': np.random.choice(['ABC', 'LMN', 'PQR', 'BDR', 'BBB', 'NUB', 'INT'], size=50, p=[1/7 for i in range(7)]),
'DEPOSIT': np.random.randint(1, 300, size=50)})
# Group data by city and by the top 3 bank deposits
grp_city = df.groupby(['CITY', 'BANK'])['DEPOSIT'].max()
grp_city = grp_city.groupby(level='CITY').nlargest(3).reset_index(level=0, drop=True)
df_max_deposits = grp_city.reset_index()
# Top 5 cities by total deposit
top_cities = df.groupby(['CITY'])['DEPOSIT'].sum().nlargest(5).index
# Filter the cities that we want in the plot
plot_df = df_max_deposits.query('CITY in @top_cities')
# Finally plot the data
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(11, 8))
sns.set_theme(context='talk')
sns.barplot(x='CITY', y='DEPOSIT', hue='BANK', data=plot_df)
plt.legend(loc=(0.1, 0.6))
# Needed when running as a script; notebooks render the figure automatically
plt.show()
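As a quick check of the selection logic, here is a minimal sketch that applies the same idea to the six-row sample table from the question. The helper name top_banks_in_top_cities and its parameters are hypothetical and not part of the answer above. It also assumes that deposits for the same city and bank should be summed, whereas the answer's code uses max().

```python
import pandas as pd


def top_banks_in_top_cities(df, n_cities=5, n_banks=3):
    """Hypothetical helper: keep the n_banks largest banks in each of the n_cities largest cities."""
    # Total deposit per (city, bank) pair; sum() is an assumption, the answer above uses max()
    per_bank = df.groupby(['CITY', 'BANK'], as_index=False)['DEPOSIT'].sum()
    # Cities ranked by total deposit across all of their banks
    top_cities = per_bank.groupby('CITY')['DEPOSIT'].sum().nlargest(n_cities).index
    per_bank = per_bank[per_bank['CITY'].isin(top_cities)]
    # Sort so the largest banks come first, then keep the first n_banks rows per city
    per_bank = per_bank.sort_values('DEPOSIT', ascending=False)
    return per_bank.groupby('CITY').head(n_banks).reset_index(drop=True)


# The sample table from the question
sample = pd.DataFrame({
    'CITY': ['NewYork', 'Seattle', 'NewYork', 'NewYork', 'Seattle', 'Seattle'],
    'BANK': ['ABC', 'ABC', 'LMN', 'PQR', 'PQR', 'LMN'],
    'DEPOSIT': [10, 30, 99, 100, 50, 43],
})

# With only two cities in the sample, ask for the top 1 city and its top 2 banks
result = top_banks_in_top_cities(sample, n_cities=1, n_banks=2)
print(result)
```

The returned frame has the same CITY, BANK and DEPOSIT columns as plot_df, so it can be passed to sns.barplot in the same way. To show the cities from largest to smallest total, the order argument of sns.barplot accepts a list of city names.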
Answered By - danpl