Issue
I want to create a plot that will have in the x axis the months and for each country it will show the Orders per month.
Attempting to create a function that reads an excel file and is able to support any number of countries using *args as a parameter. Then, it should loop though the countries that may be given by the end user and create a comparison plot.
Example of the excel that will be imported:
spain = {'Country': ['Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain'],
'Month': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'],
'Temp': [10, 11, 13, 16, 21, 26, 29, 29, 25, 20, 16, 12],
'Records': [7, 8, 8, 7, 7, 8, 10, 11, 8, 7, 7, 7],
'Orders': [70, 70, 66, 60, 58, 50, 43, 43, 54, 63, 69, 70]}
Notes about the parameters of the function below:
- file is the path of the xls file
- month is the Month column of the xls that contains the months
- column is the Orders column of the excel
- *countries will be any number of countries (located in each sheet of the xls that is shown in the screenshot) that might be searched by the user and will be compared in the plot.
Note: The x_axis of the plot will be the months and each line in the plot will show the number of orders of each country for each month.
Here is what I have created, but it does not work properly.
import matplotlib.pyplot as plt
import pandas as pd
def PlotDataPerMonth(file,month,column,*countries):
for i in range(len(countries)): # count based on the given countries that have been given
data = pd.read_excel(file) # Using pd.read_excel() is required
print(data)
for name, data in data.groupby('countries'):
plt.plot(data[month], data[column], label=countries) # month should be the 'month' column
plt.xlabel('Months')
plt.ylabel('Number of Orders')
plt.legend()
plt.show()
The function with actual data:
PlotDataPerMonth('Book1.xlsx','Month','Orders',['Spain','Italy'])
Solution
- There's no point to pass
'Month'
to the function since it will always be the x-axis. pd.read_excel(file, sheet_name=None)
creates a dictionary of dataframes; one key-value pair for each worksheet.- It's expected that all worksheets are similarly formatted.
- A bar plot option has been included because it is better for comparing discrete values. This data is discrete, not continuous.
- See inline comments for code explanations
- Tested in
python 3.10
,pandas 1.4.2
,matplotlib 3.5.1
import calendar
def PlotDataPerMonth(file: str, y_col: str, countries: list, bar: bool=True):
# read in all of the coutry tabs into a dict of dataframes, and them concat them into a single dataframe
df = pd.concat(pd.read_excel(file, sheet_name=None)).reset_index(drop=True)
# create a list ordered month names from the calendar module
months = calendar.month_name[1:]
# set the months as ordered and categorical so they will be plotted in order
df.Month = pd.Categorical(df.Month, months, ordered=True)
# select the countries
df = df[df.Country.isin(countries)]
# reshape the dataframe for plotting
dp = df.pivot(index='Month', columns='Country', values=y_col)
# bar plot or line plot
if bar:
kind='bar'
else:
kind='line'
axe = dp.plot(kind=kind, figsize=(12, 8))
if not bar:
# set the xticks to show all the labels
axe.set_xticks(ticks=range(len(dp.index)), labels=months)
# function call
PlotDataPerMonth('Book1.xlsx', 'Orders', ['Spain', 'Italy'], False)
- The lines are on top of each other because the data is the same
PlotDataPerMonth('Book1.xlsx', 'Orders', ['Spain', 'Italy'])
Answered By - Trenton McKinney
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.