Monday, May 9, 2022

[FIXED] How to create a comparison plot from an Excel file with many worksheets

May 09, 2022 matplotlib, pandas, python No comments

Issue

I want to create a plot that will have in the x axis the months and for each country it will show the Orders per month.

Attempting to create a function that reads an excel file and is able to support any number of countries using *args as a parameter. Then, it should loop though the countries that may be given by the end user and create a comparison plot.

Example of the excel that will be imported:

spain = {'Country': ['Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain', 'Spain'],
         'Month': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'],
         'Temp': [10, 11, 13, 16, 21, 26, 29, 29, 25, 20, 16, 12],
         'Records': [7, 8, 8, 7, 7, 8, 10, 11, 8, 7, 7, 7],
         'Orders': [70, 70, 66, 60, 58, 50, 43, 43, 54, 63, 69, 70]}

Notes about the parameters of the function below:

file is the path of the xls file
month is the Month column of the xls that contains the months
column is the Orders column of the excel
*countries will be any number of countries (located in each sheet of the xls that is shown in the screenshot) that might be searched by the user and will be compared in the plot.

Note: The x_axis of the plot will be the months and each line in the plot will show the number of orders of each country for each month.

Here is what I have created, but it does not work properly.

import matplotlib.pyplot as plt
import pandas as pd

def PlotDataPerMonth(file,month,column,*countries):
    for i in range(len(countries)): # count based on the given countries that have been given
        data = pd.read_excel(file) # Using pd.read_excel() is required
        print(data)


    for name, data in data.groupby('countries'):
        plt.plot(data[month], data[column], label=countries) # month should be the 'month' column 
        plt.xlabel('Months')
        plt.ylabel('Number of Orders')
        plt.legend()
        plt.show()

The function with actual data:

PlotDataPerMonth('Book1.xlsx','Month','Orders',['Spain','Italy'])

Solution

There's no point to pass 'Month' to the function since it will always be the x-axis.
pd.read_excel(file, sheet_name=None) creates a dictionary of dataframes; one key-value pair for each worksheet.
- It's expected that all worksheets are similarly formatted.
A bar plot option has been included because it is better for comparing discrete values. This data is discrete, not continuous.
See inline comments for code explanations
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1

import calendar

def PlotDataPerMonth(file: str, y_col: str, countries: list, bar: bool=True):
    
    # read in all of the coutry tabs into a dict of dataframes, and them concat them into a single dataframe
    df = pd.concat(pd.read_excel(file, sheet_name=None)).reset_index(drop=True)
    
    # create a list ordered month names from the calendar module
    months = calendar.month_name[1:]
    
    # set the months as ordered and categorical so they will be plotted in order
    df.Month = pd.Categorical(df.Month, months, ordered=True)
    
    # select the countries
    df = df[df.Country.isin(countries)]
    
    # reshape the dataframe for plotting
    dp = df.pivot(index='Month', columns='Country', values=y_col)
    
    # bar plot or line plot
    if bar:
        kind='bar'
    else:
        kind='line'
        
    axe = dp.plot(kind=kind, figsize=(12, 8))
    
    if not bar:
        # set the xticks to show all the labels
        axe.set_xticks(ticks=range(len(dp.index)), labels=months)


# function call
PlotDataPerMonth('Book1.xlsx', 'Orders', ['Spain', 'Italy'], False)

The lines are on top of each other because the data is the same

PlotDataPerMonth('Book1.xlsx', 'Orders', ['Spain', 'Italy'])

Answered By - Trenton McKinney

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, May 9, 2022

[FIXED] How to create a comparison plot from an Excel file with many worksheets

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels