Monday, November 27, 2023

[FIXED] Loop for plots unique values in a dataframe

Tags: matplotlib, pandas, plot, python

Issue

I need help plotting one graph for each unique value in the "Event" column of a dataframe. I wrote a loop that produces the right number of graphs with the correct titles, but every graph plots the values of the entire dataframe instead of only the rows for the event selected in the "for ... in" loop.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['Date']=['2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05','2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05']
df['Event']=['abc','def','abc','def','abc','def','abc','def','abc','def']
df['Value']=[43,12,19,45,34,21,21,45,38,14]

eve = df['Event'].unique()
for q in eve:
    plt.figure(figsize=(10,6))
    plt.plot(df['Date'],df['Value'], marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(q, fontsize=18)
    plt.show()
plt.close()

The first two images show the output I obtained, and the last two show the desired output.

Solution

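Your loop never filters the dataframe: it plots df['Date'] and df['Value'] in full on every iteration, so each figure shows all rows. Filter the rows for the current event first, as in the code below. If you remove the .sort_values(by=['Date']) call, this produces exactly the plots that you want, but to me it seems very odd to have dates in arbitrary order on the x-axis. Also, although this method is "clean", it is inefficient: the boolean filter scans the whole dataframe once per unique event, so it won't scale well to larger datasets. Let me know if performance is an issue and I can post a more optimized version.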

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['Date']=['2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05','2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05']
df['Event']=['abc','def','abc','def','abc','def','abc','def','abc','def']
df['Value']=[43,12,19,45,34,21,21,45,38,14]

unique_events = df['Event'].unique()
for event in unique_events:
    # Keep only the rows for the current event, ordered by date
    temp = df[df['Event']==event].sort_values(by=['Date'])
    plt.figure(figsize=(10,6))
    # Plot the filtered frame (temp), not the full df
    plt.plot(temp['Date'],temp['Value'], marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(event, fontsize=18)
    plt.show()
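
The same per-event filtering can also be written with groupby, which splits the dataframe in one pass instead of building a boolean mask for every event. This is only a sketch of an alternative, not part of the original answer. Converting 'Date' with pd.to_datetime and the `groups` dictionary are additions made here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Date': ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04', '2023-07-05',
             '2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04', '2023-07-05'],
    'Event': ['abc', 'def', 'abc', 'def', 'abc', 'def', 'abc', 'def', 'abc', 'def'],
    'Value': [43, 12, 19, 45, 34, 21, 21, 45, 38, 14],
})

# Assumption: parse the strings into real dates so the x-axis is a time axis
df['Date'] = pd.to_datetime(df['Date'])

# groupby yields (event name, sub-frame) pairs; sort each sub-frame by date
groups = {event: group.sort_values('Date') for event, group in df.groupby('Event')}

for event, group in groups.items():
    plt.figure(figsize=(10, 6))
    plt.plot(group['Date'], group['Value'], marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(event, fontsize=18)
    plt.show()
```

Each figure now only receives the rows of its own event, because the data passed to plt.plot comes from the group and not from df.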

Edit: a lookup-table-based approach:

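from collections import defaultdict

# Reuses df and plt from the snippet above
events_dict = defaultdict(list)
# to_dict() returns {column name: {row index: value}}
di = df.to_dict()

# Single pass over the rows: collect (date, value) pairs per event
for row, event in di['Event'].items():
    events_dict[event].append((di['Date'][row], di['Value'][row]))

for event in events_dict:
    plt.figure(figsize=(10,6))
    # Sort the pairs by date, then unzip them into x and y sequences
    x,y = zip(*sorted(events_dict[event]))
    plt.plot(x, y, marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(event, fontsize=18)
    plt.show()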
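
If the nested indexing above is hard to follow, the following small sketch prints what the lookup table contains before anything is plotted. It rebuilds the same sample dataframe so that it runs on its own. The print calls are added here only for inspection.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04', '2023-07-05',
             '2023-07-01', '2023-07-02', '2023-07-03', '2023-07-04', '2023-07-05'],
    'Event': ['abc', 'def', 'abc', 'def', 'abc', 'def', 'abc', 'def', 'abc', 'def'],
    'Value': [43, 12, 19, 45, 34, 21, 21, 45, 38, 14],
})

# to_dict() nests as {column name: {row index: value}}
di = df.to_dict()
print(di['Event'][0], di['Date'][0], di['Value'][0])  # abc 2023-07-01 43

# One pass over the rows: each event maps to a list of (date, value) pairs
events_dict = defaultdict(list)
for row, event in di['Event'].items():
    events_dict[event].append((di['Date'][row], di['Value'][row]))

# Tuples sort by their first element; ISO date strings sort chronologically
for event, pairs in events_dict.items():
    print(event, sorted(pairs))
```

Sorting works on the raw strings here only because the dates are in YYYY-MM-DD form. With another date format you would need to parse them first.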

Performance difference:

As mentioned before, on small datasets it shouldn't matter, but if you work with hundreds of thousands or millions of rows, the difference starts to add up quickly.
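
To get a rough feel for this on your own machine, here is a minimal timing sketch. It is not the benchmark from the original answer: the synthetic data, the sizes, and the helper names split_with_mask and split_with_lookup are all assumptions made up for illustration, and the lookup helper is a simplified variant of the approach above that iterates the columns directly. Timings depend heavily on the data and the pandas version, so treat the numbers as indicative only.

```python
import random
import time
from collections import defaultdict

import pandas as pd

# Synthetic data: row count, event count and names are arbitrary assumptions
random.seed(0)
n_rows, n_events = 100_000, 200
events = [f"event_{i}" for i in range(n_events)]
big = pd.DataFrame({
    'Event': random.choices(events, k=n_rows),
    'Value': [random.randint(0, 100) for _ in range(n_rows)],
})

def split_with_mask(frame):
    # One full scan of the 'Event' column per unique event
    return {e: frame[frame['Event'] == e] for e in frame['Event'].unique()}

def split_with_lookup(frame):
    # One pass over all rows, appending each value to its event's list
    groups = defaultdict(list)
    for event, value in zip(frame['Event'], frame['Value']):
        groups[event].append(value)
    return groups

for split in (split_with_mask, split_with_lookup):
    start = time.perf_counter()
    result = split(big)
    elapsed = time.perf_counter() - start
    print(f"{split.__name__}: {elapsed:.3f} s for {len(result)} events")
```

The gap between the two should grow with the number of distinct events, since the mask version repeats its scan once per event while the lookup version always makes a single pass.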

Answered By - OM222O

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
