Issue
I need help plotting one graph for each item in the "Event" column of a DataFrame. I created a loop, and it produces the right number of graphs with the right titles, but the lines in every graph use the values of the entire DataFrame instead of only the rows for the event selected in the "for ... in" loop.
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['Date']=['2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05','2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05']
df['Event']=['abc','def','abc','def','abc','def','abc','def','abc','def']
df['Value']=[43,12,19,45,34,21,21,45,38,14]

eve = df['Event'].unique()
for q in eve:
    plt.figure(figsize=(10,6))
    plt.plot(df['Date'],df['Value'], marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(q, fontsize=18)
    plt.show()
    plt.close()
```
The first two images are the output I obtained, and the last two plots are the desired output.
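A quick way to see the problem without drawing anything is to count how many points each iteration hands to plt.plot. This sketch reuses the question's sample data; the points dictionary is just an illustrative name. The loop variable q changes the title, but the plotted columns always have all 10 rows, while each event only owns 5 of them:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "Date": ["2023-07-01", "2023-07-02", "2023-07-03", "2023-07-04", "2023-07-05",
             "2023-07-01", "2023-07-02", "2023-07-03", "2023-07-04", "2023-07-05"],
    "Event": ["abc", "def", "abc", "def", "abc", "def", "abc", "def", "abc", "def"],
    "Value": [43, 12, 19, 45, 34, 21, 21, 45, 38, 14],
})

points = {}
for q in df["Event"].unique():
    plotted = len(df["Value"])                        # what the original loop passes to plt.plot
    wanted = len(df.loc[df["Event"] == q, "Value"])   # rows that actually belong to event q
    points[q] = (plotted, wanted)

print(points)  # {'abc': (10, 5), 'def': (10, 5)}
```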
Solution
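Your loop uses q only for the title; plt.plot still receives the full df['Date'] and df['Value'] columns. The code below filters the rows for each event first. If you remove the .sort_values(by=['Date']) call, it reproduces exactly the plots you want, but to me it seems very odd to have the dates out of chronological order on the x-axis. Also, although this method is "clean", it builds a new boolean mask over the whole DataFrame for every event, so it won't scale well to larger datasets. Let me know if performance is an issue and I can send you a more optimized version.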
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['Date']=['2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05','2023-07-01', '2023-07-02','2023-07-03','2023-07-04','2023-07-05']
df['Event']=['abc','def','abc','def','abc','def','abc','def','abc','def']
df['Value']=[43,12,19,45,34,21,21,45,38,14]

unique_events = df['Event'].unique()
for event in unique_events:
    # Keep only the rows for this event, ordered by date
    temp = df[df['Event']==event].sort_values(by=['Date'])
    plt.figure(figsize=(10,6))
    # Plot the filtered subset, not the whole DataFrame
    plt.plot(temp['Date'],temp['Value'], marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(event, fontsize=18)
    plt.show()
```
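Another option, not part of the original answer, is pandas' own groupby, which computes the Event groups once instead of building a fresh boolean mask over every row for each event. A minimal sketch under that assumption: series_per_event is a hypothetical helper name, it sorts each group by date like the code above, and the plotting only runs when the file is executed as a script:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Date": ["2023-07-01", "2023-07-02", "2023-07-03", "2023-07-04", "2023-07-05",
             "2023-07-01", "2023-07-02", "2023-07-03", "2023-07-04", "2023-07-05"],
    "Event": ["abc", "def", "abc", "def", "abc", "def", "abc", "def", "abc", "def"],
    "Value": [43, 12, 19, 45, 34, 21, 21, 45, 38, 14],
})


def series_per_event(frame):
    """Return {event: (dates, values)} with each event's rows sorted by date."""
    result = {}
    # groupby yields (key, sub-DataFrame) pairs, one per distinct Event value
    for event, group in frame.groupby("Event"):
        group = group.sort_values("Date")
        result[event] = (group["Date"].tolist(), group["Value"].tolist())
    return result


if __name__ == "__main__":
    for event, (dates, values) in series_per_event(df).items():
        plt.figure(figsize=(10, 6))
        plt.plot(dates, values, marker="o", linestyle="-", label="Value")
        plt.xticks(rotation=90)
        plt.title(event, fontsize=18)
        plt.show()
```

Keeping the data preparation in a function separate from the plotting loop also makes it easy to check what each graph will contain before drawing it.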
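Edit: a lookup-table-based approach:

```python
from collections import defaultdict
import matplotlib.pyplot as plt

# Reuses df from the previous snippet.
# Build {event: [(date, value), ...]} in a single pass over the rows
events_dict = defaultdict(list)
di = df.to_dict()
for row, event in di['Event'].items():
    events_dict[event].append((di['Date'][row], di['Value'][row]))

for event in events_dict:
    plt.figure(figsize=(10,6))
    # Sorting the (date, value) tuples orders points by date (ISO date strings sort chronologically)
    x,y = zip(*sorted(events_dict[event]))
    plt.plot(x, y, marker='o', linestyle='-', label='Value')
    plt.xticks(rotation=90)
    plt.title(event, fontsize=18)
    plt.show()
```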
As mentioned before, on small datasets the difference shouldn't matter, but if you work with hundreds of thousands or millions of rows, it starts to add up quickly.
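How much the lookup table helps depends on the number of rows and of distinct events, so it may be worth measuring on your own data. A rough sketch, assuming a synthetic frame with made-up values: make_frame, by_mask and by_lookup are hypothetical helper names that wrap the two approaches from this answer without the plotting. It first lets you confirm both return the same series, then times them; no particular timings are claimed here:

```python
import time
from collections import defaultdict

import pandas as pd


def make_frame(n_rows=200_000, n_events=50):
    """Synthetic data (made up for timing): every event gets each date exactly once."""
    days = list(pd.date_range("2000-01-01", periods=n_rows // n_events + 1).strftime("%Y-%m-%d"))
    return pd.DataFrame({
        "Date": [days[i // n_events] for i in range(n_rows)],
        "Event": [f"e{i % n_events}" for i in range(n_rows)],
        "Value": [(i * 37) % 101 for i in range(n_rows)],
    })


def by_mask(df):
    """First approach: one boolean mask over the whole frame per event."""
    out = {}
    for event in df["Event"].unique():
        temp = df[df["Event"] == event].sort_values(by=["Date"])
        out[event] = (temp["Date"].tolist(), temp["Value"].tolist())
    return out


def by_lookup(df):
    """Lookup-table approach: bucket (date, value) pairs by event in one pass."""
    events_dict = defaultdict(list)
    di = df.to_dict()
    for row, event in di["Event"].items():
        events_dict[event].append((di["Date"][row], di["Value"][row]))
    out = {}
    for event, pairs in events_dict.items():
        x, y = zip(*sorted(pairs))
        out[event] = (list(x), list(y))
    return out


if __name__ == "__main__":
    frame = make_frame()
    assert by_mask(frame) == by_lookup(frame)  # same series, so timing them is a fair comparison
    for fn in (by_mask, by_lookup):
        start = time.perf_counter()
        fn(frame)
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```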
Answered By - OM222O