Issue
I'm trying to create a visualization in Python. It works in a Jupyter notebook, but I can't get the output I want in an actual IDE (PyCharm). Has this happened to anyone else? The code is exactly the same in both, and in Jupyter it runs in a single cell (literally just copied and pasted).
[Screenshot: output in PyCharm]
[Screenshot: output in Jupyter]
import pandas as pd
from matplotlib import pyplot as plt #also tried import matplotlib.pyplot as plt
from datetime import date, timedelta
fig = plt.figure(figsize=(12, 7))
df = pd.read_csv('https://raw.githubusercontent.com/datasets/covid-19/main/data/countries-aggregated.csv',
parse_dates=['Date'])
yesterday = date.today() - timedelta(days=1)
yesterday.strftime('%Y-%m-%d')
today_df = df[df['Date'] == yesterday]
top_10 = today_df.sort_values(['Confirmed'], ascending=False)[:10]
top_10.loc['rest-of-world'] = today_df.sort_values(['Confirmed'], ascending=False)[10:].sum()
top_10.loc['rest-of-world', 'Country'] = 'Rest of World'
ax = fig.add_subplot(111)
ax.pie(top_10['Confirmed'], labels=top_10['Country'], autopct='%1.1f%%')
ax.title.set_text('Hardest Hit Countries Worldwide')
plt.legend(loc='upper left')
plt.show()
Solution
I tested the code without an IDE and it gives me the same wrong plot.
It also displays a message that ax.pie() needs normalize=True, so I added it, but that still didn't resolve the main problem.
If you used print() to see the values in the variables, you would probably see that today_df is empty, and that is what produces the wrong plot.
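As a minimal sketch of that check, the snippet below filters a small made-up DataFrame (a hypothetical stand-in for the CSV, not the real data) by a date that is not in it yet. The names sample, missing_day and filtered are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: two dates, two countries.
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-01',
                            '2021-01-02', '2021-01-02']),
    'Country': ['A', 'B', 'A', 'B'],
    'Confirmed': [10, 20, 15, 25],
})

# A date that does not exist in the data yet.
missing_day = pd.Timestamp('2021-01-03')
filtered = sample[sample['Date'] == missing_day]

# Both checks show that nothing matched the filter.
print(filtered.empty)  # True
print(len(filtered))   # 0
```

Printing today_df.empty or len(today_df) right after the filter in the original script would show the same thing.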
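First, you forgot to assign the result of yesterday.strftime('%Y-%m-%d'). strftime() returns a new string and does not change yesterday itself, so it should be: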
yesterday = yesterday.strftime('%Y-%m-%d')
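To illustrate why the assignment matters, here is a small sketch with a fixed example date (chosen only for illustration): calling strftime() without keeping the result leaves the variable as a date object.

```python
from datetime import date

day = date(2021, 1, 3)

# The returned string is discarded here, so day is unchanged.
day.strftime('%Y-%m-%d')
still_a_date = isinstance(day, date)

# Keeping the returned value gives the formatted string.
day_text = day.strftime('%Y-%m-%d')

print(still_a_date)  # True
print(day_text)      # 2021-01-03
```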
But the strangest problem was my time zone. I live in a place where yesterday gives a date that doesn't exist in the CSV yet, so I had to use timedelta(days=2) (the day before yesterday) to get some data and see the plot.
Maybe in a few hours they will update the data in the CSV and timedelta(days=1) will work for a few hours, and later it will have the same problem again, and so on.
It is better to use
yesterday = max(df['Date'])
to get the newest date that is actually in the data.
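A minimal sketch of this idea on a small made-up DataFrame (again a hypothetical stand-in for the CSV) shows that picking the latest date present in the data always yields a non-empty selection, regardless of the local clock. It uses the Series method .max(), which should give the same result as max(df['Date']) here.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: two dates, two countries.
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-01',
                            '2021-01-02', '2021-01-02']),
    'Country': ['A', 'B', 'A', 'B'],
    'Confirmed': [10, 20, 15, 25],
})

# Newest date present in the data, independent of today's date.
latest = sample['Date'].max()
latest_df = sample[sample['Date'] == latest]

print(latest)
print(latest_df['Confirmed'].tolist())  # [15, 25]
```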
import pandas as pd
from matplotlib import pyplot as plt
from datetime import date, timedelta
url = 'https://raw.githubusercontent.com/datasets/covid-19/main/data/countries-aggregated.csv'
fig = plt.figure(figsize=(12, 7))
df = pd.read_csv(url, parse_dates=['Date'])
#yesterday = date.today() - timedelta(days=2)
#yesterday = yesterday.strftime('%Y-%m-%d')
yesterday = max(df['Date'])
print('yesterday:', yesterday)
today_df = df[df['Date'] == yesterday]
print(today_df)
top_10 = today_df.sort_values(['Confirmed'], ascending=False)[:10]
top_10.loc['rest-of-world'] = today_df.sort_values(['Confirmed'], ascending=False)[10:].sum()
top_10.loc['rest-of-world', 'Country'] = 'Rest of World'
ax = fig.add_subplot(111)
ax.pie(top_10['Confirmed'], labels=top_10['Country'], autopct='%1.1f%%', normalize=True)
ax.title.set_text('Hardest Hit Countries Worldwide')
plt.legend(loc='upper left')
plt.show()
Answered By - furas