Issue
I am trying to draw a stock market graph: time series vs. closing price and time series vs. volume.
Somehow the x-axis shows dates in 1970.
The code is:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])
pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d')
pd_data.set_index('DOB')
print(pd_data)
print(pd_data.dtypes)
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
#ax.pd_data['volume'].plot(secondary_y=True, kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volume')
# Choose your xtick format string
date_fmt = '%d-%m-%y'
date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)
# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
I also tried plotting the two graphs independently, without ax=ax:
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')
Then the price graph shows the years properly, whereas the volume graph shows 1970.
And if I swap them:
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
Now the volume graph shows the years properly, whereas the price graph shows the years as 1970.
I tried removing secondary_y and also changing bar to line, but no luck.
Somehow, after the first graph is drawn, pandas appears to change the year.
Solution
- I do not advise drawing a bar plot with this many bars.
- This answer explains why there is an issue with the xtick labels, and how to resolve the issue.
- Plotting with pandas.DataFrame.plot works without issue with .set_major_locator
- Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import yfinance as yf # conda install -c conda-forge yfinance or pip install yfinance --upgrade --no-cache-dir
# download data
df = yf.download('amzn', start='2015-02-21', end='2021-04-27')
# plot
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, alpha=0.5, rot=0, lw=0.5)
ax1.set(ylabel='Volume')
# format
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
plt.setp(ax.get_xticklabels(), ha="center")
plt.show()
- Why do the OP's x-tick labels start from 1970?
- Bar plot locations are 0-indexed (with pandas), and 0 corresponds to 1970
- See Pandas bar plot changes date format
- Most solutions for bar plots simply reformat the labels to the appropriate datetime. That is cosmetic and will not align the locations between the line plot and the bar plot
- Solution 2 of this answer shows how to change the tick locators, but it is really not worth the extra code when plt.bar can be used.
print(pd.to_datetime(ax1.get_xticks()))
DatetimeIndex([ '1970-01-01 00:00:00',
'1970-01-01 00:00:00.000000001',
'1970-01-01 00:00:00.000000002',
'1970-01-01 00:00:00.000000003',
...
'1970-01-01 00:00:00.000001552',
'1970-01-01 00:00:00.000001553',
'1970-01-01 00:00:00.000001554',
'1970-01-01 00:00:00.000001555'],
dtype='datetime64[ns]', length=1556, freq=None)
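The sub-microsecond timestamps in this output come from how pd.to_datetime reads bare integers: with no unit given, each value is taken as nanoseconds since 1970-01-01. A minimal sketch, using three sample positions picked for illustration (not taken from the real figure), shows the same integers read as nanoseconds and as days:

```python
import pandas as pd

# Sample integer bar positions (illustrative; pandas numbers bars 0..n-1)
positions = [0, 1, 1555]

# With no unit, integers are interpreted as nanoseconds since 1970-01-01
as_ns = pd.to_datetime(positions)

# Interpreted as days instead, the same integers still fall in the early 1970s
as_days = pd.to_datetime(positions, unit="D")

print(as_ns)
print(as_days)
```

Either way, the integer positions never reach the years covered by the data, which is why the labels show 1970.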
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
print(ax.get_xticks())
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, kind='bar')
print(ax1.get_xticks())
ax1.set_xlim(0, 18628.)
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.] ← ax tick locations
[ 0 1 2 ... 1553 1554 1555] ← ax1 tick locations
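To make the two sets of tick locations above easier to compare, here is a small standard-library sketch that turns a tick value into a label. It assumes the tick values count days from 1970-01-01 (matplotlib's default date epoch since version 3.3); tick_to_label is a hypothetical helper written for this illustration, not part of matplotlib or pandas.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumption: tick values count days from this date
DATE_FMT = "%d-%m-%y"     # same format string as in the code above


def tick_to_label(tick):
    """Format a tick location (days since EPOCH) with DATE_FMT."""
    return (EPOCH + timedelta(days=int(tick))).strftime(DATE_FMT)


line_ticks = [16071, 18628]  # first and last ax tick locations printed above
bar_ticks = [0, 1555]        # first and last ax1 tick locations printed above

print([tick_to_label(t) for t in line_ticks])  # ['01-01-14', '01-01-21']
print([tick_to_label(t) for t in bar_ticks])   # ['01-01-70', '05-04-74']
```

The line plot's ticks land on 1 January of 2014 through 2021, while the bar plot's integer positions only span 1970 to 1974, so a date formatter applied to the bar axis can only produce labels from the 1970s.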
- With plt.bar the bar plot locations are indexed based on the datetime
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)', rot=0)
plt.setp(ax.get_xticklabels(), ha="center")
print(ax.get_xticks())
ax1 = ax.twinx()
ax1.bar(df.index, df.Volume)
print(ax1.get_xticks())
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
- sns.barplot(x=df.index, y=df.Volume, ax=ax1) has xtick locations as [ 0 1 2 ... 1553 1554 1555], so the bar plot and line plot did not align.
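For anyone who wants to check the difference without downloading data, the following is a rough sketch on a synthetic 30-day frame. The prices and volumes are made up, the Agg backend is chosen only so it runs without a display, and the variable names are illustrative. It compares where a pandas bar plot and a matplotlib bar place their bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded data (values are made up)
idx = pd.date_range("2021-01-04", periods=30, freq="D")
df = pd.DataFrame(
    {"Close": np.linspace(100, 130, 30), "Volume": np.arange(1, 31)},
    index=idx,
)

# pandas bar plot: bars are placed at integer positions 0..n-1
fig1, ax_pd = plt.subplots()
df.plot(y="Volume", kind="bar", ax=ax_pd)
pandas_ticks = ax_pd.get_xticks()

# matplotlib bar on a twin axis: bars are placed at date coordinates
fig2, ax = plt.subplots()
ax.plot(df.index, df["Close"])
ax1 = ax.twinx()
bars = ax1.bar(df.index, df["Volume"])
first_center = bars[0].get_x() + bars[0].get_width() / 2

print(pandas_ticks[:3])
print(first_center, mdates.date2num(idx[0].to_pydatetime()))
```

In this sketch the pandas bars sit at 0, 1, 2, ..., while the center of the first matplotlib bar matches the date number of the first index entry, so only the latter shares coordinates with the line plot.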
Answered By - Trenton McKinney