Issue
I am using this dataset from data.world to learn how to plot radial charts using matplotlib and I am not really sure how to calculate the radius and the angle of the plot.
If I plot the data as a scatter plot with time on the x-axis and year/month on the y-axis, I get the following (ignore the dates on the x-axis; they are actually times).
Now, I want to convert that into a radial or polar plot like this one (I used another tool to make it), where the angular axis is time and the radial axis is year/month.
My question is: how do I calculate r and theta to plot that using matplotlib? By "how" I mean the logic of converting cartesian to polar, not the actual code. I am looking to understand how it works in general.
c = ax.scatter(theta, r)
I have seen a few examples online, but none of them explains the logic behind it. Thanks!
Solution
Polar plotting in matplotlib can be challenging because of the coordinate conversion, as you mentioned, and more so when you add the date/time to the x/y axis like in your case. Here is a stab at it.
The first important step is to open the data and condition it. We can use the pandas library to open the csv with your data (data.world) and extract the y-axis, which will be the year/month/day data, and the x-axis, which will be the hour/minute/second data.
from datetime import timezone, datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df_csv = pd.read_csv('data-tweets.csv', encoding="ISO-8859-1", parse_dates=['created'])
# For some reason, the data is not in utf-8 so it needs encoding ISO-8859-1
# By passing parse_dates, the values in the column 'created' are parsed as pandas datetimes
print(f'Available columns: {df_csv.columns.values}')
# Filter for 2016 as an example (the upper bound is exclusive so tweets from 31 December are kept)
df = df_csv[(df_csv['created'] >= '2016-01-01') & (df_csv['created'] < '2017-01-01')]
# Reset index with the 2016 data
df = df.reset_index(drop=True)
# Extract year/month/day to plot as y-axis
date = pd.to_datetime(df['created']).dt.strftime('%Y/%m/%d').to_numpy()
# Extract hour:minute:second to plot as x-axis
time = pd.to_datetime(df['created']).dt.strftime('%H:%M:%S').to_numpy()
We can plot the data we currently have in the cartesian x/y coordinates so we can have a sanity check later.
# Time and date are currently strings, we need them to be python datetime (dt) objects so matplotlib can understand
date_dt = [datetime.strptime(x, '%Y/%m/%d').replace(tzinfo=timezone.utc) for x in date]
time_dt = [datetime.strptime(x, '%H:%M:%S').replace(tzinfo=timezone.utc) for x in time]
# Initiate figure
fig, ax = plt.subplots(figsize=(6, 6), layout='tight')
ax.plot(time_dt, date_dt, 'o')
# Format date/time for both x and y axes
x_fmt = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(x_fmt)
y_fmt = mdates.DateFormatter('%Y/%m/%d')
ax.yaxis.set_major_formatter(y_fmt)
ax.tick_params(axis='x', rotation=45)
# Label axes
ax.set_xlabel('Time [HH:MM:SS]')
ax.set_ylabel('Date [Y/m/d]')
fig.suptitle('Tweet timeline')
plt.show()
Okay, now on to the polar part. For polar coordinates you need r, the radial coordinate, and theta, an angle. For more info, you can check Wikipedia's Polar coordinate info, but the gist of it is that r = square root of (x squared + y squared), and theta is an angle in radians, which you can get from a fraction of a full circle by multiplying that fraction by 2 * Pi. Roughly speaking, in this case you can think of theta as the x-axis, aka the time in hours, and r as the y-axis, aka the dates.
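For reference, here is a minimal sketch of the textbook cartesian/polar conversion that the Wikipedia formula describes. The function names are made up for illustration. Keep in mind that for the tweet plot the axes are mapped more directly (time of day becomes the angle, the date becomes the radius), so this only illustrates the general logic.

```python
import numpy as np

def cartesian_to_polar(x, y):
    """Return (r, theta) for cartesian points; theta is in radians."""
    r = np.hypot(x, y)        # same as sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)  # angle measured counterclockwise from the +x axis
    return r, theta

def polar_to_cartesian(r, theta):
    """Inverse mapping, handy for checking the conversion."""
    return r * np.cos(theta), r * np.sin(theta)

x = np.array([1.0, 0.0, -1.0])
y = np.array([0.0, 2.0, 0.0])
r, theta = cartesian_to_polar(x, y)
x_back, y_back = polar_to_cartesian(r, theta)
```

Converting back with polar_to_cartesian should reproduce the original x and y up to floating point error.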
We have two issues to address before we calculate r and theta:
1. Before we can even begin to think about summing or squaring our x and y values, we need them to be actual numbers. x and y are currently datetime objects, which works for plotting with matplotlib (as before) but not for doing any math-heavy operations with them. The key will be to convert the x/y datetime values into timestamps.
2. The key to converting from cartesian x/y to polar is that everything has to have the same units when time is involved. We currently don't have that: x is a time of day, and y is a date.
Let's start with issue #1. Converting the y-axis to timestamps is easy enough using date2num from matplotlib.dates (you can use other methods to obtain the timestamp, but using date2num and converting them into matplotlib timestamps will be incredibly useful later when we need to format the axes):
# We need date and time to be timestamps (i.e. a number, not a datetime object) so we can operate with them
# Chose directly to do matplotlib timestamps as we can later format the axes like we did before
date_timestamp = mdates.date2num(date_dt)
However, the x-axis is a bit more tricky because it is only the time (hours/minutes/seconds), not the full year/month/day datetime, and date2num needs a full datetime. The trick here is to use the original full date and time from the csv, convert it into a matplotlib timestamp, and then subtract the "date" part we just calculated, so the remaining values are the time of day in matplotlib timestamps (i.e. a fraction of a day between 0 and 1).
# Now we need to make the time (hours/minutes/second) into matplotlib timestamps too
# However, date2num only works with full datetimes (not just hours)
# Get the full datetime timestamps
dates = pd.to_datetime(df['created']).dt.strftime('%Y/%m/%d %H:%M:%S').to_numpy()
# Make a datetime object like previously
date_and_time_dt = np.array([datetime.strptime(x, '%Y/%m/%d %H:%M:%S').replace(tzinfo=timezone.utc) for x in dates])
# Make them into matplotlib timestamps
date_and_time_timestamp = mdates.date2num(date_and_time_dt)
# Take out the 'year/month/day' part so we can keep the 'hour/minute/second' part
# Now we have the hour information in matplotlib timestamps
time_timestamp = date_and_time_timestamp - date_timestamp
So, in the process of converting the x values into matplotlib timestamps, we have solved both issues. Issue #1: both x and y values are now numbers, so we can do math with them. Issue #2: both x and y values are matplotlib timestamps, so they have the same units.
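If you prefer to stay in pandas, the same date/time split can probably be done without the string round trip. This is only a sketch on three made-up timestamps (not the tweet data), and the variable names are my own: dt.normalize() sets each timestamp to midnight, and dividing the leftover timedelta by one day gives the time of day as a fraction of a day.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

# Made-up timestamps standing in for the 'created' column
created = pd.Series(pd.to_datetime([
    '2016-03-01 06:00:00',
    '2016-03-01 18:30:00',
    '2016-07-15 00:00:00',
]))

# Midnight of each day: the "date" part
midnight = created.dt.normalize()
# Matplotlib date numbers (days) for the date part
date_num = mdates.date2num(midnight.to_numpy())
# Time of day as a fraction of one day, in [0, 1)
day_fraction = ((created - midnight) / pd.Timedelta(days=1)).to_numpy()

print(date_num)
print(day_fraction)
```

Both arrays are plain floats in units of days, which is the same situation the subtraction trick produces.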
We can now calculate r:
# Convert the cartesian x and y coordinates into polar coordinates
r = np.sqrt(time_timestamp ** 2 + date_timestamp ** 2)
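A quick numeric check of what this formula does here, as a sketch with made-up values. It assumes matplotlib's default epoch (1970-01-01), where dates in 2016 are day numbers around 16801 to 17166, while the time part is always below 1. Under that assumption the time term barely moves r, so r is effectively the date and the radial axis can still be read as dates.

```python
import numpy as np

# Made-up matplotlib date numbers for 2016 (assumes the default 1970-01-01 epoch)
date_num = np.array([16801.0, 17000.0, 17166.0])
# Made-up times of day as fractions of a day
day_fraction = np.array([0.0, 0.5, 0.999])

r = np.sqrt(day_fraction ** 2 + date_num ** 2)

# How far r drifts from the plain date number, in days
max_shift_days = np.max(r - date_num)
print(max_shift_days)
```

Since the shift is bounded by 1 / (2 * 16801) days, which is under three seconds, using date_timestamp directly as r would very likely give a visually identical plot.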
For theta, the angle, you can think of an angle as a fraction or percentage of a circle. A similar thought process can be applied to calculating theta here:
# We need the percentages of 24 hours for theta
# Calculate what one day is in matplotlib timestamps
delta_one_day_plt = mdates.date2num(datetime(2016, 1, 2)) - mdates.date2num(datetime(2016, 1, 1))
# Divide each hour/minute/second by the max amount in a day, and transform it into radians with 2 * PI
theta = (time_timestamp / delta_one_day_plt) * 2 * np.pi
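To make the "fraction of a circle" idea concrete, here is a tiny sketch with a hypothetical helper (not part of the script above) that maps a clock time to an angle: 06:00 is a quarter of the day, so it lands at a quarter turn, and so on.

```python
import numpy as np

def time_of_day_to_theta(hours, minutes=0, seconds=0):
    """Map a clock time to an angle in radians, with 24 hours = one full turn."""
    day_fraction = (hours * 3600 + minutes * 60 + seconds) / 86400
    return day_fraction * 2 * np.pi

theta_midnight = time_of_day_to_theta(0)   # 0
theta_6am = time_of_day_to_theta(6)        # quarter turn, pi / 2
theta_noon = time_of_day_to_theta(12)      # half turn, pi
theta_6pm = time_of_day_to_theta(18)       # three quarters, 3 * pi / 2
```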
Now that we have r and theta, we can finally plot the polar plot, following suggestions from the SO post How to Plot Time Stamps HH:MM on Python Matplotlib "Clock" Polar Plot:
# Initiate polar figure
fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'}, layout='tight')
ax.scatter(theta, r, alpha=0.3)
# Make the labels go clockwise
ax.set_theta_direction(-1)
# Place Zero at Top
ax.set_theta_offset(np.pi/2)
# Set the circumference ticks
ax.set_xticks(np.linspace(0, 2 * np.pi, 24, endpoint=False))
# Set the label names
ticks = np.arange(0, 24, 1)
ax.set_xticklabels(ticks)
ax.set_xlabel('Date [Y/m/d] & Time [HH]')
# Set y lim so that it focuses on the dates in 2016
ax.set_ylim([mdates.date2num(datetime(2016, 1, 1)), mdates.date2num(datetime(2017, 1, 1))])
# Set y ticks so that it is in that 'gap' in the data and doesn't cover the points
ax.set_rlabel_position(140)
# Format y-axis for dates
y_fmt = mdates.DateFormatter('%Y/%m/%d')
ax.yaxis.set_major_formatter(y_fmt)
fig.suptitle('Tweet timeline')
plt.show()
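If you don't have the csv at hand, here is a self-contained sketch of the same kind of "clock" plot using random synthetic points. The data, the seed, and the output file name are all made up for illustration, and it uses the date number directly as r as a simplification, since the time of day contributes less than one day to the radius anyway. It renders off-screen and saves a PNG instead of opening a window.

```python
from datetime import datetime

import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for an interactive window
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

rng = np.random.default_rng(0)
n_points = 200

# Radial limits: matplotlib date numbers for the start and end of 2016
start = mdates.date2num(datetime(2016, 1, 1))
end = mdates.date2num(datetime(2017, 1, 1))

# Synthetic data: a random day in 2016 and a random time of day (fraction of a day)
day_number = rng.integers(int(start), int(end), size=n_points).astype(float)
day_fraction = rng.random(n_points)

theta = day_fraction * 2 * np.pi  # time of day -> angle
r = day_number                    # date -> radius (simplification, see note above)

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
ax.scatter(theta, r, alpha=0.3)
ax.set_theta_direction(-1)      # hours increase clockwise
ax.set_theta_offset(np.pi / 2)  # midnight at the top
ax.set_xticks(np.linspace(0, 2 * np.pi, 24, endpoint=False))
ax.set_xticklabels([str(hour) for hour in range(24)])
ax.set_ylim(start, end)
ax.yaxis.set_major_formatter(mdates.DateFormatter('%Y/%m/%d'))
fig.savefig('synthetic_clock_plot.png')
```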
As a sanity check, both graphs show a gap between the hours of 6 and 10, and a gap around September-October.
Hope this helps, matplotlib has a nice Polar plot example. I have to admit this was a challenging problem and I enjoyed finding an answer, thanks for the question. Cheers!
Here is a copy of the full script just in case:
from datetime import timezone, datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df_csv = pd.read_csv('data-tweets.csv', encoding="ISO-8859-1", parse_dates=['created'])
# For some reason, the data is not in utf-8 so it needs encoding ISO-8859-1
print(f'Available columns: {df_csv.columns.values}')
# Filter for 2016 as an example (the upper bound is exclusive so tweets from 31 December are kept)
df = df_csv[(df_csv['created'] >= '2016-01-01') & (df_csv['created'] < '2017-01-01')]
# Reset index with the 2016 data
df = df.reset_index(drop=True)
# Extract year/month/day to plot as y-axis
date = pd.to_datetime(df['created']).dt.strftime('%Y/%m/%d').to_numpy()
# Extract hour:minute:second to plot as x-axis
time = pd.to_datetime(df['created']).dt.strftime('%H:%M:%S').to_numpy()
# Time and date are currently strings, we need them to be datetime (dt) objects so matplotlib can understand
date_dt = [datetime.strptime(x, '%Y/%m/%d').replace(tzinfo=timezone.utc) for x in date]
time_dt = [datetime.strptime(x, '%H:%M:%S').replace(tzinfo=timezone.utc) for x in time]
# Initiate figure
fig, ax = plt.subplots(figsize=(6, 6), layout='tight')
ax.plot(time_dt, date_dt, 'o')
# Format date/time for both x and y axes
x_fmt = mdates.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(x_fmt)
y_fmt = mdates.DateFormatter('%Y/%m/%d')
ax.yaxis.set_major_formatter(y_fmt)
ax.tick_params(axis='x', rotation=45)
# Label axes
ax.set_xlabel('Time [HH:MM:SS]')
ax.set_ylabel('Date [Y/m/d]')
fig.suptitle('Tweet timeline')
plt.show()
# We need date and time to be timestamps (i.e. a number, not a datetime object) so we can operate with them
# Chose directly to do matplotlib timestamps as we can later format the axes like we did before
date_timestamp = mdates.date2num(date_dt)
# Now we need to make the time (hours/minutes/second) into matplotlib timestamps too
# However, date2num only works with full datetimes (not just hours)
# Get the full datetime timestamps
dates = pd.to_datetime(df['created']).dt.strftime('%Y/%m/%d %H:%M:%S').to_numpy()
# Make a datetime object like previously
date_and_time_dt = np.array([datetime.strptime(x, '%Y/%m/%d %H:%M:%S').replace(tzinfo=timezone.utc) for x in dates])
# Make them into matplotlib timestamps
date_and_time_timestamp = mdates.date2num(date_and_time_dt)
# Take out the 'year/month/day' part so we can keep the 'hour/minute/second' part
# Now we have the hour information in matplotlib timestamps
time_timestamp = date_and_time_timestamp - date_timestamp
# Convert the cartesian x and y coordinates into polar coordinates
r = np.sqrt(time_timestamp ** 2 + date_timestamp ** 2)
# We need the percentages of 24 hours for theta
# Calculate what one day is in matplotlib timestamps
delta_one_day_plt = mdates.date2num(datetime(2016, 1, 2)) - mdates.date2num(datetime(2016, 1, 1))
# Divide each hour/minute/second by the max amount in a day, and transform it into radians with 2 * PI
theta = (time_timestamp / delta_one_day_plt) * 2 * np.pi
# Initiate polar figure
fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'}, layout='tight')
ax.scatter(theta, r, alpha=0.3)
# Make the labels go clockwise
ax.set_theta_direction(-1)
# Place Zero at Top
ax.set_theta_offset(np.pi/2)
# Set the circumference ticks
ax.set_xticks(np.linspace(0, 2 * np.pi, 24, endpoint=False))
# Set the label names
ticks = np.arange(0, 24, 1)
ax.set_xticklabels(ticks)
ax.set_xlabel('Date [Y/m/d] & Time [HH]')
# Set y lim so that it focuses on the dates in 2016
ax.set_ylim([mdates.date2num(datetime(2016, 1, 1)), mdates.date2num(datetime(2017, 1, 1))])
# Set y ticks so that it is in that 'gap' in the data and doesn't cover the points
ax.set_rlabel_position(140)
# Format y-axis for dates
y_fmt = mdates.DateFormatter('%Y/%m/%d')
ax.yaxis.set_major_formatter(y_fmt)
fig.suptitle('Tweet timeline')
plt.show()
Answered By - just_another_profile