Issue
I am working through the book Hands-On Machine Learning and am trying to replicate a visualization that plots the latitude and longitude coordinates of the California housing data as a scatter plot. The book's plotting code is shown below (matplotlib method). I would like to reproduce the same visualization using plotnine. Could someone help me with the translation?
matplotlib method
# DATA INGEST -------------------------------------------------------------
import io
import pandas as pd
import requests
# Import the file from GitHub
url = "https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.csv" # Make sure the url is the raw version of the file on GitHub
download = requests.get(url).content
# Reading the downloaded content and turning it into a pandas dataframe
housing = pd.read_csv(io.StringIO(download.decode('utf-8')))
# Then plot
import matplotlib.pyplot as plt
# The size is now related to population divided by 100
# the colour is related to the median house value
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True)
plt.legend()
plt.show()
plotnine method
from plotnine import ggplot, geom_point, aes, scale_color_cmap
# Let's try the same thing in plotnine (ggplot syntax)
(ggplot(housing, aes('longitude', 'latitude', size = "population", color = "median_house_value"))
+ geom_point(alpha = 0.1)
+ scale_color_cmap(name="jet"))
Solution
If your question was about the colour mapping, you were close: you just needed cmap_name='jet' instead of name='jet'.
If you are after the broader styling, the code below comes close to what you had with matplotlib.
plotnine method
import numpy as np
from plotnine import (ggplot, aes, geom_point, annotate, labs, guides, theme,
                      theme_matplotlib, element_text, element_blank,
                      scale_y_continuous, scale_color_cmap, scale_size_continuous)

p = (ggplot(housing, aes(x='longitude', y='latitude', size='population', color='median_house_value'))
     + theme_matplotlib()
     + geom_point(alpha=0.4)
     # Hand-made "population" legend entry in the top-right corner
     + annotate('text', x=-114.6, y=42, label='population', size=8)
     + annotate('point', x=-115.65, y=42, size=5, color='#6495ED', fill='#6495ED', alpha=0.8)
     + labs(x=None, color='Median house value')
     + scale_y_continuous(breaks=np.arange(34, 44, 2))
     + scale_color_cmap(cmap_name='jet')
     + scale_size_continuous(range=(0.05, 6))
     # Hide the automatic size legend; the annotations above replace it
     + guides(size=False)
     + theme(
         text=element_text(family='DejaVu Sans', size=8),
         axis_text_x=element_blank(),
         axis_ticks_minor=element_blank(),
         legend_key_height=34,
         legend_key_width=9,
     )
)
p
I am not sure to what extent the formatting of the colour bar can be modified in plotnine. If others have additional ideas, I would be most interested; I think the matplotlib colour bar looks nicer.
Answered By - brb