Issue
I'd like to select some points on a plot (e.g. from box_select or lasso_select) and retrieve them in a Jupyter notebook for further data exploration. How can I do that?
For instance, in the code below, how can I export the selection from Bokeh to the notebook? If I need a Bokeh server, that is fine too (I saw in the docs that I could add "two-way communication" with a server, but I did not manage to adapt the example to reach my goal).
from random import random
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models.sources import ColumnDataSource
output_notebook()
x = [random() for x in range(1000)]
y = [random() for y in range(1000)]
s = ColumnDataSource(data=dict(x=x, y=y))
fig = figure(tools=['box_select', 'lasso_select', 'reset'])
fig.circle("x", "y", source=s, alpha=0.6)
show(fig)
# Select on the plot
# Get selection in a ColumnDataSource, or index list, or pandas object, or etc.?
Notes
- I saw some related questions on SO, but most answers are for outdated versions of Bokeh (0.x or 1.x); I'm looking for an answer for v>=2.
- I am open to solutions with other visualization libraries like altair, etc.
Solution
If you have a Bokeh server running, you can access the selection indices of a data source via datasource.selected.indices. The following example shows how to do this (modified from the official Embed a Bokeh Server Into Jupyter example):
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.io import show, output_notebook
from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature
output_notebook()
df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)
def bkapp(doc):
    plot = figure(x_axis_type='datetime', y_range=(0, 25), tools="lasso_select",
                  y_axis_label='Temperature (Celsius)',
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.circle('time', 'temperature', source=source)
    doc.add_root(plot)

show(bkapp)
After you have selected something, you can get the selected data as follows:
selected_data = df.iloc[source.selected.indices]
print(selected_data)
This should show you the selected values.
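The question's own data is a pair of plain lists rather than a DataFrame, so the same index lookup can be sketched without pandas. The snippet below is only an illustration: `select_rows` is a hypothetical helper, the sample values are made up, and it assumes the source's `data` attribute behaves like a dict mapping column names to equal-length sequences.

```python
def select_rows(columns, indices):
    """Return a column dict containing only the rows at the given positions."""
    return {name: [values[i] for i in indices] for name, values in columns.items()}


# Stand-in for `s.data` from the question (made-up sample values)
columns = {"x": [0.1, 0.4, 0.7, 0.9], "y": [0.2, 0.5, 0.3, 0.8]}
# Stand-in for `s.selected.indices` after a box or lasso selection
indices = [0, 2]

selected = select_rows(columns, indices)
print(selected)  # {'x': [0.1, 0.7], 'y': [0.2, 0.3]}
```

In the notebook, one would presumably call it as `select_rows(s.data, s.selected.indices)` once a selection has been made.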
While out of scope for this question, note that there is a disconnect between Jupyter notebooks and the interactive nature of Bokeh apps: this solution introduces state that the notebook does not save, so restarting the notebook and executing all cells does not give the same results. One way to tackle this is to persist the selection with pickle:
import os
import pickle

df = sea_surface_temperature.copy()[:100]
source = ColumnDataSource(data=df)
# Restore a previously saved selection, if one exists
if os.path.isfile("selection.pickle"):
    with open("selection.pickle", mode="rb") as f:
        source.selected.indices = pickle.load(f)
... # interactive part
# Save the current selection so a later run can restore it
with open("selection.pickle", mode="wb") as f:
    pickle.dump(source.selected.indices, f)
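The load and save steps can also be wrapped in two small functions so that notebook cells stay short. This is a minimal sketch, not part of the original answer: `save_selection` and `load_selection` are hypothetical names, and a temporary directory stands in for the notebook's working directory so the example runs on its own without Bokeh.

```python
import os
import pickle
import tempfile


def save_selection(indices, path):
    """Write the selected row indices to `path` as a pickled list."""
    with open(path, mode="wb") as f:
        pickle.dump(list(indices), f)


def load_selection(path):
    """Return the saved indices, or an empty list if nothing was saved yet."""
    if not os.path.isfile(path):
        return []
    with open(path, mode="rb") as f:
        return pickle.load(f)


# Round trip in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "selection.pickle")
    before = load_selection(path)      # no file yet, so an empty list
    save_selection([3, 17, 42], path)
    after = load_selection(path)

print(before, after)  # [] [3, 17, 42]
```

With a real source, the calls would look like `source.selected.indices = load_selection("selection.pickle")` before the interactive part and `save_selection(source.selected.indices, "selection.pickle")` after it. As with any pickle file, only load files you created yourself.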
Answered By - syntonym