Issue
I'm in the process of migrating current Databricks Spark notebooks to Jupyter notebooks. Databricks provides a convenient and nicely formatted display(data_frame) function to visualize Spark DataFrames and RDDs, but there's no direct equivalent in Jupyter (I'm not sure, but I think it's a Databricks-specific function). I tried:
dataframe.show()
But that's a plain-text rendering, and when the DataFrame has many columns the output breaks. So I'm trying to find an alternative to display() that renders Spark DataFrames better than show() does. Is there any equivalent or alternative?
Solution
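When you use Jupyter, instead of df.show(), use myDF.limit(10).toPandas().head(). This converts a small sample of the Spark DataFrame into a pandas DataFrame, which Jupyter renders as a table; the limit(10) keeps toPandas() from collecting the whole dataset to the driver. Because pandas truncates the view when there are many columns, also set the pandas maximum-columns display option to None (no limit).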
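# Alternative to the Databricks display() function.
import pandas as pd
# Show every column; the full option name avoids clashing with other "max_columns" options.
pd.set_option('display.max_columns', None)
# myDF is an existing Spark DataFrame: only 10 rows are collected to the driver,
# and .head() then shows the first 5 of them.
myDF.limit(10).toPandas().head()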
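To see what the column option actually changes, here is a minimal pandas-only sketch (it does not start Spark; a plain pandas frame stands in for the result of myDF.limit(10).toPandas()). The helper name preview and the sample frame are hypothetical. It uses pd.option_context so the setting applies only inside the helper rather than for the whole session, and it uses the plain-text repr because that honours display.max_columns the same way Jupyter's table rendering does, while being easy to check outside a notebook.

```python
import pandas as pd


def preview(pdf: pd.DataFrame, n: int = 10) -> str:
    """Return the text repr of the first n rows with every column shown.

    Hypothetical helper, not part of pandas or Spark. In a Spark notebook,
    pdf would come from myDF.limit(n).toPandas(), so only n rows are
    collected to the driver.
    """
    # option_context changes the setting only inside this block, whereas
    # pd.set_option changes it for the rest of the session.
    # The full key "display.max_columns" is used because the short name
    # "max_columns" can match more than one option in newer pandas releases.
    with pd.option_context("display.max_columns", None):
        return repr(pdf.head(n))


# A 50-row, 30-column frame stands in for a wide Spark result.
wide = pd.DataFrame({f"c{i}": range(50) for i in range(30)})

# With a low column limit, the middle columns collapse into "...".
with pd.option_context("display.max_columns", 5):
    print(repr(wide.head(10)))

# With the limit removed, all 30 columns appear (wrapped if a line gets long).
print(preview(wide))
```

The scoped option is a design choice: it keeps other cells in the notebook using pandas' default truncation, whereas the global set_option call in the answer affects every DataFrame displayed afterwards.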
Answered By - AP-Big Data