Issue
I'm trying to run PySpark in a Jupyter Notebook on a server that is not connected to the internet. I installed PySpark and Java from local conda packages using the following:
conda install pyspark-3.3.0-pyhd8ed1ab_0.tar.bz2
conda install openjdk-8.0.332-h166bdaf_0.tar.bz2
When I run !java -version in my notebook, I get
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Zulu 8.62.0.19-CA-linux64) (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Zulu 8.62.0.19-CA-linux64) (build 25.332-b09, mixed mode)
When I run !which java, I get
/root/anaconda3/bin/java
My code is as follows.
import os

# Point Spark at the conda-installed PySpark and JDK
os.environ['SPARK_HOME'] = "/root/anaconda3/pkgs/pyspark-3.3.0-pyhd8ed1ab_0/site_packages/pyspark"
os.environ['JAVA_HOME'] = "/root/anaconda3"
# Run a local master with 2 cores via spark-submit
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master local[2] pyspark-shell"

from pyspark import SparkConf, SparkContext

# Bind the driver to the loopback address
conf = SparkConf().set('spark.driver.host', '127.0.0.1')
sc = SparkContext(master='local', appName='Test', conf=conf)
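As an aside, the hostname lookup that Spark's JVM performs at startup can be approximated from plain Python, which makes this kind of failure visible before launching Spark. A minimal sketch (not part of the original question):

import socket

# Spark resolves the machine's own hostname when it starts; this mirrors
# that lookup from Python. A socket.gaierror here means the hostname does
# not resolve, which corresponds to the UnknownHostException shown below.
print(socket.gethostname())
print(socket.gethostbyname(socket.gethostname()))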
The error I got was (a snippet of it because I'm manually typing it here):
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
...
Caused by: java.net.UnknownHostException: abc: abc: Name or service not known
...
Caused by: java.net.UnknownHostException: abc: Name or service not known
...
RuntimeError: Java gateway process exited before sending its port number
"abc" is my server's hostname. What am I missing here?
Solution
I found out what the problem was.
Based on the error message java.net.UnknownHostException: abc: Name or service not known, I suspected that Java could not resolve my server's hostname abc. So I added it to /etc/hosts under the loopback IP 127.0.0.1, and now I can run PySpark.
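The resulting /etc/hosts entry looks something like this (with abc standing in for your own server's hostname):

127.0.0.1   localhost abc

Once the entry is in place, the fix can be confirmed from the notebook before retrying Spark; a minimal check:

import socket
print(socket.gethostbyname("abc"))  # should print 127.0.0.1 with the entry above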
Answered By - Rayne