Sunday, November 7, 2021

[FIXED] (null) entry in command string exception in saveAsTextFile() on Pyspark

November 07, 2021 apache-spark, jupyter-notebook, pyspark, windows

Issue

I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have an RDD of type pyspark.rdd.PipelinedRDD called idSums. When attempting to execute idSums.saveAsTextFile("Output"), I receive the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

There shouldn't be any problem with the RDD object, in my opinion, because I'm able to execute other actions without error, e.g. executing idSums.collect() produces the correct output.

Furthermore, the Output directory is created (with all subdirectories) and the file part-00001 is created, but it is 0 bytes.

Solution

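You are missing winutils.exe, a Hadoop binary for Windows. Hadoop calls it to run chmod on the output files, and the null in front of chmod in the error message is the missing path to that binary. Download the winutils.exe build that matches your system (64-bit or 32-bit) and set your Hadoop home to the folder whose bin subdirectory contains it.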
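Before changing anything, it can help to confirm that this really is the cause. The following is a minimal sketch, not part of the original answer; the helper name winutils_status is made up for illustration, and it only checks the HADOOP_HOME environment variable (it does not look at the hadoop.home.dir Java property).

```python
import os


def winutils_status(env=None):
    """Return a short diagnosis of the HADOOP_HOME / winutils.exe setup.

    `env` is any dict-like mapping of environment variables; it defaults
    to os.environ. The function name is hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return "HADOOP_HOME is not set"
    # Hadoop expects the binary at <HADOOP_HOME>\bin\winutils.exe
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        return "winutils.exe not found at " + winutils
    return "found " + winutils


print(winutils_status())
```

If this reports that HADOOP_HOME is not set or that the file is missing, one of the two ways below should apply.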

1st way:

Download the file.
Create a hadoop folder on your system, e.g. on C:
Create a bin folder in the hadoop directory, e.g. C:\hadoop\bin
Paste winutils.exe into bin, e.g. C:\hadoop\bin\winutils.exe
Under User Variables in System Properties -> Advanced System Settings, create a new variable with Name: HADOOP_HOME and Path: C:\hadoop\
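The folder steps above can also be scripted. This is a rough sketch under the assumption that winutils.exe has already been downloaded somewhere; the function name install_winutils and the example paths in the comment are placeholders, not something from the original answer.

```python
import os
import shutil


def install_winutils(downloaded_exe, hadoop_home):
    """Copy a downloaded winutils.exe into <hadoop_home>/bin.

    Creates the hadoop and bin folders if they do not exist yet and
    returns the full path of the copied file.
    """
    bin_dir = os.path.join(hadoop_home, "bin")
    if not os.path.isdir(bin_dir):
        os.makedirs(bin_dir)
    target = os.path.join(bin_dir, "winutils.exe")
    shutil.copyfile(downloaded_exe, target)
    return target


# Example call (both paths are placeholders for your own locations):
# install_winutils(r"C:\Users\me\Downloads\winutils.exe", r"C:\hadoop")
```

Note that a variable created in System Properties is only seen by processes started afterwards, so Jupyter (and the console it was launched from) needs to be restarted before retrying saveAsTextFile().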

2nd way:

You can set the Hadoop home directly in your Java program, before any Spark or Hadoop code runs, like this:

System.setProperty("hadoop.home.dir", "C:\\hadoop");
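Since the question is about PySpark rather than Java, a rough Python counterpart is to set the HADOOP_HOME environment variable from the notebook itself. This is a sketch under the assumption that no SparkContext exists yet, so that the JVM PySpark launches inherits the variable; if sc is already running, the kernel has to be restarted first. The helper name set_hadoop_home is made up for illustration.

```python
import os


def set_hadoop_home(hadoop_home, env=None):
    """Set HADOOP_HOME in `env` (default: os.environ).

    Returns the path where Hadoop will then look for winutils.exe,
    so the caller can check that the file is really there.
    """
    env = os.environ if env is None else env
    env["HADOOP_HOME"] = hadoop_home
    return os.path.join(hadoop_home, "bin", "winutils.exe")


# Usage in a notebook, before any SparkContext is created
# (C:\hadoop is the example folder from the 1st way):
# set_hadoop_home(r"C:\hadoop")
# from pyspark import SparkContext
# sc = SparkContext("local[*]", "example")
```

Unlike the User Variable from the 1st way, this only affects the current Python process and the processes it starts.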

Answered By - Harpreet Varma

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
