Issue
I need to read multiple JSON files from an S3 bucket. I am using the code below:
path = "s3://some-bucket/some-key/*.json"
df = spark.read.json(path)
Now I have the JSON data in this DataFrame, but how can I convert it back to exactly the same JSON format it had when I read it? I know I can convert the Spark DataFrame to a pandas DataFrame, but I tried every option of the pandas orient parameter and none of them gives me exactly the same JSON.
Solution
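df.write.json writes the DataFrame out as JSON, one object per line, matching the line-delimited format that spark.read.json reads by default. If the example below does not help, please share more details about the JSON format you expect.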
>>> df = spark.read.json('/home/tm/json_sample.json')
>>> df.show()
+--------------------+
|            employee|
+--------------------+
|[true, sonoo, 56000]|
+--------------------+
>>> df.write.json('/home/tm/json_sample1.json')
Data in json_sample.json
{"employee":{"married":true,"name":"sonoo","salary":56000}}
Data in the output file (part-00000-d319741f-7fb1-416a-8906-78ebfc5a1df1-c000.json, written inside the /home/tm/json_sample1.json directory)
{"employee":{"married":true,"name":"sonoo","salary":56000}}
Answered By - Hegde