Issue
I have a pipeline written as a class with several methods. Each method processes some data. Example:
class Pipeline:
    def load_users(self):
        pass

    def load_sessions(self):
        pass
Should I initialize a new Spark session with a custom config in every method? Or is it better to initialize it once in the __init__ method?
Solution
You can create the session once up front and change Spark properties as you go through your various actions and pipeline steps, using spark.conf.set("prop", "val"). That is how most people do it, and there are few examples to be found to the contrary.
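As a rough sketch of that approach, the class below receives one session in __init__ and adjusts a property per method. The build_session helper, the app name "pipeline", and the chosen property values are illustrative assumptions, not part of the original question. It also assumes the properties you change are settable at runtime (spark.sql.shuffle.partitions is one such property); settings like executor memory generally have to be fixed when the session is first built.

```python
class Pipeline:
    def __init__(self, spark):
        # One shared session for the whole pipeline, created once by the caller.
        self.spark = spark

    def load_users(self):
        # Adjust a runtime property just before this step's work.
        # The value "8" is an arbitrary illustration.
        self.spark.conf.set("spark.sql.shuffle.partitions", "8")
        # ... read and process user data with self.spark here ...
        return self.spark.conf.get("spark.sql.shuffle.partitions")

    def load_sessions(self):
        # A different value for a heavier step (again, illustrative only).
        self.spark.conf.set("spark.sql.shuffle.partitions", "64")
        # ... read and process session data with self.spark here ...
        return self.spark.conf.get("spark.sql.shuffle.partitions")


def build_session():
    # Hypothetical helper: imported lazily so the class above can be
    # exercised without a Spark installation.
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName("pipeline").getOrCreate()
```

Typical use would be Pipeline(build_session()), followed by calls to load_users() and load_sessions(). Passing the session in, rather than building it inside each method, keeps a single session alive and makes the class easy to test with a stand-in object.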
If you want deeper insight, see the expert answer to "How many SparkSessions can a single application have?". It raises some points you could weigh against your question, though it is worth asking whether you really need to go that far.
Answered By - thebluephantom