Issue
I am using AWS glue to create ETL workflow, where I am fetching the data from the API and loading it into RDS. In AWS Glue, I used pyspark script. In the same script, I have used the 'aiohttp' and 'asyncio' modules of python to call my API asynchronously. But in AWS glue it is throwing me an error that Module Not found for the only aiohttp.
I have already tried with different versions of aiohttp module and tested in the glue job but still throwing me the same error. Can someone please help me with this topic?
Solution
Glue 2.0
AWS Glue version 2.0 lets you provide additional Python modules or different versions at the job level. You can use the --additional-python-modules
job parameter with a list of comma-separated Python modules to add a new module or change the version of an existing module.
Also, within the --additional-python-modules
option you can specify an Amazon S3 path to a Python wheel module.
This link to official documentation lists all modules already available. If you need a different version or need one to be installed, it can be specified in the parameter mentioned above.
Glue 1.0 & 2.0
You can zip the python library, upload it so s3 and specify the path as --extra-py-files
job parameter.
See link to official documentation for more information.
Answered By - Rohit Pahuja
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.