Issue
I have a web application where people can upload a CSV file (no concurrent uploads, only one upload per day) containing approximately 1000 rows. Each row is processed and written to a Firestore database based on a few conditions, and we do not want to process the rows in parallel because that could cause concurrency problems.
Processing each row takes approximately 1 second, so the whole job takes about 15 minutes. This has to be done asynchronously. Our entire application runs on GCP App Engine, and my Python code looks as follows:
app.py
@app.route('/batch', methods=['POST'])
def read_csv(**kwargs):
    threading.Thread(target=iterate_csv_file, args=(
        df, file_name, file_content)).start()
main.py
if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000, debug=True)
app.yaml
runtime: python37
entrypoint: gunicorn -t 120 -b :$PORT main:app
service: my-test
instance_class: F4
automatic_scaling:
  min_instances: 1
  max_instances: 1000
handlers:
- url: /.*
  secure: always
  script: auto
I get the following error after about 7 minutes (after roughly 400 rows have been processed):
Exception in thread Thread-58:
textPayload: "Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 837, in _make_request
    return self.http_client.body_and_response(method, url, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 125, in body_and_response
    resp = self.request(method, url, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 117, in request
    resp.raise_for_status()
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://identitytoolkit.googleapis.com/v1/projects/my-project/accounts"
I have seen many tutorials that use Celery and RabbitMQ. Are they required for my use case, or should a simple background thread work? Why am I getting an error when I use a background thread? Is this error related to Flask, to GCP, or to some timeout? I was navigating through the website (so several API calls would have been made) while this background thread was running in GCP. I followed this tutorial to come up with the code: https://pastebin.com/vnypfpU7
Solution
You can't run a background thread like this in App Engine standard; the runtime is not designed for it. An instance can be shut down at any time if no requests are currently being processed. That is your case: there is no longer any request in progress, only a background thread running outside any request-handling context.
Even with min_instances set to 1, a new instance can be created and the old one deleted. The "1" is still respected, because there is always at least one instance up to serve traffic.
To achieve this, create a Cloud Task that calls back into your App Engine service. This time, the processing is performed inside a "request context" created by Cloud Tasks. It is not a user request, but it is still a request, and it prevents the instance from being shut down in the middle of the processing.
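As a rough illustration of the Cloud Tasks approach, the upload handler could enqueue a task instead of starting a thread. This is only a sketch under several assumptions: the google-cloud-tasks client library (2.x) is installed, a queue already exists, the uploaded file has been saved somewhere the task handler can read it (so only its name is sent), and the names build_task, enqueue_csv_job and the /process-csv route are hypothetical and not part of the original code. As far as I know, task requests to an automatically scaled service also have a deadline of about 10 minutes, so a 15-minute job may need to be split across several tasks.

```python
import json


def build_task(relative_uri, file_name):
    """Build the Cloud Tasks task body as a plain dict.

    relative_uri is the (hypothetical) route in the App Engine app that
    will process the CSV inside a normal request context.
    """
    payload = json.dumps({"file_name": file_name})
    return {
        "app_engine_http_request": {
            "relative_uri": relative_uri,
            "headers": {"Content-Type": "application/json"},
            "body": payload.encode("utf-8"),
            # Route the task to the service named in app.yaml.
            "app_engine_routing": {"service": "my-test"},
        }
    }


def enqueue_csv_job(project, location, queue, file_name):
    """Create the task; Cloud Tasks then POSTs it back to App Engine."""
    # Imported here so build_task() can be used without the client library.
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = build_task("/process-csv", file_name)
    task["app_engine_http_request"]["http_method"] = tasks_v2.HttpMethod.POST
    return client.create_task(request={"parent": parent, "task": task})
```

The /batch handler would then call enqueue_csv_job(...) and return immediately, while a separate route (here /process-csv, an assumed name) reads the file and runs the row-by-row processing sequentially.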
Answered By - guillaume blaquiere