Issue
I have a FastAPI server listening on an endpoint. After it receives a POST request, it uses Scrapy to crawl some data based on the IDs sent in the request body.
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class Request(BaseModel):
    someIDs: List[str]

process = CrawlerProcess(get_project_settings())
app = FastAPI()

@app.post("/")
def home(request: Request):
    process.crawl('rt_criteria', ids=request.someIDs)
    process.start()  # the script will block here until the crawling is finished
    return {"crawled": True}

# uvicorn main:app --reload
This code works as expected on the first request, but on the second request I get
twisted.internet.error.ReactorNotRestartable
raised from:
process.start()
Where should I put this call, and how can I fix the error?
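For context, the error comes from Twisted rather than FastAPI: CrawlerProcess.start() runs the Twisted reactor and, with the default stop_after_crawl=True, stops it once the crawl finishes. Twisted's reactor cannot be run again in the same process after it has stopped, so the first request succeeds and the second fails. The sketch below is only a toy model of that lifecycle in plain Python; OneShotLoop and handle_request are hypothetical names, not Twisted or Scrapy APIs.

```python
class OneShotLoop:
    """Toy stand-in for Twisted's reactor (hypothetical, not Twisted's API).

    It can run once; after it has stopped, running it again raises,
    which models the behaviour behind ReactorNotRestartable.
    """

    class NotRestartable(RuntimeError):
        pass

    def __init__(self):
        self._has_run = False

    def run(self, job):
        if self._has_run:
            raise self.NotRestartable("loop already ran once in this process")
        self._has_run = True
        # Run the job, then "stop" the loop, like stop_after_crawl=True.
        return job()


# Created once at import time, like `process` in the question's main.py.
loop = OneShotLoop()


def handle_request(ids):
    # Mirrors the question's handler: every request tries to start the loop.
    return loop.run(lambda: f"crawled {len(ids)} ids")


print(handle_request(["a", "b"]))  # first request works
try:
    handle_request(["c"])  # second request hits the one-shot limit
except OneShotLoop.NotRestartable as exc:
    print("error:", exc)
```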
Solution
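I solved this problem using background tasks in FastAPI.
(BackgroundTasks does not use multiprocessing: the task runs in the same process after the response has been sent, and a plain function such as process.start is executed in a thread pool. Passing stop_after_crawl=False keeps the Twisted reactor running in that thread instead of stopping it after the first crawl, so it never has to be restarted.)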
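from fastapi import FastAPI, BackgroundTasks

# *** Rest of the code unchanged from the question ***

@app.post("/")
async def home(request: Request, bt: BackgroundTasks):
    # Same spider arguments as in the question (the model field is someIDs)
    process.crawl('rt_criteria', ids=request.someIDs)
    # Changed line below: start the crawler as a background task and keep
    # the reactor running after the crawl instead of stopping it
    bt.add_task(process.start, stop_after_crawl=False)
    return {"crawled": True}

# uvicorn main:app --reload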
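The reason stop_after_crawl=False helps is that the event loop is started once and then left running. The same start-once pattern is sketched below with the standard library's asyncio instead of Twisted, as an analogy only: a loop is started a single time in a background thread, and each request merely schedules work on it with asyncio.run_coroutine_threadsafe, never starting or stopping the loop itself. fake_crawl and LoopThread are hypothetical stand-ins, not Scrapy code.

```python
import asyncio
import threading


async def fake_crawl(ids):
    # Hypothetical stand-in for a crawl: pretend to do network I/O.
    await asyncio.sleep(0.01)
    return {"crawled": ids}


class LoopThread:
    """Start one event loop in a daemon thread, exactly once, and keep it running."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()  # not stopped between requests

    def submit(self, coro):
        # Thread-safe scheduling onto the loop's thread;
        # returns a concurrent.futures.Future the caller can wait on.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self):
        # Only at application shutdown: stop the loop, wait, then clean up.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


runner = LoopThread()  # created once, e.g. at application startup

# Each "request" only schedules work on the already running loop.
first = runner.submit(fake_crawl(["a", "b"]))
second = runner.submit(fake_crawl(["c"]))
print(first.result(timeout=5))
print(second.result(timeout=5))
runner.stop()
```

In Twisted, the thread-safe way to hand work to a reactor that is already running in another thread is reactor.callFromThread; that combination has not been verified with Scrapy here. Note also that the handler above still calls process.start() on every request, so later requests may log a Twisted error about the reactor already running; starting the reactor once at application startup would avoid that.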
Answered By - Arsham Arya