Issue
I have written code using Playwright that returns HTML content. Is there a way to have Scrapy read from this HTML content, or does Scrapy only read from URLs?
I would appreciate any answer.
Thanks!
Solution
I would suggest writing the returned HTML content to a file and then pointing Scrapy at that local file with a file:// URL:
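import os

import scrapy

LOCAL_FILENAME = 'example.html'
LOCAL_FOLDER = 'html_files'
# Project root: two directory levels above this spider file.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


class ExampleSpider(scrapy.Spider):
    name = "example"
    # A file:// URL makes Scrapy read the saved HTML from disk, not the network.
    start_urls = [
        f"file://{BASE_DIR}/{LOCAL_FOLDER}/{LOCAL_FILENAME}"
    ]

    def parse(self, response):
        # Illustrative extraction only; replace with your own selectors.
        yield {"title": response.css("title::text").get()}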
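The spider above assumes the HTML is already saved on disk. A minimal sketch of that saving step could look like the following. The helper name `save_html` and the sample HTML string are made up for illustration, and the string is only a stand-in for whatever your Playwright code returns (for example from `page.content()`). It uses `pathlib.Path.as_uri()` to build the URL, which also yields a valid file:// URL on Windows, where joining the path by hand as in the f-string above would not.

```python
import tempfile
from pathlib import Path


def save_html(html: str, folder: Path, filename: str) -> str:
    """Write html to folder/filename and return a file:// URL for it.

    Hypothetical helper, not part of Scrapy or Playwright.
    """
    folder.mkdir(parents=True, exist_ok=True)
    target = (folder / filename).resolve()
    target.write_text(html, encoding="utf-8")
    # as_uri() requires an absolute path, hence the resolve() above.
    return target.as_uri()


if __name__ == "__main__":
    # Stand-in for the HTML string returned by the Playwright code.
    sample_html = "<html><head><title>Demo</title></head><body></body></html>"
    with tempfile.TemporaryDirectory() as tmp:
        url = save_html(sample_html, Path(tmp) / "html_files", "example.html")
        print(url)
```

The returned string can be placed directly in `start_urls`, or passed to a `scrapy.Request`, instead of the hand-built f-string.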
Answered By - DontDownvote