Issue
I am struggling to scrape some information from a website. I set ROBOTSTXT_OBEY = False,
but the spider still doesn't retrieve anything. How can I fix it?
start_urls = ['https://tienda.mercadona.es/search-results?query=leche%20entera']
def parse(self, response):
    sample = response.css("div").get()
    yield {'name': sample}
Thank you so much. As far as I can see, they probably have something in place that blocks my requests.
Solution
The site you are trying to scrape loads its content dynamically with JavaScript. Vanilla Scrapy does not execute JavaScript, but there are plugins that can help. A simple one that comes to mind is scrapy-playwright. Once it is installed (along with the Playwright browsers), it usually just requires adding DOWNLOAD_HANDLERS, plus the asyncio-based TWISTED_REACTOR that scrapy-playwright needs, to the settings.py file like so:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
You will then need to pass meta={"playwright": True} as an argument to each scrapy.Request that should be rendered in the browser.
Answered By - E Joseph