Issue
I am trying to scrape data from a website. Everything seems correct (the XPath was tested in the Scrapy shell):
$ scrapy shell "https://stopcovid19.fr/"
>>> for cat in response.xpath("//ul[@class='level0 submenu']/li/a"):
...     {
...         'name': cat.xpath("./span/text()").get(),
...         'link': cat.xpath("./@href").get(),
...     }
Here is the code:

import scrapy


class ToScrapeSpiderXPath(scrapy.Spider):
    name = 'categories'
    start_urls = ['https://stopcovid19.fr']

    def parse(self, response):
        for cat in response.xpath("//ul[@class='level0 submenu']/li/a"):
            yield {
                'name': cat.xpath("./span/text()").get(),
                'link': cat.xpath("./@href").get(),
            }
But when I try to export the results to a JSON file with the following command, the file is empty:

scrapy crawl categories -O categories.json
Could you help me? Sorry in advance, this is my first program...
Solution
You forgot to use the contains() function in your XPath. An exact @class comparison only matches when the attribute equals the whole string, so it fails if the element carries additional classes:
//ul[contains(@class, 'level0 submenu')]
Try it like this:
for cat in response.xpath("//ul[contains(@class, 'level0 submenu')]/li/a"):
...
So the spider looks like:
import scrapy


class ToScrapeSpiderXPath(scrapy.Spider):
    name = 'categories'
    start_urls = ['https://stopcovid19.fr']

    def parse(self, response, **kwargs):
        for cat in response.xpath("//ul[contains(@class, 'level0 submenu')]/li/a"):
            yield {
                'name': cat.xpath("./span/text()").get(),
                'link': cat.xpath("./@href").get(),
            }
and run the spider like this:
scrapy crawl categories -o file.json
++++ EDIT ++++ The code was running fine all along; the output was just not being saved to the file I was looking at... Thanks for your help!!
Answered By - Vova