Issue
Using Scrapy, I'm trying to scrape the data inside a <script type="application/ld+json"> tag.
import json

class TestSpider(scrapy.Spider):
    name = 'content'
    start_urls = ['https://www.maserati.com/us/en/models/ghibli']

    def parse(self, response):
        for content in response.xpath('(//script[@type="application/ld+json"])/text()'):
            data = json.loads(content)
            yield {
                'name': data['name'],
            }
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
However, I'm not getting the test1.jl file that I was expecting after running scrapy runspider test_spider.py -O test1.jl in the terminal.
For a start, I just want the name, so I can understand how it works.
The website link for inspection is given below. (The original post also included two screenshots: one showing the script tag and the name property inside it that I want to yield, and one showing my code and the terminal output.)
https://www.maserati.com/us/en/models/ghibli
Solution
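You were so close, just missing getall(). Without it, the loop iterates over Selector objects rather than strings, and json.loads() needs a string, so it raises a TypeError.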
import scrapy
import json

class TestSpider(scrapy.Spider):
    name = 'content'
    start_urls = ['https://www.maserati.com/us/en/models/ghibli']

    def parse(self, response):
        # getall() returns the matched text nodes as a list of plain strings
        for content in response.xpath('(//script[@type="application/ld+json"])/text()').getall():
            data = json.loads(content)
            yield {
                'name': data['name'],
            }
        # Follow the "next" link if the page has one
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
(I don't see any "next" button on that page though, so the pagination part probably won't follow anything.)
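One thing that may still bite: data['name'] assumes every JSON-LD block is a single object with a name key. A block can also be a list of objects, or have no name at all, and then the spider would stop with a TypeError or KeyError. Here is a minimal sketch of a more tolerant version, using only the standard library. The helper extract_names and the sample strings are made up for illustration; they are not part of Scrapy and not taken from the Maserati page.

```python
import json


def extract_names(raw_blocks):
    """Collect every "name" value from a list of raw JSON-LD strings.

    raw_blocks is assumed to be what .getall() returns: a list of str.
    (extract_names is a hypothetical helper, not a Scrapy function.)
    """
    names = []
    for raw in raw_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Skip blocks that are not valid JSON instead of crashing the spider.
            continue
        # A JSON-LD block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "name" in item:
                names.append(item["name"])
    return names


# Made-up sample blocks for illustration only.
sample_blocks = [
    '{"@type": "Car", "name": "Example Car"}',
    '[{"@type": "Organization", "name": "Example Org"}, {"@type": "WebSite"}]',
    'not valid json',
]
print(extract_names(sample_blocks))  # ['Example Car', 'Example Org']
```

Inside parse() you could then pass it the result of the same .getall() call and yield {'name': name} for each returned name.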
Answered By - SuperUser