Issue
I am trying to scrape the main title of this page: https://patents.google.com/patent/CN102093389B/en ("Duplex and oxygen bridge heterlcyclic ring anabasine compound and preparation method thereof") with Scrapy, and it is not working. I am trying to extract it with a CSS selector. The same CSS selector works fine in Puppeteer and extracts the main header, but in Scrapy it returns None. The code I have written is this:
import scrapy


class GooglepatentsspiderSpider(scrapy.Spider):
    name = 'googlePatentsSpider'
    allowed_domains = ['patents.google.com']
    start_urls = ['https://patents.google.com/patent/CN102093389B/en']

    def parse(self, response):
        title = response.css('h1#title::text').get()
        yield {
            'title': title
        }
Solution
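Your CSS selector does not match the HTML that Scrapy receives. Puppeteer runs the page's JavaScript and queries the rendered DOM, where h1#title exists; Scrapy only sees the raw HTML returned by the server, where the title sits in a span with itemprop="title". Try this instead: response.css('span[itemprop="title"]::text').get()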
Answered By - Shivam