Issue
I am scraping the HTML of the h1 tag. The selector only targets the h1 tag, but when I print the result, it also includes unnecessary HTML along with the h1 tag.
import requests
from scrapy.selector import Selector
r = requests.get('https://www.catholicgallery.org/mass-reading/310122/')
resp = Selector(text=r.text)
h1 = resp.xpath('//h1[@class="tdb-title-text"]').get()
print(h1)
Solution
It looks like it is affected by this issue:
scrapy/parsel: HTML code extraction from node is not working #228
Downgrading libxml2 to 2.9.10 is reported to solve this issue.
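Before downgrading, it may help to confirm which libxml2 version is actually in use. The snippet below is a minimal sketch, assuming lxml (which parsel and Scrapy rely on) is installed. The helper name is_newer_than and the (2, 9, 10) baseline are illustrative choices taken from the version mentioned above, not part of any library.

```python
# Sketch: report the libxml2 version lxml is running against and
# compare it with 2.9.10, the version reported to work.
try:
    from lxml import etree
    runtime_version = etree.LIBXML_VERSION  # tuple such as (2, 9, 10)
except ImportError:
    runtime_version = None  # lxml is not installed in this environment


def is_newer_than(version, baseline=(2, 9, 10)):
    """Return True if `version` is strictly newer than `baseline`.

    Hypothetical helper for illustration; both arguments are tuples of ints.
    """
    return tuple(version[:3]) > tuple(baseline)


if runtime_version is None:
    print("lxml is not installed, so the libxml2 version cannot be checked")
else:
    print("libxml2 runtime version:", runtime_version)
    if is_newer_than(runtime_version):
        print("Newer than 2.9.10; the linked issue may apply")
    else:
        print("2.9.10 or older")
```

If the reported version is newer than 2.9.10, that is consistent with the linked issue, though it does not prove that the issue is the cause.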
Answered By - Georgiy