Issue
I'm new to Python and Scrapy. I'm having trouble joining the base URL to the scraped link. I've tried a number of suggestions, but I'm probably applying them incorrectly.
    def parse(self, response):
        for ad_links in response.xpath('//div[@class="view"][1]//a'):
            yield {
                'title': item.xpath('text()').extract(),
                relative_url = item.xpath('@href').extract(),
                'link': response.urljoin(relative_url),
            }
Any suggestions would be really appreciated. Thanks.
Solution
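You cannot assign a variable inside the dictionary you are yielding. A dict literal only accepts key: value pairs, so the assignment there is a syntax error. Compute relative_url before the yield instead.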
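Also be sure to understand the difference between extract() and extract_first(): extract() returns a list of every match, while extract_first() returns only the first match as a string. response.urljoin() needs a single string, so I have the feeling that extract_first() is the method to use here. See the documentation.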
Also, what is this item variable? The loop variable is ad_links, so that is the name to use, right?
Try this:
    def parse(self, response):
        for ad_links in response.xpath('//div[@class="view"][1]//a'):
            relative_url = ad_links.xpath('@href').extract_first()
            yield {
                'title': ad_links.xpath('text()').extract_first(),
                'link': response.urljoin(relative_url),
            }
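If you want to see how the joining itself behaves, you can try it without Scrapy. As far as I know, response.urljoin() resolves the link against the response URL in the same way as the standard library's urllib.parse.urljoin, so the sketch below uses that directly. The base URL and the hrefs are invented for the example.

```python
from urllib.parse import urljoin

# Hypothetical page URL, playing the role of response.url.
base_url = "https://example.com/listings/page1.html"

# A root-relative href replaces the whole path of the base URL.
root_relative = urljoin(base_url, "/ads/42")

# A plain relative href is resolved against the directory of the base URL.
page_relative = urljoin(base_url, "ads/42")

# An href that is already absolute is returned unchanged.
already_absolute = urljoin(base_url, "https://other.example/ads/42")

# Passing a list (what extract() returns) instead of a string fails.
try:
    urljoin(base_url, ["/ads/42"])
    list_error = None
except TypeError as exc:
    list_error = exc

print(root_relative)
print(page_relative)
print(already_absolute)
print(repr(list_error))
```

The last case is the reason the original code could not work even without the syntax error: the join needs one string, not a list of strings.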
Answered By - Corentin Limier