Issue
I am a Python and Scrapy beginner currently trying to get the ranking for every language / game combination on https://www.twitchmetrics.net/channels/viewership.
However, I can't get Scrapy to follow the links. I always get the error: 'HtmlResponse' object has no attribute 'follow_all'.
def parse(self, response):
    all_channels = response.xpath('//h5')
    language_page_links = response.xpath(
        '//div[@class="mb-4"][1]//a//@href').getall()

    for i, channel in enumerate(all_channels, start=1):
        il = ItemLoader(item=LeaderboardItem(), selector=channel)
        il.add_xpath('channel_id', './text()')
        il.add_value('rank_mostwatched_all_all', i)
        yield il.load_item()

    yield from response.follow_all(language_page_links, self.parse)
In the last line I will use a different parser once I get the link-following working. I also tried the example spider from the Scrapy documentation, for which I get the exact same error:
import scrapy


class AuthorSpider(scrapy.Spider):
    name = 'author'

    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        author_page_links = response.css('.author + a')
        yield from response.follow_all(author_page_links, self.parse_author)

        pagination_links = response.css('li.next a')
        yield from response.follow_all(pagination_links, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            return response.css(query).get(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }
What am I missing here?
Solution
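The documentation shows that follow_all is a new method, available only in Scrapy 2.0 and later. On an older installation the response object simply does not have it, which is exactly what the error message says.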
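To confirm which version is actually installed, a minimal sketch along these lines may help. It assumes Scrapy exposes its version string as scrapy.__version__; the helper name supports_follow_all is made up for illustration, and it only looks at the major version number.

```python
def supports_follow_all(version: str) -> bool:
    """Return True if the given Scrapy version string is 2.0 or newer.

    Hypothetical helper for illustration only; it relies on the fact that
    follow_all first appeared in Scrapy 2.0.
    """
    # Only the major component matters here: "1.8.0" -> 1, "2.4.1" -> 2.
    major = int(version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    try:
        import scrapy
    except ImportError:
        print("Scrapy is not installed in this environment")
    else:
        # Prints the installed version and whether follow_all should exist.
        print(scrapy.__version__, supports_follow_all(scrapy.__version__))
```

If the check reports False, the installed Scrapy predates follow_all and the error above is expected.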
You may have to upgrade Scrapy (note that the pip option is --upgrade, not --update):
pip install --upgrade scrapy
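If upgrading is not an option, one possible workaround is to loop over the links with response.follow, which Scrapy has provided since 1.4. The helper below is only a sketch and not part of Scrapy: follow_all_compat is a hypothetical name, and it assumes the response object offers either follow_all or follow.

```python
def follow_all_compat(response, urls, callback):
    """Yield one request per URL, with or without response.follow_all.

    Hypothetical helper (not part of Scrapy): uses follow_all when the
    response has it (Scrapy 2.0+), otherwise falls back to calling
    response.follow once per URL.
    """
    if hasattr(response, "follow_all"):
        yield from response.follow_all(urls, callback)
    else:
        for url in urls:
            yield response.follow(url, callback)
```

Inside the spider, the last line of parse could then read: yield from follow_all_compat(response, language_page_links, self.parse)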
Answered By - furas