Monday, January 24, 2022

[FIXED] Scrapy - Not Iterating Through

Labels: python, scrapy

Issue

I am brand new to Scrapy and am experimenting based on the docs / tutorial. I am writing a simple bot to scrape Hacker News and ultimately want to extract only the stories with a certain number of points. I have reached a point where my loop fills in the same story title / link for every result on pages 1 and 2. How do I get it to check every single story instead of just the first one on each page? The code is as follows:

import scrapy

class ArticlesSpider(scrapy.Spider):
    name = 'articles'
    start_urls = [
        'https://news.ycombinator.com',
        'https://news.ycombinator.com/news?p=2'
    ]

    def parse(self, response):
        link = response.css('tr.athing')
        for website in link:
            yield {
                'title': link.css('tr.athing td.title a.storylink::text').get(),
                'link':  link.css('tr.athing td.title a::attr(href)').get()
            }

The output in my console is the title and link in dict form but the same exact one (30 times) per page. What am I doing wrong?

Solution

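Inside the loop you need to call website.css(...), not link.css(...). link is the whole list of rows, so link.css(...).get() returns the first match on the page on every iteration, while website is the single row currently being visited. It should look like: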

    def parse(self, response):
        link = response.css('tr.athing')
        for website in link:
            yield {
                'title': website.css('td.title a.storylink::text').get(),
                'link':  website.css('td.title a::attr(href)').get()
            }

Answered By - Georgiy

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
