Issue
I am brand new to Scrapy and am experimenting based on the docs and tutorial. I am writing a simple bot to scrape Hacker News and ultimately want to extract only stories with a certain number of points. I have reached a point where my loop fills in the same story title and link for all results on pages 1 and 2. How do I get it to check every single story instead of just the first one on each page? The code is as follows:
import scrapy

class ArticlesSpider(scrapy.Spider):
    name = 'articles'
    start_urls = [
        'https://news.ycombinator.com',
        'https://news.ycombinator.com/news?p=2'
    ]

    def parse(self, response):
        link = response.css('tr.athing')
        for website in link:
            yield {
                'title': link.css('tr.athing td.title a.storylink::text').get(),
                'link': link.css('tr.athing td.title a::attr(href)').get()
            }
The output in my console is the title and link in dict form but the same exact one (30 times) per page. What am I doing wrong?
Solution
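Inside the loop you need to call website.css(...), not link.css(...). link is the whole list of rows, so link.css(...).get() searches every row each time and always returns the first match on the page. website is the single row for the current iteration, so the selector should be relative to it.
It should look like: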
    def parse(self, response):
        link = response.css('tr.athing')
        for website in link:
            yield {
                # Query the current row only, not the full list of rows
                'title': website.css('td.title a.storylink::text').get(),
                'link': website.css('td.title a::attr(href)').get()
            }
Answered By - Georgiy