Issue
I am new to Scrapy and Python, and I am just following a tutorial to build a scraper.
I wrote the following code:
import scrapy

class PostsSpider(scrapy.Spider):
    name = "posts"
    allowed_domains = ['blog.scrapinghub.com']
    start_urls = [
        'https://blog.scrapinghub.com/'
    ]

    def parse(self, response):
        for post in response.css('div.post-item'):
            yield {
                'title': post.css('.post-header h2 a::text')[0].get(),
                'date': post.css('.post-header a::text')[1].get(),
                'author': post.css('.post-header a::text')[2].get()
            }

        next_page = response.css('a.next-posts-link::attr(href)').get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
The code executes without any error, but nothing is printed. When debugging, I can see that it never enters the parse method. Help needed.
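For context on the pagination step in the spider above: `response.urljoin(next_page)` resolves the (possibly relative) next-page link against the URL of the current response. A minimal sketch of the same resolution using only the standard library's `urljoin` (the `/page/2/` path here is a hypothetical relative link, not taken from the actual site):

import scrapy  # noqa: F401  (shown for parity with the spider; not needed below)
from urllib.parse import urljoin

base = 'https://blog.scrapinghub.com/'
next_page = '/page/2/'  # hypothetical relative href extracted by the CSS selector

# urljoin resolves the relative path against the base URL,
# which is what response.urljoin does with the response's own URL.
absolute = urljoin(base, next_page)
print(absolute)  # https://blog.scrapinghub.com/page/2/

If the selector returned an already-absolute URL, `urljoin` (and `response.urljoin`) would simply return it unchanged, so the call is safe either way.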
Solution
The source of your problem cannot be derived from your code. I just ran your spider code in my Scrapy project to test it, and for me it runs perfectly fine without a single change. I correctly get the title, date, and author for all 17 pages of that blog printed to the terminal.
Answered By - carpa_jo