Issue
I have been working on this for the past few hours, but cannot figure out what I'm doing wrong. When I run my XPath statements using a selector in the Scrapy shell, they work as expected. When I try to use the same statement in my spider, however, I get back an empty set. Does anyone know what I am doing wrong?
from scrapy.spider import Spider
from scrapy.selector import Selector
from TFFRS.items import Result

class AthleteSpider(Spider):
    name = "athspider"
    allowed_domains = ["www.tffrs.org"]
    start_urls = ["http://www.tffrs.org/athletes/3237431/",]

    def parse(self, response):
        sel = Selector(response)
        results = sel.xpath("//table[@id='results_data']/tr")
        items = []
        for r in results:
            item = Result()
            item['event'] = r.xpath("td[@class='event']").extract()
            items.append(item)
        return items
Solution
When viewed by the spider, your URL contains no content. To debug this kind of problem you should use scrapy.shell.inspect_response inside the parse method, like so:
from scrapy.shell import inspect_response

class AthleteSpider(Spider):
    # all your code

    def parse(self, response):
        inspect_response(response, self)
Then, when you run
scrapy crawl <your spider>
you will drop into a shell from within your spider. There you should do:
In [1]: view(response)
This will display this particular response as it looks for this particular spider.
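Comparing what view(response) shows against what your browser shows often reveals the mismatch. One frequent culprit (a hedged illustration, not necessarily the cause here): browsers insert a <tbody> element into tables that may be absent from the raw HTML the spider receives, so a child-axis path like table/tr matches in the browser's inspector but not against the real response. The sketch below uses the standard-library ElementTree with a made-up HTML snippet, rather than Scrapy's selector, just to keep it self-contained:

```python
# Hypothetical markup resembling what a browser might show: note the
# interposed <tbody> element between <table> and its <tr> rows.
import xml.etree.ElementTree as ET

html = """
<html><body>
  <table id="results_data">
    <tbody>
      <tr><td class="event">100m</td></tr>
      <tr><td class="event">200m</td></tr>
    </tbody>
  </table>
</body></html>
"""

root = ET.fromstring(html)
table = root.find(".//table[@id='results_data']")

# The direct-child axis misses the rows because <tbody> sits in between.
direct = table.findall("tr")
print(len(direct))  # 0

# The descendant axis still finds them regardless of <tbody>.
descendant = table.findall(".//tr")
print(len(descendant))  # 2
```

If the markup difference turns out to be the issue, switching the spider's path to a descendant axis (//tr instead of /tr) is usually enough.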
Answered By - Pawel Miech