Issue
This is probably a very trivial question, but I am new to Scrapy. I've tried to find a solution to my problem, but I just can't see what is wrong with this code.
My goal is to scrape all of the opera shows from a given website. The data for every show is inside one div with the class "row-fluid row-performance ". I am trying to iterate over these divs to retrieve the data, but it doesn't work: each iteration gives me the content of the first div (I get the same show 19 times instead of different items).
import scrapy
from ..items import ShowItem

class OperaSpider(scrapy.Spider):
    name = "opera"
    allowed_domains = ["http://www.opera.krakow.pl"]
    start_urls = [
        "http://www.opera.krakow.pl/pl/repertuar/na-afiszu/listopad"
    ]

    def parse(self, response):
        divs = response.xpath('//div[@class="row-fluid row-performance "]')
        for div in divs:
            item = ShowItem()
            item['title'] = div.xpath('//h2[@class="item-title"]/a/text()').extract()
            item['time'] = div.xpath('//div[@class="item-time vertical-center"]/div[@class="vcentered"]/text()').extract()
            item['date'] = div.xpath('//div[@class="item-date vertical-center"]/div[@class="vcentered"]/text()').extract()
            yield item
Solution
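Try changing the XPaths inside the for loop to start with .// instead of //. That is, just put a dot in front of the double slash. An XPath that starts with // searches the whole document no matter which selector it is called on, while .// searches only inside the current div. You can also try using extract_first() instead of extract(), which returns a single string rather than a list, and see if that gives you better results.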
Answered By - Tor Stava