Issue
I want to scrape the Production Co from an IMDb movie page, but I don't have any idea how.
I don't know what kind of information I need to retrieve. Right now I am only able to get the movie title.
This is my code:
# -*- coding: utf-8 -*-
"""
Created on Sun Jan 27 20:10:53 2019
@author: Razva
"""
import scrapy
from imdb2.items import Imdb2Item

class ThirdSpider(scrapy.Spider):
    name = "imdbtestspider"
    allowed_domains = ["imdb.com"]
    start_urls = (
        'http://www.imdb.com/chart/top',
    )

    def parse(self, response):
        links = response.xpath('//tbody[@class="lister-list"]/tr/td[@class="titleColumn"]/a/@href').extract()
        i = 1
        for link in links:
            abs_url = response.urljoin(link)
            #
            url_next = '//*[@id="main"]/div/span/div/div/div[2]/table/tbody/tr['+str(i)+']/td[3]/strong/text()'
            rating = response.xpath(url_next).extract()
            if (i <= len(links)):
                i = i + 1
            yield scrapy.Request(abs_url, callback=self.parse_indetail, meta={'rating': rating})

    def parse_indetail(self, response):
        item = Imdb2Item()
        #
        item['title'] = response.xpath('//div[@class="title_wrapper"]/h1/text()').extract()[0][:-1]
        return item
If someone can give me a tip, I would appreciate it.
Solution
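To get the name of the production company, try the following (this assumes Imdb2Item declares a production field; Scrapy items raise a KeyError for fields that are not declared):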
def parse_indetail(self, response):
    item = Imdb2Item()
    item['title'] = response.xpath('//div[@class="title_wrapper"]/h1/text()').extract()[0][:-1]
    item['production'] = response.xpath('//h4[contains(text(), "Production Co")]/following-sibling::a/text()').get()
    return item
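To see what that XPath is doing, here is a rough standard-library sketch that mimics its logic: find an h4 whose text contains "Production Co", then collect the a elements that follow it under the same parent. The HTML fragment, the company names, and the helper production_companies are all made up for illustration, and the real IMDb markup may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only; the real IMDb page may look different.
SAMPLE = """
<div class="txt-block">
  <h4 class="inline">Production Co:</h4>
  <a href="/company/co0000001">Example Pictures</a>,
  <a href="/company/co0000002">Sample Films</a>
</div>
"""


def production_companies(root, label="Production Co"):
    """Return the text of every <a> that follows an <h4> containing `label`.

    This mirrors: //h4[contains(text(), label)]/following-sibling::a/text()
    """
    names = []
    for parent in root.iter():
        children = list(parent)
        for pos, child in enumerate(children):
            if child.tag == "h4" and label in (child.text or ""):
                # Siblings that come after the matching <h4>, restricted to <a>.
                names.extend(
                    (sib.text or "").strip()
                    for sib in children[pos + 1:]
                    if sib.tag == "a"
                )
    return names


root = ET.fromstring(SAMPLE)
companies = production_companies(root)
# Scrapy's .get() returns only the first match (or None if nothing matched).
first = companies[0] if companies else None
print(first)
print(companies)
```

In the spider itself, .get() gives you only the first company. If a movie lists several and you want all of them, calling .getall() on the same selector returns a list instead.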
Answered By - vezunchik