Issue
I am new to Scrapy and am trying to crawl this page and get the prices of the items. The problem is that Scrapy returns the values out of order, and I don't know why.
This is my code:
import scrapy
from ..items import AmazonItem
from scrapy.http import Request
import time


class QuotesSpider(scrapy.Spider):
    name = "main"

    def start_requests(self):
        urls = [
            'https://www.amazon.com/best-sellers-movies-TV-DVD-Blu-ray/zgbs/movies-tv/ref=zg_bs_nav_0',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # amazon = AmazonItem()
        ol_response = response.xpath('//ol[@id="zg-ordered-list"]/li')
        for number_ra in range(0,50):
            response_div = ol_response[number_ra]
            price = response_div.css(".p13n-sc-price::text").extract()
            item_name = response_div.xpath("span/div/span/a/div/text()").get().strip()
            link = response_div.xpath("span/div/span/a").attrib['href'].split('/')[3].split('?')[0]
            print("({}) {} , PRICE: {}".format(number_ra+1,item_name,price))
            print(link+"\n")
The names and the IDs are in the correct order, but the prices are not.
Thanks, guys
Solution
Instead of indexing into the list with a fixed range, iterate over the list items one by one and extract each field relative to the current item:
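def parse(self, response):
    # Each <li> is one product; every selector below is scoped to that <li>.
    for item in response.xpath('//ol[@id="zg-ordered-list"]/li'):
        # .get() returns the first match, or None if the item has no match.
        price = item.css(".p13n-sc-price::text").get()
        item_name = item.css(".p13n-sc-truncate.p13n-sc-line-clamp-1::text").get()
        # Turn the relative href into an absolute URL.
        link = response.urljoin(item.css(".a-link-normal::attr(href)").get())
        print("{} , PRICE: {}".format(item_name,price))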
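The per-item pattern above can be tried offline with a small sketch. It uses the standard library's xml.etree.ElementTree rather than Scrapy so that it runs without a network connection, and the markup, the names SAMPLE, extract_rows and page_wide_prices are all made up for illustration; the real Amazon page is far more complex. It shows one common way that prices can drift out of step with names (not confirmed as the cause in the question): when an item has no price, a page-wide list of prices is shorter than the list of items, while a lookup scoped to each item yields None for that item and keeps the rest aligned.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup (assumption: not Amazon's real HTML).
# The second item deliberately has no price element.
SAMPLE = """
<ol id="zg-ordered-list">
  <li><a href="/dp/A1">First</a><span class="p13n-sc-price">$9.99</span></li>
  <li><a href="/dp/B2">Second</a></li>
  <li><a href="/dp/C3">Third</a><span class="p13n-sc-price">$4.50</span></li>
</ol>
"""


def extract_rows(markup):
    """Extract name, href and price per <li>, scoping every lookup to that <li>."""
    root = ET.fromstring(markup)  # root is the <ol> element
    rows = []
    for li in root.findall("li"):
        link = li.find("a")
        price = li.find("span[@class='p13n-sc-price']")
        rows.append({
            "name": link.text,
            "href": link.get("href"),
            # A missing price becomes None instead of shifting later prices up.
            "price": price.text if price is not None else None,
        })
    return rows


def page_wide_prices(markup):
    """Collect every price on the page at once, ignoring which item owns it."""
    root = ET.fromstring(markup)
    return [span.text for span in root.iter("span")]


rows = extract_rows(SAMPLE)
for position, row in enumerate(rows, start=1):
    print("({}) {} , PRICE: {}".format(position, row["name"], row["price"]))
```

Here page_wide_prices(SAMPLE) returns only two prices for three items, so pairing them with names by position would attach the third item's price to the second item. In a Scrapy spider the same idea applies to the selectors in the answer: call .css() or .xpath() on the item, not on response, and expect .get() to return None when an item has no price.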
Answered By - Umair Ayub