Issue
I tried the following in the Scrapy shell:
$ scrapy shell 'https://www.trendyol.com/trendyolmilla/cok-renkli-desenli-elbise-twoss20el0573-p-36294862'
>>> response.css("div.slick-slide img").xpath("@src").getall()
The output is:
['/Content/images/defaultThumb.jpg', '/Content/images/defaultThumb.jpg', '/Content/images/defaultThumb.jpg', '/Content/images/defaultThumb.jpg', '/Content/images/defaultThumb.jpg', 'https://cdn.dsmcdn.com/mnresize/415/622/ty124/product/media/images/20210602/12/94964657/64589619/1/1_org_zoom.jpg', 'https://cdn.dsmcdn.com/mnresize/415/622/ty124/product/media/images/20210602/12/94964657/64589619/1/1_org_zoom.jpg']
This collects only one real image URL (the other entries are placeholder thumbnails), but the page at that link has 5 images. How can I get the src of all of the images?
Solution
Explanation
The HTML tags you are selecting contain only one real image link. To get all of the links, read them from a script tag of type application/ld+json instead.
The following returns a JSON string, which is stored in the text variable:
text = response.xpath("//p/script[contains(@type,'application/ld+json')]/text()").extract_first()
Load it with json.loads to convert it into a Python dictionary:
json_text = json.loads(text)
Now read the 'image' key with json_text.get('image') to get the images.
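The three steps above can be tried offline on a small hand-written HTML snippet. This is only a sketch under stated assumptions: it uses Python's standard-library html.parser in place of Scrapy selectors so that it runs without a network request, and SAMPLE_HTML, its example.com URLs, and the LdJsonCollector class are made up for illustration, not taken from the Trendyol page.

```python
import json
from html.parser import HTMLParser

# Hypothetical stand-in for the product page; the real markup will differ.
SAMPLE_HTML = """
<html><body>
<p><script type="application/ld+json">
{"@type": "Product", "name": "Dress",
 "image": ["https://example.com/1.jpg", "https://example.com/2.jpg"]}
</script></p>
</body></html>
"""


class LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_ld_json:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld_json = False


collector = LdJsonCollector()
collector.feed(SAMPLE_HTML)

text = collector.blocks[0]        # step 1: the raw JSON string
json_text = json.loads(text)      # step 2: JSON string -> Python dict
images = json_text.get("image")   # step 3: read the "image" key
print(images)
```

In a real spider, the XPath from the answer plays the role of LdJsonCollector; the json.loads and .get('image') steps are the same.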
Code
Run this spider with Scrapy. The output will contain all 5 links.
import json

import scrapy


class Trendyol(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        url = 'https://www.trendyol.com/trendyolmilla/cok-renkli-desenli-elbise-twoss20el0573-p-36294862'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # The product data is embedded as JSON-LD inside a <script> tag.
        text = response.xpath("//p/script[contains(@type,'application/ld+json')]/text()").extract_first()
        # Convert the JSON string into a Python dictionary.
        json_text = json.loads(text)
        # The 'image' key holds the image URLs.
        print(json_text.get('image'))
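The spider above assumes that the script tag is found and that 'image' is a list. As a hedged sketch, the hypothetical helper below (images_from_json_ld is not part of Scrapy or of the original answer) shows one way to guard against a missing script tag, where extract_first() returns None, and against an 'image' value that is a single string rather than a list. Whether the Trendyol page ever returns those shapes is an assumption; the original answer does not say.

```python
import json


def images_from_json_ld(text):
    """Return a list of image URLs from a JSON-LD string (hypothetical helper)."""
    if not text:
        # extract_first() returns None when the XPath matches nothing.
        return []
    data = json.loads(text)
    if not isinstance(data, dict):
        return []
    image = data.get("image")
    if isinstance(image, str):
        # A single URL: wrap it so callers always get a list.
        return [image]
    if isinstance(image, list):
        # Keep only plain string URLs.
        return [item for item in image if isinstance(item, str)]
    return []


print(images_from_json_ld('{"image": ["a.jpg", "b.jpg"]}'))
print(images_from_json_ld('{"image": "a.jpg"}'))
print(images_from_json_ld(None))
```

Inside parse, this could replace the json.loads and print lines, for example by looping over images_from_json_ld(text) and yielding each URL.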
Answered By - Shivam