Issue
scrapy shell 'https://www.blibli.com/promosi/samsung-mobilephones-tablet?appsWebview=true'
fetch('https://www.blibli.com/promosi/samsung-mobilephones-tablet?appsWebview=true')
response.css('div.productset-carousel-mobile__block-item item')
[]
Description: I'm trying to fetch the name and price of the products listed at the URL. To get the raw data of the div with class = 'productset-carousel-mobile__block-item item', I'm writing response.css('div.productset-carousel-mobile__block-item item'), but every time it returns an empty list or just moves to the next line of the terminal.
Now I don't know where I'm going wrong. Right now I'm learning Scrapy from a YouTube tutorial.
All suggestions and links that would help clear up this concept are warmly welcomed.
Solution
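The content of that site is loaded dynamically, so the product markup is not in the HTML that Scrapy downloads and your selector has nothing to match. (Separately, the space in 'div.productset-carousel-mobile__block-item item' makes the selector look for an <item> element nested inside the div; to match a div that carries both classes you would write 'div.productset-carousel-mobile__block-item.item'.) However, there is an API endpoint that returns the same data you are after. The spider below scrapes the product names and the categories they belong to from the landing page. Feel free to include other relevant fields.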
import scrapy


class BliBliSpider(scrapy.Spider):
    name = 'blibli'
    start_urls = ['https://www.blibli.com/backend/content/promotions/samsung-mobilephones-tablet']

    def parse(self, response):
        for item in response.json()['data']['components']:
            # Skip components that are not product carousels.
            if item['name'] != 'PRODUCT_CAROUSEL':
                continue
            for container in item['parameters']:
                cat_name = container['title']
                for product in container['products']:
                    yield {"category": cat_name, "product name": product['name']}
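If you want to check the traversal logic without hitting the site, the same loops can be run against a small hand-made payload. This is only a sketch: the sample data and the helper name extract_products are made up for illustration and assume nothing beyond the nesting the spider relies on (data, components, parameters, products). The real API response may contain more fields or differ in detail.

```python
def extract_products(payload):
    """Yield one dict per product found in PRODUCT_CAROUSEL components."""
    for component in payload["data"]["components"]:
        # Same filter as in the spider: ignore non-carousel components.
        if component["name"] != "PRODUCT_CAROUSEL":
            continue
        for container in component["parameters"]:
            for product in container["products"]:
                yield {
                    "category": container["title"],
                    "product name": product["name"],
                }


# Hypothetical payload, invented for this example (not real API output).
sample_payload = {
    "data": {
        "components": [
            {"name": "BANNER", "parameters": []},
            {
                "name": "PRODUCT_CAROUSEL",
                "parameters": [
                    {
                        "title": "Phones",
                        "products": [{"name": "Phone A"}, {"name": "Phone B"}],
                    },
                    {"title": "Tablets", "products": [{"name": "Tablet C"}]},
                ],
            },
        ]
    }
}

rows = list(extract_products(sample_payload))
for row in rows:
    print(row)
```

Once the logic looks right, the spider itself can be run with something like scrapy runspider blibli.py -o products.json, where blibli.py is whatever file name you saved the spider under.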
Answered By - SIM