Issue
I want to scrape all monitor items from the site https://www.startech.com.bd. The problem is that when I run my spider, it returns only 60 results. Here is my code, which doesn't work correctly:
import scrapy
import time


class StartechSpider(scrapy.Spider):
    name = 'startech'
    allowed_domains = ['startech.com.bd']
    start_urls = ['https://www.startech.com.bd/monitor/']

    def parse(self, response):
        monitors = response.xpath("//div[@class='p-item']")
        for monitor in monitors:
            item = monitor.xpath(".//h4[@class = 'p-item-name']/a/text()").get()
            price = monitor.xpath(".//div[@class = 'p-item-price']/span/text()").get()
            yield {
                'item': item,
                'price': price
            }

        next_page = response.xpath("//ul[@class = 'pagination']/li/a/@href").get()
        print(next_page)
        if next_page:
            yield response.follow(next_page, callback=self.parse)
Any help is much appreciated!
Solution
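The expression
//ul[@class = 'pagination']/li/a/@href
matches every link in the pagination bar (10 links at once), and .get() returns only the first of them, which is not necessarily the link to the next page. You have to select exactly one link: the one that points to the next page. The following XPath expression selects the right pagination link.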
Code:
next_page = response.xpath("//a[contains(text(), 'NEXT')]/@href").get()
print(next_page)
if next_page:
    yield response.follow(next_page, callback=self.parse)
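To see why the two selectors behave differently, here is a minimal sketch that runs without Scrapy. The pagination markup is a made-up example of what page 2 of a listing might look like, not the site's actual HTML, and the variable names are only illustrative. It uses the standard library's ElementTree, which supports only a subset of XPath, so the 'NEXT' match is done in plain Python instead of with contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination bar for page 2 (assumed markup; the real site may differ).
PAGINATION = """
<ul class="pagination">
  <li><a href="/monitor">PREV</a></li>
  <li><a href="/monitor">1</a></li>
  <li class="active"><span>2</span></li>
  <li><a href="/monitor?page=3">3</a></li>
  <li><a href="/monitor?page=3">NEXT</a></li>
</ul>
"""

# The root element is the <ul> itself.
root = ET.fromstring(PAGINATION)

# Rough equivalent of //ul[@class='pagination']/li/a : every link in the bar.
links = root.findall("./li/a")
all_hrefs = [a.get("href") for a in links]

# Taking the first match (what .get() does) yields whichever link comes first,
# which in this sample is the PREV link.
first_href = all_hrefs[0]

# Rough equivalent of //a[contains(text(), 'NEXT')]/@href : only the NEXT link.
next_href = next(
    (a.get("href") for a in links if "NEXT" in (a.text or "")),
    None,  # no NEXT link, e.g. on the last page
)

print(all_hrefs)
print(first_href)
print(next_href)
```

With this sample markup, the first match points back to an earlier page, while the 'NEXT' match points to page 3. On a page without a NEXT link, next_href is None, which corresponds to the point where the spider stops following pages.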
Output of the corrected spider:
2022-11-26 01:45:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.startech.com.bd/monitor?page=19> (referer: https://www.startech.com.bd/monitor?page=18)
2022-11-26 01:45:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.startech.com.bd/monitor?page=19>
{'item': 'HP E27q G4 27 Inch 2K QHD IPS Monitor', 'price': '41,000৳'}
None
2022-11-26 01:45:06 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-26 01:45:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 6702,
'downloader/request_count': 19,
'downloader/request_method_count/GET': 19,
'downloader/response_bytes': 546195,
'downloader/response_count': 19,
'downloader/response_status_count/200': 19,
'elapsed_time_seconds': 9.939978,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 11, 25, 19, 45, 6, 915772),
'httpcompression/response_bytes': 6200506,
'httpcompression/response_count': 19,
'item_scraped_count': 361,
Answered By - Md. Fazlul Hoque