Issue
I am a total noob using Scrapy for the first time. I have set up a spider to collect some information, but it always stops after 5 pages. I want it to scrape many more pages, since at least 20 are available.
import scrapy
from myproject.items import EbaySold


class EbaySpider(scrapy.Spider):
    name = 'EbaySold'
    allowed_domains = ['www.ebay.com']
    start_urls = ['https://www.ebay.com/b/Apple-Unlocked-Smartphones/9355/bn_599372?LH_Sold=1&mag=1&rt=nc&_dmd=1&_pgn=1&_sop=13']

    def parse(self, response):
        products = response.css('li.s-item')
        product_item = EbaySold()
        for product in products:
            product_item['name'] = product.css('h3.s-item__title::text').get()
            if product_item['name'] is None:
                product_item['name'] = product.css('span.BOLD::text').get()
            product_item['sold_price'] = product.css('span.POSITIVE::text').get()
            product_item['date_sold'] = product.css('div.s-item__title-tag::text').get().replace('SOLD ', '')
            yield product_item
        next_page = response.css('a[type=next]').attrib['href']
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
Solution
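In your Scrapy project's settings.py file, make sure the following settings are configured. They make the spider's requests look more like a regular browser's and keep the request rate low, which may stop the site from cutting the crawl short.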
ROBOTSTXT_OBEY = False
COOKIES_ENABLED = True
DEFAULT_REQUEST_HEADERS = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-language': 'en-US,en;q=0.9',
    # 'br' assumes a Brotli package is installed so the responses can be decoded
    'accept-encoding': 'gzip, deflate, br',
    'sec-fetch-site': 'same-origin',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
CONCURRENT_REQUESTS = 2  # small number
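If a low CONCURRENT_REQUESTS value alone is not enough, it might help to slow the crawl further with Scrapy's delay and AutoThrottle settings. The lines below are only a sketch of what could be added to the same settings.py; the setting names are Scrapy's own, but the values are illustrative assumptions and have not been tuned against eBay.

```python
# Hypothetical additions to settings.py; the values are illustrative, not tuned.
DOWNLOAD_DELAY = 2                # seconds to wait between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True   # vary each wait between 0.5x and 1.5x DOWNLOAD_DELAY
AUTOTHROTTLE_ENABLED = True       # let Scrapy adapt the delay to observed latency
AUTOTHROTTLE_START_DELAY = 2      # initial AutoThrottle delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 30       # upper bound for the AutoThrottle delay, in seconds
```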
Then try running the spider again.
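If the next-page link still goes missing on some responses, one possible fallback is to build the page URLs up front instead of following the link. This is a minimal sketch, assuming the `_pgn` parameter in the start URL is the page number (inferred from the URL itself, not from any eBay documentation); `page_url` is a hypothetical helper name and the page count of 20 comes from the question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

START_URL = ('https://www.ebay.com/b/Apple-Unlocked-Smartphones/9355/bn_599372'
             '?LH_Sold=1&mag=1&rt=nc&_dmd=1&_pgn=1&_sop=13')


def page_url(url, page):
    """Return url with its _pgn query parameter set to page (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep the existing parameters in their original order.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query['_pgn'] = str(page)  # assumption: _pgn selects the results page
    return urlunsplit(parts._replace(query=urlencode(query)))


# Candidate start_urls for the first 20 pages.
start_urls = [page_url(START_URL, n) for n in range(1, 21)]
```

Assigning a list like this to the spider's start_urls would let each page be requested independently, so one response without a next link would not end the whole crawl.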
Answered By - alexpdev