Issue
I've made this small bot, which processes a list of search parameters. It works fine until there are several results on the page: product_price_euros gives a list in which half of the items are empty. So when I concatenate it with product_price_cents, I get output like the following for half of the results:
'price' : '',76
Is there a simple way to prevent empty items from being collected? My output for product_price_euros
looks like:
[' 1', ' ', ' 2', ' ', ' 2', ' ', ' 1', ' ', ' 1', ' ', ' 1', ' ', ' 2', ' ']
I'd like to keep only '1', '2', etc.
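A minimal sketch of dropping the whitespace-only entries with a list comprehension, using the sample list above. Note that this only keeps euros and cents aligned if every product on the page has exactly one euro value and one cents value, which is the caveat raised in the answer below.

```python
# Sample values copied from the output above
product_price_euros = [' 1', ' ', ' 2', ' ', ' 2', ' ', ' 1', ' ', ' 1', ' ', ' 1', ' ', ' 2', ' ']

# strip() removes surrounding whitespace; an empty string is falsy, so blanks are dropped
cleaned = [p.strip() for p in product_price_euros if p.strip()]
print(cleaned)  # ['1', '2', '2', '1', '1', '1', '2']
```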
Here is what the HTML looks like. The problem might come from this side:
<span class="product-pricing__main-price">
  2
  <span class="cents">,79€</span>
</span>
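The blank entries plausibly come from the markup itself: "span.product-pricing__main-price::text" selects only the text nodes that are direct children of the outer span, and the nested "cents" span splits that text into two nodes, the "2" and the whitespace after the inner span. The sketch below models this with the standard library's html.parser (not Scrapy), assuming the snippet above is representative of every product; DirectTextCollector is a hypothetical helper.

```python
from html.parser import HTMLParser


class DirectTextCollector(HTMLParser):
    """Collect text nodes that are direct children of the outermost <span>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <span> elements are currently open
        self.texts = []  # text nodes found at depth 1 (outer span only)

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span":
            self.depth -= 1

    def handle_data(self, data):
        # Text inside the nested "cents" span (depth 2) is skipped,
        # mirroring what "...main-price::text" returns
        if self.depth == 1:
            self.texts.append(data)


html = (
    '<span class="product-pricing__main-price">\n'
    '  2\n'
    '  <span class="cents">,79€</span>\n'
    '</span>'
)

collector = DirectTextCollector()
collector.feed(html)
collector.close()
print(collector.texts)  # ['\n  2\n  ', '\n']
```

Two text nodes per product, one of them pure whitespace, matches the alternating pattern in the output above.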
And my code:
import csv

import scrapy


# Both methods belong to the spider class (class definition omitted)
def start_requests(self):
    base_url = "https://new.carrefour.fr/s?q="
    # Each non-empty CSV row holds one search term in its first column;
    # the with-block closes the file once all requests have been yielded
    with open(r"example", "r") as test_file:
        reader = csv.reader(test_file)
        for row in reader:
            if row:
                url = row[0]
                absolute_url = base_url + url
                print(absolute_url)
                yield scrapy.Request(
                    absolute_url,
                    meta={'dont_redirect': True, "handle_httpstatus_list": [302, 301]},
                    callback=self.parse,
                )

def parse(self, response):
    # Each selector collects matches from the whole page, not per product
    product_name = response.css("h2.label.title::text").extract()
    product_packaging = response.css("div.label.packaging::text").extract()
    product_price_euros = response.css("span.product-pricing__main-price::text").extract()
    product_price_cents = response.css("span.cents::text").extract()
    # zip pairs the four lists purely by position
    for name, packaging, price_euro, price_cent in zip(product_name, product_packaging, product_price_euros, product_price_cents):
        yield {'ean': response.css("h1.page-title::text").extract(), 'name': name + packaging, 'price': price_euro + price_cent}
Any idea? :)
Solution
If you just filter out the empty euro elements, how would you match them to their proper cents?
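To illustrate the concern with hypothetical data: if one product on the page has no price (say, it is out of stock), filtering the blanks leaves shorter price lists, and zip silently pairs later products with the wrong price.

```python
# Hypothetical page: three products, but the second one shows no price
names = ["Milk", "Butter", "Eggs"]
euros = ["1", "2"]        # after filtering blanks; really Milk -> 1, Eggs -> 2
cents = [",20€", ",50€"]

# zip stops at the shortest list and pairs items purely by position
pairs = [(name, e + c) for name, e, c in zip(names, euros, cents)]
print(pairs)  # [('Milk', '1,20€'), ('Butter', '2,50€')]  <- Butter gets Eggs' price
```

On Python 3.10+, zip(..., strict=True) raises ValueError on a length mismatch, which at least turns the silent shift into an error, but it still cannot tell which product is missing its price.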
First, IMHO it would be easier to loop over the products and collect each product's data inside the loop. E.g.:
for product in response.css('.product-list__item'):
    name = product.css("h2.label.title::text").extract()
    # ...
Then, within each product, you can get the euros and cents together by selecting all descendant text nodes (note the space before ::text):
>>> product.css('.product-pricing__main-price ::text').getall()
['2', ',99€']
>>> ''.join(product.css('.product-pricing__main-price ::text').getall())
'2,99€'
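Since the ::text fragments usually carry the whitespace from the markup, it may help to strip each fragment before joining. A small helper along these lines (join_price and price_to_float are hypothetical names, not Scrapy API; the float conversion assumes French formatting without a thousands separator) turns one product's fragments into a display string and a number:

```python
def join_price(fragments):
    """Join one product's text fragments, e.g. ['  2  ', ',99€'] -> '2,99€'.

    Returns None when the product has no price text at all.
    """
    text = "".join(f.strip() for f in fragments)
    return text or None


def price_to_float(price):
    """Convert a price such as '2,99€' to 2.99 (assumes no thousands separator)."""
    return float(price.replace("€", "").replace(",", "."))


fragments = ["\n  2\n  ", ",99€", "\n"]  # what ' ::text' might return for one product
price = join_price(fragments)
print(price, price_to_float(price))  # 2,99€ 2.99
```

Inside the per-product loop this would be called as join_price(product.css('.product-pricing__main-price ::text').getall()), so a product without a price yields None instead of shifting its neighbours.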
Answered By - Thiago Curvelo