Issue
I'm trying to learn basic web scraping and have come across an issue I can't figure out. Basically, I found a site that lists a retail price and a sale price, but they both have the class "price".
Looking for some pointers to get me back on track. Thanks.
What I would like to do is extract results in the following form:
{'model': <'model_number'>, 'retail_price': <'xxx.xx'>, 'sale_price': <'xxx.xx'>}
I can get the first price, which is the retail price since it comes first, but I can't figure out how to extract the sale price.
Sample of what I'm attempting to extract:
Code so far to get the model and retail price:
import httpx
from selectolax.parser import HTMLParser

url = "https://hvacdirect.com/ductless-mini-splits/single-zone-ductless-mini-splits/filter/wall-mounted/aciq.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 OPR/104.0.0.0"}

# Get page html
resp = httpx.get(url, headers=headers)
html = HTMLParser(resp.text)

# Get product list
products = html.css("div.products.wrapper.grid.products-grid ol li")
for product in products:
    item = {
        "name": product.css_first(".product-item-sku").text().strip(),
        "retail_price": product.css_first(".price").text().strip(),
    }
    print(item)
Solution
You can simply use CSS selectors that are more specific, so that each one matches only one of the two prices. Here's a quick example with one small modification to your code.
# Get product list
products = html.css("div.products.wrapper.grid.products-grid ol li")
for product in products:
    item = {
        "name": product.css_first(".product-item-sku").text().strip(),
        # Scope each lookup to a parent class that wraps only one of the prices
        "retail_price": product.css_first(".old-price .price").text().strip(),
        "sale_price": product.css_first(".price-bundle-product .price").text().strip(),
    }
    print(item)
Notice how I use the .old-price parent selector to pick out the retail price, and the .price-bundle-product parent selector to pick out the current (sale) price. In your image, .price-bundle-product wraps only the current price, so it is a perfectly acceptable way to tell the two prices apart.
You can learn more about CSS selectors from any CSS selector reference, which also works as a cheat sheet when you need to pick a selector for a specific use case. There are many other ways of doing this; this is just one that needs minimal changes. Of course, if the website changes and no longer includes these classes, you will have to modify your code again. For your use case, the simplest approach is probably to use these more specific parent selectors to tell the prices apart.
Answered By - Sharp Dev