Issue
I am scraping a website that lists multiple products on the same page, and I only want the prices of some of them. So I want to check the product category first and then get the listed price.
The website code looks like this:
<section class="products_results">
<span something I don't want>...</span>
<section class="category">
<span>Clothes</span>
<div something I don't want>...</div>
<section class="search_result_price">
<section>
<span something I don't want>...</span>
<span class="price">149.99</span>
</section>
</section>
I already know how to get to the category part with my own code, but I'm completely stuck on the other part.
for products in soup.find_all(class_='category'):
    category = (products.text)
    if category == 'Clothes':
        price = (theoretical piece of code)
How can I get to the specific price tag within this parent <section> tag?
Solution
You are close to your goal, but be aware that products.text will give you the text of the whole section, so the comparison with 'Clothes' will not match. Use products.span.text instead to get the category text only.
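The difference is easy to see on a reduced version of the markup. This is a minimal sketch, assuming a simplified, properly closed section; the variable names `whole_text` and `label_only` are made up for illustration:

```python
from bs4 import BeautifulSoup

# Simplified, well-formed stand-in for one category section (an assumption)
html = '<section class="category"><span>Clothes</span><span class="price">149.99</span></section>'
section = BeautifulSoup(html, 'html.parser').section

whole_text = section.text        # text of all descendants, concatenated
label_only = section.span.text   # text of the first <span> only

print(repr(whole_text))  # 'Clothes149.99'
print(repr(label_only))  # 'Clothes'
```

Because `whole_text` also contains the price, a test such as `whole_text == 'Clothes'` is false, while the same test on `label_only` is true.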
To get the price, find the span with class="price" and check that it exists first, because calling .text on a missing tag (None) raises an AttributeError:
price = products.find(class_='price').text if products.find('span', class_='price') else None
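The one-liner searches for the price twice. The same check can be sketched with a single lookup, assuming markup shaped like the question's; `get_price` is a hypothetical helper name, not part of BeautifulSoup, and the "Books" section is invented to show the missing-price case:

```python
from bs4 import BeautifulSoup

def get_price(section):
    """Return the text of the first <span class="price"> in section, or None."""
    tag = section.find('span', class_='price')
    return tag.get_text(strip=True) if tag else None

# Simplified, well-formed sample markup (an assumption)
html = '''
<section class="category">
  <span>Clothes</span>
  <section class="search_result_price">
    <section><span class="price">149.99</span></section>
  </section>
</section>
<section class="category">
  <span>Books</span>
</section>'''

soup = BeautifulSoup(html, 'html.parser')
prices = [get_price(s) for s in soup.find_all('section', class_='category')]
print(prices)  # ['149.99', None]
```

Storing the result of `find()` in a variable keeps the tag name and class in one place, so the existence check and the text lookup cannot drift apart.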
Example
from bs4 import BeautifulSoup
html='''
<section class="products_results">
<span something I don't want>...</span>
<section class="category">
<span>Clothes</span>
<div something I don't want>...</div>
<section class="search_result_price">
<section>
<span something I don't want>...</span>
<span class="price">149.99</span>
</section>
</section>'''
soup = BeautifulSoup(html, 'html.parser')
for products in soup.find_all('section', class_='category'):
    category = products.span.text
    if category == 'Clothes':
        price = products.find(class_='price').text if products.find('span', class_='price') else None
        print(price)
Output
149.99
As an alternative, here is a leaner approach that creates structured output that is easy to process and works with a list of permitted categories:
from bs4 import BeautifulSoup
html='''
<section class="products_results">
<span something I don't want>...</span>
<section class="category">
<span>Clothes</span>
<div something I don't want>...</div>
<section class="search_result_price">
<section>
<span something I don't want>...</span>
<span class="price">149.99</span>
</section>
<span something I don't want>...</span>
<section class="category">
<span>Shoes</span>
<div something I don't want>...</div>
<section class="search_result_price">
<section>
<span something I don't want>...</span>
<span class="price">90.99</span>
</section>
</section>'''
soup = BeautifulSoup(html, 'html.parser')
data = []
c_list = ['Clothes','Shoes']
for products in soup.select(f"section.category:-soup-contains({','.join(c_list)})"):
    data.append({
        'category' : products.span.text,
        'price' : products.find(class_='price').text if products.find('span', class_='price') else None
    })
data
Output
[{'category': 'Clothes', 'price': '149.99'},
{'category': 'Shoes', 'price': '90.99'}]
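One thing to keep in mind: `:-soup-contains()` does a substring match on all text inside the section, not only on the category label. If that is too loose for the real page, an exact match on the label is one option. The following is a sketch under assumptions: the sample markup, the "Bags" section, and the names `allowed` and `loose` are invented for illustration.

```python
from bs4 import BeautifulSoup

# Simplified, well-formed sample markup (an assumption)
html = '''
<section class="category">
  <span>Clothes</span>
  <span class="price">149.99</span>
</section>
<section class="category">
  <span>Bags</span>
  <div>Goes well with Shoes</div>
  <span class="price">20.00</span>
</section>'''

soup = BeautifulSoup(html, 'html.parser')
allowed = {'Clothes', 'Shoes'}

# Substring match on the whole section text: "Bags" matches too,
# because its description mentions "Shoes"
loose = [s.span.get_text(strip=True)
         for s in soup.select('section.category:-soup-contains("Clothes", "Shoes")')]

# Exact match on the category label only
data = []
for section in soup.find_all('section', class_='category'):
    label = section.span.get_text(strip=True) if section.span else None
    if label in allowed:
        price = section.find('span', class_='price')
        data.append({
            'category': label,
            'price': price.get_text(strip=True) if price else None,
        })

print(loose)  # ['Clothes', 'Bags']
print(data)   # [{'category': 'Clothes', 'price': '149.99'}]
```

The selector version is shorter, so it remains a reasonable choice when category names cannot appear elsewhere in a section's text.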
Answered By - HedgeHog