Issue
I'm learning Python and BeautifulSoup. As a practice project I am scraping a recipe website and displaying certain items in a template. The website shows the meal prep time, the calories and the number of servings for each recipe in a row, as li elements inside a div. There are 35 such divs in a grid on the page. I want to select only the meal prep time from each div and store it in a list. All of the li elements have the same class and no other attributes. How do I select only the li I need?
Below is the HTML of one card on the page. There are 35 of these divs, each with a different recipe.
<div class="column xxlarge-4 large-6 small-12 ">
<a role="link" aria-label="Recept: 'Tiramisu' met advocaat" data-testhook="recipe-card" title="Recept: 'Tiramisu' met advocaat" href="/allerhande/recept/R-R1196417/tiramisu-met-advocaat" class="display-card_root__o17AY card_root__VNG0M card_roundCorners__dYaFu display-card_anchor__cTFon" data-analytics="LINK_CLICK" data-analytics-meta="%7B%22component%22%3A%22recipe-search%22%2C%22href%22%3A%22%2Fallerhande%2Frecept%2FR-R1196417%2Ftiramisu-met-advocaat%22%2C%22title%22%3A%22R-R1196417%22%7D">
<div class="display-card-section_section__42C0n display-card-body_body__r2mt4 card-body_root__E16CU">
<div class="ratio-box_root__YH5Fe ratio-box_ratio-21-10__thBP0">
<div class="ratio-box_content__k-Jz7">
<img class="card-image-set_imageSet__Su7xI lazyautosizes ls-is-cached lazyloaded" alt="'Tiramisu' met advocaat" data-srcset=", https://static.ah.nl/static/recepten/img_RAM_PRD163172_220x162_JPG.jpg 220w 162h, >
</div>
</div>
</div>
<footer class="display-card-section_section__42C0n display-card-section_padded__lHvvK display-card-footer_footer__cxMve card-footer_root__0dl7R">
<ul class="recipe-card-properties_root__rFiwt recipe-card-properties_allerhande__0gSBC" data-testhook="recipe-card-properties">
<li class="recipe-card-properties_property__87cH1">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" class="allerhande-icon recipe-card-properties_icon__wBmG9 svg svg--svg_time" viewBox="0 0 24 24" width="24" height="16">
<use xlink:href="#svg_time">
</use>
</svg>
20 min
</li>
<li class="recipe-card-properties_property__87cH1">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" class="allerhande-icon recipe-card-properties_icon__wBmG9 svg svg--svg_calories" viewBox="0 0 24 24" width="24" height="16">
<use xlink:href="#svg_calories">
</use>
</svg>
545 kcal
</li>
<li class="recipe-card-properties_property__87cH1">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" class="allerhande-icon recipe-card-properties_icon__wBmG9 svg svg--svg_person" viewBox="0 0 24 24" width="24" height="16">
<use xlink:href="#svg_person">
</use>
</svg>
8</li>
</ul>
<p class="typography_root__Om3Wh typography_variant-paragraph__T5ZAU typography_hasMargin__4EaQi card-text_title__REC-7">
<span class="line-clamp_root__7DevG line-clamp_active__5Qc2L card-text_titleText__7T9sY card-text_boldTitle__SVYw2" data-testhook="recipe-card-title" style="-webkit-line-clamp: 2; line-height: 1.2em; max-height: 2.4em;">
'Tiramisu' met advocaat
</span>
</p>
</footer>
</a>
</div>
And here is the code I am using to extract the information I need:
import re
import requests
from bs4 import BeautifulSoup

# Create soup
webpage_response = requests.get("https://www.ah.nl/allerhande/recepten-zoeken?sortBy=TRENDING")
webpage = webpage_response.content
soup = BeautifulSoup(webpage, "html.parser")

# Match elements by the stable prefix of their hashed class names
recipe_links = soup.find_all('a', attrs={'class' : re.compile('^display-card_root__.*')})
recipe_pictures = soup.find_all('img', attrs={'class' : re.compile('^card-image-set_imageSet__.*')})
recipe_prep_time = soup.find_all('li', attrs={'class' : re.compile('^recipe-card-properties_property__.*')})
However, this selects all the li items, including calories etc., which is a problem when I want to pick the correct time from the list. How can I select only the first li?
Solution
You can do that with a CSS selector. The :nth-child(1) pseudo-class matches only an li that is the first child of its parent, which on this page is the prep time:
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.ah.nl/allerhande/recepten-zoeken?sortBy=TRENDING')
soup = bs(r.content, "html.parser")

# Keep only the li that is the first child of its ul (the prep time)
for li in soup.select('li[class="recipe-card-properties_property__87cH1"]:nth-child(1)'):
    print(li.text)
Output:
15 min
15 min
20 min
20 min
35 min
20 min
20 min
10 min
45 min
50 min
15 min
10 min
15 min
25 min
30 min
25 min
15 min
25 min
20 min
15 min
25 min
20 min
25 min
10 min
15 min
15 min
40 min
15 min
15 min
15 min
25 min
55 min
25 min
15 min
7 min
Answered By - F.Hoque
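A possible variation, not part of the answer above: the question asks for the prep times in a list, and the first-child approach silently returns the wrong value if a card ever lacks a time entry. The sketch below selects the li by the icon it contains (the svg--svg_time class seen in the question's HTML) and collects one value per card. It runs against a trimmed, made-up SAMPLE_HTML string rather than the live page, and extract_prep_times is a hypothetical helper name; whether the live site keeps these class names and the data-testhook attribute is an assumption.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical stand-in for two recipe cards; the second one
# deliberately has no prep-time entry.
SAMPLE_HTML = """
<div class="column">
  <ul data-testhook="recipe-card-properties">
    <li class="recipe-card-properties_property__87cH1">
      <svg class="allerhande-icon svg svg--svg_time"></svg>
      20 min
    </li>
    <li class="recipe-card-properties_property__87cH1">
      <svg class="allerhande-icon svg svg--svg_calories"></svg>
      545 kcal
    </li>
    <li class="recipe-card-properties_property__87cH1">
      <svg class="allerhande-icon svg svg--svg_person"></svg>
      8</li>
  </ul>
</div>
<div class="column">
  <ul data-testhook="recipe-card-properties">
    <li class="recipe-card-properties_property__87cH1">
      <svg class="allerhande-icon svg svg--svg_calories"></svg>
      300 kcal
    </li>
  </ul>
</div>
"""


def extract_prep_times(html):
    """Return one prep time per card, or None when a card has no time item."""
    soup = BeautifulSoup(html, "html.parser")
    prep_times = []
    # One property list per recipe card
    for props in soup.select('ul[data-testhook="recipe-card-properties"]'):
        # :has() keeps only the li that contains the clock icon
        time_item = props.select_one("li:has(svg.svg--svg_time)")
        prep_times.append(time_item.get_text(strip=True) if time_item else None)
    return prep_times


print(extract_prep_times(SAMPLE_HTML))  # ['20 min', None]
```

Iterating per card keeps the result aligned with the list of recipe links, so index i in each list refers to the same recipe even when a property is missing.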