Issue
I'm trying to scrape this page with BeautifulSoup:
https://www.nb.co.za/en/view-book/?id=9780798182539
How do I target specific elements if they don't have a unique class or id?
Is it possible to scrape a div based on the text in its sibling div?
For instance, in the code below, how can I get 9780798182539 if the sibling div contains <p>ISBN:</p>?
<div class="row clearfix">
  <div class="col-md-3 noPadding">
    <p>ISBN:</p>
  </div>
  <div class="col-md-9 noPadding">
    9780798182539
  </div>
</div>
Here is the complete HTML:
<div class="col-lg-7 col-md-12 col-sm-12 author-details">
  <h2>Step by Step: Counting to 50 </h2>
  <h5>
    <a href="/en/authors?authorId=2163">Cuberdon</a>
  </h5>
  <div class="row clearfix">
    <div class="col-md-3 noPadding">
      <p>ISBN:</p>
    </div>
    <div class="col-md-9 noPadding">
      9780798182539
    </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding">
      <p>Publisher:</p>
    </div>
    <div class="col-md-9 noPadding">
      Human & Rousseau
    </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding">
      <p>Date Released:</p>
    </div>
    <div class="col-md-9 noPadding">
      November 2021
    </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding">
      <p>Price (incl. VAT):</p>
    </div>
    <div class="col-md-9 noPadding">
      R 120.00
    </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding">
      <p>Format:</p>
    </div>
    <div class="col-md-9 noPadding">
      <p>Hard cover, 32pp</p>
    </div>
  </div>
</div>
Solution
You can use :-soup-contains to target the p tag by its text. Wrap that in the :has pseudo-class selector, and use the child combinator > to specify a direct parent-child relationship, so that the selector matches the immediate parent div. Then add an adjacent sibling combinator + followed by a div type selector to move to the div that comes right after it:
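import requests
from bs4 import BeautifulSoup as bs

# Fetch the book page and parse it (the 'lxml' parser requires the lxml package)
r = requests.get('http://www.nb.co.za/nb/view-book?id=9780798182539')
soup = bs(r.content, 'lxml')

# Select the div that directly follows the div whose child <p> contains "ISBN:"
print(soup.select_one('div:has(> p:-soup-contains("ISBN:")) + div').text.strip())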
Answered By - QHarr
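The same selector can be tried offline against the markup from the question, which makes it easier to check without hitting the site. The following is only a sketch under a few assumptions: the HTML is a trimmed copy of the question's snippet, `field_value` is a hypothetical helper (not part of BeautifulSoup), the built-in `html.parser` is swapped in so lxml is not required, and the installed Soup Sieve is recent enough to support `:-soup-contains` (2.1 or later).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, so no network request is needed.
HTML = """
<div class="col-lg-7 col-md-12 col-sm-12 author-details">
  <div class="row clearfix">
    <div class="col-md-3 noPadding"><p>ISBN:</p></div>
    <div class="col-md-9 noPadding">
      9780798182539
    </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding"><p>Publisher:</p></div>
    <div class="col-md-9 noPadding">
      Human &amp; Rousseau
    </div>
  </div>
  <div class="row clearfix">
    <div class="col-md-3 noPadding"><p>Format:</p></div>
    <div class="col-md-9 noPadding"><p>Hard cover, 32pp</p></div>
  </div>
</div>
"""


def field_value(soup, label):
    """Hypothetical helper: text of the div that follows the label div, or None.

    Assumes `label` contains no double quotes, since it is placed
    inside a quoted string in the CSS selector.
    """
    node = soup.select_one(f'div:has(> p:-soup-contains("{label}")) + div')
    return node.get_text(strip=True) if node else None


soup = BeautifulSoup(HTML, "html.parser")

isbn = field_value(soup, "ISBN:")
publisher = field_value(soup, "Publisher:")
fmt = field_value(soup, "Format:")
missing = field_value(soup, "Pages:")  # label that is not in the markup

print(isbn, publisher, fmt, missing)
```

Returning None for a label that is absent avoids the AttributeError that `.text` would raise when `select_one` finds nothing, which is worth guarding against if some book pages omit a field.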