Issue
I want to scrape a few URLs that each have two divs using the same class="description".
The source code of a sample URL is like this:
<!-- Initial HTML here -->
<div class="description">
<h4> Anonymous Title </h4>
<div class="product-description">
<li> Some stuff here </li>
</div>
</div>
<!-- Middle HTML here -->
<div class="description">
Some text here
</div>
<!-- Last HTML here -->
I'm scraping it with BeautifulSoup using the following script:
# imports etc here
description_box = soup.find('div', attrs={'class': 'description'})
description = description_box.text.strip()
print(description)
Running it gives me only the first div with class="description"; however, I want only the second div with class="description".
Any ideas how I can ignore the first div and just scrape the second?
P.S. The first div always has h4 tags, and the second div only has plain text between its tags.
Solution
If you use .find_all, it returns all matching elements in a list. It's then just a matter of selecting the second item in that list using index 1:
from bs4 import BeautifulSoup

html = '''<!-- Initial HTML here -->
<div class="description">
<h4> Anonymous Title </h4>
<div class="product-description">
<li> Some stuff here </li>
</div>
</div>
<!-- Middle HTML here -->
<div class="description">
Some text here
</div>
<!-- Last HTML here -->'''
soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all('div', {'class':'description'})
div = divs[1]
print(div)
Output:
<div class="description">
Some text here
</div>
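Indexing with [1] depends on the order of the divs on the page. The P.S. in the question suggests another hook: the unwanted div always contains an h4 tag. Below is a minimal sketch of filtering on that instead, assuming the h4 rule holds for every URL being scraped. The names plain_divs and description are introduced here for illustration, and the HTML is the sample from the question with the comments removed.

```python
from bs4 import BeautifulSoup

html = '''<div class="description">
<h4> Anonymous Title </h4>
<div class="product-description">
<li> Some stuff here </li>
</div>
</div>
<div class="description">
Some text here
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Keep only the "description" divs that contain no <h4> anywhere inside them.
# Assumption: the first (unwanted) div always has an <h4>, as the P.S. states.
plain_divs = [
    div for div in soup.find_all('div', class_='description')
    if div.find('h4') is None
]

# Guard against pages where no such div exists instead of raising IndexError.
description = plain_divs[0].get_text(strip=True) if plain_divs else None
print(description)
```

With this approach the result should not change if the two divs appear in a different order, whereas divs[1] would then pick the wrong one.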
Answered By - chitown88