Issue
I used to pull the text that follows a specific span tag, based on this post. With the old HTML, the code retrieved $167.00, which is the Price Target and the value I want. Since the website changed its format to the New HTML below, the same code no longer returns anything. The span tag <span>Price Target (6-12 Months):</span> was changed to <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
Old HTML:
Apple Inc. (AAPL)
$148.19 (As of 08/20/21)
Price Target (6-12 Months): $167.00
New HTML:
Apple Inc. (AAPL)
$148.19 (As of 08/20/21)
Price Target (6-12 Months)(2): $167.00
What should be changed in the code below, in order to retrieve $167.00?
from bs4 import BeautifulSoup
import requests
import re
HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
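# Parse the inline HTML directly; requests.get() expects a URL, not markup.
soup = BeautifulSoup(HTML, 'lxml')
# With the new HTML, find() returns None here, so .parent raises AttributeError.
value = soup.find('span', string=re.compile("Price Target")).parent.contents[1]
print(value)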
Solution
You can search for the <span> tag containing the text "Price Target" and then take its next text sibling:
from bs4 import BeautifulSoup
HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
soup = BeautifulSoup(HTML, 'lxml')
value = soup.select_one('span:-soup-contains("Price Target")').find_next_sibling(string=True).strip()
print(value)
Prints:
$167.00
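As a side note on why the original find(..., string=...) call stopped matching: as far as I understand BeautifulSoup's behavior, the string= filter is compared against a tag's .string, which is None once the tag has more than one child. The added <sup> gives the span three children, so the regex has nothing to match. The sketch below illustrates this on trimmed-down copies of the two snippets. The names OLD_HTML and NEW_HTML are made up for the illustration, and it uses the built-in "html.parser" rather than lxml only to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

# Trimmed-down versions of the old and new markup (illustrative names).
OLD_HTML = "<p><span>Price Target (6-12 Months):</span> $167.00</p>"
NEW_HTML = "<p><span>Price Target (6-12 Months)<sup>(2)</sup>:</span> $167.00</p>"

pattern = re.compile("Price Target")

old_span = BeautifulSoup(OLD_HTML, "html.parser").find("span")
new_span = BeautifulSoup(NEW_HTML, "html.parser").find("span")

# Old span has a single text child, so .string is that text.
old_string = old_span.string
# New span has three children (text, <sup>, text), so .string is None.
new_string = new_span.string
# get_text() joins the text of all descendants, including the <sup>.
new_text = new_span.get_text()

# The string= filter therefore finds the old span but not the new one.
old_match = BeautifulSoup(OLD_HTML, "html.parser").find("span", string=pattern)
new_match = BeautifulSoup(NEW_HTML, "html.parser").find("span", string=pattern)

print(repr(old_string))
print(repr(new_string))
print(repr(new_text))
print(old_match is not None, new_match is not None)
```

The :-soup-contains() selector used in the solution looks at the text of the element and its descendants, which is why it still matches after the <sup> was added.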
Answered By - Andrej Kesely
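If the page layout may change again, the same lookup could be wrapped in a small helper that returns None instead of raising when the span or its text sibling is missing. This is only a sketch built on the answer above, not part of it: the function name price_target and the sample strings are hypothetical, "html.parser" is used in place of lxml to keep the example dependency-light, and :-soup-contains() assumes a reasonably recent soupsieve.

```python
from typing import Optional

from bs4 import BeautifulSoup


def price_target(html: str) -> Optional[str]:
    """Return the text right after the 'Price Target' <span>, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the span's full descendant text, so a nested <sup> does not matter.
    span = soup.select_one('span:-soup-contains("Price Target")')
    if span is None:
        return None
    # The value is the first text node that follows the span within the same parent.
    sibling = span.find_next_sibling(string=True)
    return sibling.strip() if sibling is not None else None


# Hypothetical sample inputs mirroring the old and new markup.
OLD_HTML = "<p><span>Price Target (6-12 Months):</span> $167.00</p>"
NEW_HTML = "<p><span>Price Target (6-12 Months)<sup>(2)</sup>:</span> $167.00</p>"

print(price_target(OLD_HTML))
print(price_target(NEW_HTML))
print(price_target("<p>nothing here</p>"))
```

Returning None keeps the caller in control of what to do when the site changes again, rather than failing with an AttributeError deep inside a chained expression.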