Issue
I have HTML content as given below:
content ="<p class="sub">
Sector:
<a href="/company/compare/00000008/">
Capital Goods - Electrical Equipment
</a>
<span style="margin: 16px"></span>
Industry:
<a href="/company/compare/00000008/00000039/">
Electric Equipment
</a>"
</p>
I want to parse Sector = Capital Goods - Electrical Equipment
and Industry = Electric Equipment
using BeautifulSoup. Kindly guide me on how to do this.
Solution
To get the text into a structured format such as a dict
of key/value pairs, you can pass a list comprehension of (key, value) tuples to dict():
dict([(x.previous.strip()[:-1],x.get_text(strip=True)) for x in soup.select('p.sub a')])
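This selects all <a> elements inside p.sub in your example, iterates over the ResultSet
to get each link's text as the value, and extracts the associated key from the text just before the link (the [:-1] drops the trailing colon).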
{'Sector': 'Capital Goods - Electrical Equipment', 'Industry': 'Electric Equipment'}
Example
from bs4 import BeautifulSoup

html = '''
<p class="sub">
Sector:
<a href="/company/compare/00000008/">
Capital Goods - Electrical Equipment
</a>
<span style="margin: 16px">
</span>
Industry:
<a href="/company/compare/00000008/00000039/">
Electric Equipment
</a>
</p>
'''
# Build the soup only after html is defined, and name the parser explicitly.
soup = BeautifulSoup(html, 'html.parser')

# Key: the text just before each <a>, minus the trailing colon. Value: the link text.
dict([(x.previous.strip()[:-1], x.get_text(strip=True)) for x in soup.select('p.sub a')])
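If the markup may vary, a slightly more defensive variant of the same idea could look like the sketch below. It is not part of the original answer: the helper name parse_labeled_links and the sample string are made up for illustration, and it assumes each label is a plain text node sitting directly before its <a> inside p.sub.

```python
from bs4 import BeautifulSoup, NavigableString


def parse_labeled_links(html):
    """Map the text label before each <a> in p.sub to that link's text.

    Hypothetical helper for illustration; assumes "Label:" text nodes
    appear directly before each link.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for link in soup.select("p.sub a"):
        label_node = link.previous_sibling
        # Skip links that are not directly preceded by a text node.
        if not isinstance(label_node, NavigableString):
            continue
        # Trim whitespace and the trailing colon from the label.
        label = label_node.strip().rstrip(":").strip()
        if label:
            result[label] = link.get_text(strip=True)
    return result


sample = '''
<p class="sub">
Sector:
<a href="/company/compare/00000008/">
Capital Goods - Electrical Equipment
</a>
<span style="margin: 16px"></span>
Industry:
<a href="/company/compare/00000008/00000039/">
Electric Equipment
</a>
</p>
'''

info = parse_labeled_links(sample)
print(info)
```

Unlike the one-liner, this version skips any link that has no label text in front of it instead of raising an error or producing an empty key.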
Answered By - HedgeHog