Issue
My bs4.element.ResultSet has this format:
[<h3 class="foo1">
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>,
<h3 class="foo1">
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>]
I want to extract the title and href of each link and save them as a list of tuples, [(title, href), (title2, href2)], but I can't seem to do so.
My closest attempt was:
link = soup.find('h3',class_='foo1').find('a').get('title')
print(link)
But that only returns the title of the first of the two or more elements. How can I successfully extract each href and title?
Solution
Select your elements more specifically, e.g. with CSS selectors, and iterate over your ResultSet to get the attributes of each of them as a list of tuples:
[(a.get('title'),a.get('href')) for a in soup.select('h3 a[href][title]')]
Example
from bs4 import BeautifulSoup
html = '''
<h3 class="foo1">
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 class="foo1">
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
'''
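# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')
# Outside a notebook or REPL the result has to be printed to be seen
print([(a.get('title'), a.get('href')) for a in soup.select('h3 a[href]')])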
Output
[('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
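The same result can also be reached with find_all(), which stays closer to the attempt in the question: find() stops at the first match, while find_all() returns every match. This is a minimal sketch, assuming the same sample HTML as in the example; the variable name pairs is only illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<h3 class="foo1">
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 class="foo1">
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
'''

soup = BeautifulSoup(html, 'html.parser')

pairs = []  # illustrative name, not from the original answer
# find_all() returns every matching <h3>, not just the first one
for h3 in soup.find_all('h3', class_='foo1'):
    a = h3.find('a')
    # Skip headings that contain no link at all
    if a is not None:
        pairs.append((a.get('title'), a.get('href')))

print(pairs)
```

Using .get() rather than a['title'] means a link without one of the attributes yields None instead of raising a KeyError.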
Answered By - HedgeHog