Issue
I have a soup containing many <div> elements; the ones I'm interested in have the class foo.
Each <div> contains a lot of links and other content. I'm interested in the second link (the second <a> </a>); it is always the second.
I want to grab the value of its href attribute and the text between the <a> and </a> tags of that second link.
For example:
<div class ="foo">
<a href ="http://example.com"> </a>
<a href ="http://example2.com"> Title here </a>
</div>
<div class ="foo">
<a href ="http://example3.com"> </a>
<a href ="http://example4.com"> Title 2 here </a>
</div>
Here I want to get:
Title here => http://example2.com
Title 2 here => http://example4.com
I've tried writing some code:
soup.findAll("div", { "class" : "foo" })
but that returns a list of all the divs and their content, and I don't know how to go further.
Thanks :)
Solution
Iterate over the divs and find the a elements inside each one; the second link is at index 1.
from bs4 import BeautifulSoup

example = '''
<div class ="foo">
<a href ="http://example.com"> </a>
<a href ="http://example2.com"> Title here </a>
</div>
<div class ="foo">
<a href ="http://example3.com"> </a>
<a href ="http://example4.com"> Title 2 here </a>
</div>
'''

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(example, 'html.parser')
for div in soup.find_all('div', {'class': 'foo'}):
    a = div.find_all('a')[1]  # second link inside this div
    print(a.text.strip(), '=>', a.attrs['href'])
Answered By - falsetru
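A possible variant, not part of the original answer: the question says the link is always the second one, but if some div might contain fewer than two links, indexing with [1] would raise an IndexError. The sketch below assumes Python 3 and bs4, uses a CSS selector to pick the divs, and skips any div that has no second link. The function name second_links and the third sample div are hypothetical, added only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup; the third div is a made-up case with only one link.
html = '''
<div class="foo">
  <a href="http://example.com"> </a>
  <a href="http://example2.com"> Title here </a>
</div>
<div class="foo">
  <a href="http://example3.com"> </a>
  <a href="http://example4.com"> Title 2 here </a>
</div>
<div class="foo">
  <a href="http://example5.com"> only one link </a>
</div>
'''


def second_links(markup):
    """Return (text, href) for the second <a> of every div with class foo."""
    soup = BeautifulSoup(markup, 'html.parser')
    results = []
    for div in soup.select('div.foo'):
        links = div.find_all('a')
        if len(links) < 2:
            # No second link in this div, so skip it instead of failing.
            continue
        a = links[1]
        results.append((a.get_text(strip=True), a.get('href')))
    return results


for title, href in second_links(html):
    print(title, '=>', href)
```

With the sample markup above this prints the two lines asked for in the question, and the div with a single link is ignored.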