Issue
I am trying to figure out how to extract only the title attribute from the following <a> tag:
<a href="/wiki/Jammu_and_Kashmir_(union_territory)" title="Jammu and Kashmir (union territory)">Jammu and Kashmir</a>
Currently I am able to extract all of the <a> tags in full using:
print(soup.find_all('a'))
But how do I only extract the title in the attribute?
Solution
To access the title, use:
print(soup.find('a')['title'])
This works only if the tag you want is the first <a> on the page, because find() returns the first match. Otherwise, find the tag by its text:
print(soup.find(lambda t: t.name == 'a' and 'Jammu and Kashmir' in t.text)['title'])
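For a self-contained check, here is a minimal sketch of both lookups. It assumes bs4 is installed and uses the built-in html.parser. The first anchor in the sample is a made-up extra link, added only so the page has more than one <a>. The second is the one from the question. It also uses .get('title') as a precaution, since tag['title'] raises KeyError when the attribute is missing.

```python
from bs4 import BeautifulSoup

# The "India" link is hypothetical (added for illustration);
# the second anchor is the one from the question.
html = """
<a href="/wiki/India" title="India">India</a>
<a href="/wiki/Jammu_and_Kashmir_(union_territory)" title="Jammu and Kashmir (union territory)">Jammu and Kashmir</a>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first <a>, which is not the one we want here.
first_title = soup.find("a")["title"]

# Match on the link text instead, and guard against a missing tag or attribute.
match = soup.find(lambda t: t.name == "a" and "Jammu and Kashmir" in t.text)
matched_title = match.get("title") if match is not None else None

print(first_title)
print(matched_title)
```

With this sample, first_title comes from the extra link, while matched_title comes from the anchor in the question.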
Edit
To get all titles:
for tag in soup.find_all(lambda t: t.name == 'a' and 'title' in t.attrs):
    print(tag['title'])
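A possible alternative for collecting every title is to pass title=True as an attribute filter to find_all(), which keeps only tags that have that attribute. The sketch below assumes bs4 is installed. The sample HTML is illustrative: only the Jammu and Kashmir anchor comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one link has no title attribute at all.
html = """
<a href="/wiki/India" title="India">India</a>
<a href="/no-title">No title here</a>
<a href="/wiki/Jammu_and_Kashmir_(union_territory)" title="Jammu and Kashmir (union territory)">Jammu and Kashmir</a>
"""
soup = BeautifulSoup(html, "html.parser")

# title=True matches any <a> that has a title attribute, whatever its value.
titles = [tag["title"] for tag in soup.find_all("a", title=True)]
print(titles)
```

The link without a title is skipped, so indexing with tag["title"] is safe inside the comprehension.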
Answered By - MendelG