Issue
Do I need to use regex here?
The content I want looks like:
<meta content="text I want to grab" name="description"/>
However, there are many tags that start with "meta content=", and I want the one that ends in name="description". I'm pretty new at regex, but I thought BeautifulSoup (BS) would be able to handle this.
Solution
Assuming you were able to read the HTML contents into a variable named html, you first need to parse the HTML with BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
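To make this step concrete, here is a minimal, self-contained sketch. The sample page assigned to html is hypothetical and stands in for whatever you downloaded or read from disk. After parsing, every tag is a Tag object whose attributes can be read like a dict.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice `html` holds the markup you fetched.
html = """
<html>
  <head>
    <meta charset="utf-8"/>
    <meta content="text I want to grab" name="description"/>
    <meta content="python, html" name="keywords"/>
  </head>
  <body></body>
</html>
"""

# 'html.parser' is Python's built-in parser, so no extra package is needed.
soup = BeautifulSoup(html, "html.parser")

# Print the attributes of every <meta> tag to see what there is to match on.
meta_tags = soup.find_all("meta")
for tag in meta_tags:
    print(tag.attrs)
```

Only one of the three meta tags has name="description", which is exactly the attribute the search below keys on.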
Then, to search for <meta content="text I want to grab" name="description"/>, you need to find a tag whose tag name is 'meta' and whose name attribute is 'description':
def is_meta_description(tag):
    # tag.get() avoids a KeyError on tags without a name attribute, e.g. <meta charset="utf-8"/>
    return tag.name == 'meta' and tag.get('name') == 'description'
meta_tag = soup.find(is_meta_description)
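A detail worth checking with a function filter: soup.find() calls it on every tag in document order, so a page that has any <meta> tag without a name attribute (such as <meta charset="utf-8"/>, which is very common) makes tag['name'] raise KeyError before the description is reached. The sketch below, using a hypothetical sample page, compares the tag.get('name') version with a strict tag['name'] version; is_meta_description_strict is a name made up for this comparison.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first <meta> tag has no name attribute.
html = """
<head>
  <meta charset="utf-8"/>
  <meta content="text I want to grab" name="description"/>
</head>
"""
soup = BeautifulSoup(html, "html.parser")


def is_meta_description(tag):
    # tag.get("name") returns None when the attribute is missing,
    # so <meta charset="utf-8"/> is simply not matched.
    return tag.name == "meta" and tag.get("name") == "description"


def is_meta_description_strict(tag):
    # Indexing with tag["name"] raises KeyError when the attribute is missing.
    return tag.name == "meta" and tag["name"] == "description"


meta_tag = soup.find(is_meta_description)
print(meta_tag["content"])  # text I want to grab

strict_raised = False
try:
    soup.find(is_meta_description_strict)
except KeyError:
    strict_raised = True
print(strict_raised)  # True
```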
You are trying to fetch the content attribute of the tag, so:
content = meta_tag['content']
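If some pages you process have no description at all, soup.find() returns None and meta_tag['content'] then fails with a TypeError. One way to guard against that is sketched below as a hypothetical helper, get_meta_description, which is not part of BeautifulSoup. It returns None both when the tag is missing and when the tag has no content attribute.

```python
from bs4 import BeautifulSoup


def is_meta_description(tag):
    return tag.name == "meta" and tag.get("name") == "description"


def get_meta_description(html):
    """Hypothetical helper: return the description text, or None if unavailable."""
    soup = BeautifulSoup(html, "html.parser")
    meta_tag = soup.find(is_meta_description)
    if meta_tag is None:
        # No matching tag; meta_tag['content'] would raise TypeError here.
        return None
    # .get() returns None when the tag exists but has no content attribute.
    return meta_tag.get("content")


print(get_meta_description('<meta content="text I want to grab" name="description"/>'))
print(get_meta_description('<meta charset="utf-8"/>'))
print(get_meta_description('<meta name="description"/>'))
```

The three calls print the description text, then None, then None.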
Since this is a plain attribute match, there is also a shorter way to find the tag:
meta_tag = soup.find('meta', attrs={'name': 'description'})
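This also answers the regex question: BeautifulSoup matches on parsed attributes, so attribute order in the markup does not matter, whereas a pattern written for "meta content=... name=..." would miss <meta name="description" content="...">. The sketch below uses a hypothetical page with the attributes in that other order and compares the attrs filter with a CSS attribute selector via select_one(), which relies on the soupsieve package that is installed alongside BeautifulSoup 4.7.0 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical page: name comes before content, unlike the markup in the question.
html = """
<meta name="keywords" content="python, html"/>
<meta name="description" content="text I want to grab"/>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute filter, as in the answer.
by_attrs = soup.find("meta", attrs={"name": "description"})

# Equivalent CSS attribute selector; select_one() returns the first match or None.
by_css = soup.select_one('meta[name="description"]')

print(by_attrs["content"])  # text I want to grab
print(by_css["content"])    # text I want to grab
```

Both forms find the same tag; the attrs version avoids CSS syntax, while the selector version is handy if you already think in CSS.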
Answered By - zvone