Issue
from bs4 import BeautifulSoup
import re
html_content = """<div class='ui very padded vertical segment'>
<div class='ui basic clearing segment' style='margin: 0; padding: 1em 0'>
<h4 class='ui header'>
Description
</h4>
<p>Please bring the failure blade to cabin.</p>
</div>
<div class='column'>
<h4 class='ui header'>
Owner Information
</h4>
<div class='ui list'>
<div class='item'>
<i class='grey user icon'></i>
<div class='content'>No Owner Specified</div>
</div>
</div>
</div>"""
work_order_soup = BeautifulSoup(html_content, "html.parser")
# `string=` replaces the deprecated `text=` keyword in current Beautiful Soup releases
find_description = work_order_soup.find(re.compile("^h[1-6]$"), string=re.compile("Description", re.IGNORECASE))
parent_div_description = find_description.find_parent("div")
print(parent_div_description.text)
I need to get the text from the parent div without looking up the p tag directly, and with the heading text "Description" removed from the result. I have already found the heading using find_description. Required output: Please bring the failure blade to cabin.
Solution
Remove the <h*> tag from the parent and get the text:
import re
from bs4 import BeautifulSoup
html_content = """<div class='ui very padded vertical segment'>
<div class='ui basic clearing segment' style='margin: 0; padding: 1em 0'>
<h4 class='ui header'>
Description
</h4>
<p>Please bring the failure blade to cabin.</p>
</div>
<div class='column'>
<h4 class='ui header'>
Owner Information
</h4>
<div class='ui list'>
<div class='item'>
<i class='grey user icon'></i>
<div class='content'>No Owner Specified</div>
</div>
</div>
</div>"""
work_order_soup = BeautifulSoup(html_content, "html.parser")
find_description = work_order_soup.find(
re.compile("^h[1-6]$"), string=re.compile("Description", re.IGNORECASE)
)
parent = find_description.parent
find_description.extract()
print(parent.get_text(strip=True))
Prints:
Please bring the failure blade to cabin.
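Note that extract() removes the heading from the parsed tree, so later lookups on the same soup will no longer find it. If the tree has to stay intact, one possible alternative is to collect the text of the heading's following siblings instead. The sketch below assumes the wanted text always comes after the heading inside the same parent; text_after is a hypothetical helper name, not part of Beautiful Soup, and the HTML is trimmed to the relevant block.

```python
import re
from bs4 import BeautifulSoup, Tag

html_content = """<div class='ui basic clearing segment'>
<h4 class='ui header'>
Description
</h4>
<p>Please bring the failure blade to cabin.</p>
</div>"""


def text_after(heading):
    """Hypothetical helper: join the text of every node that follows `heading`."""
    parts = []
    for sibling in heading.next_siblings:
        if isinstance(sibling, Tag):
            # Element node: take all of its nested text, whitespace-stripped
            text = sibling.get_text(strip=True)
        else:
            # Bare text node between elements
            text = str(sibling).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


soup = BeautifulSoup(html_content, "html.parser")
heading = soup.find(
    re.compile("^h[1-6]$"), string=re.compile("Description", re.IGNORECASE)
)
result = text_after(heading)
print(result)
```

Because nothing is removed, the heading is still attached to its parent afterwards and can be reused for other lookups.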
Answered By - Andrej Kesely