Issue
I tried the code below to parse HTML using BeautifulSoup.
item_detail_soup = BeautifulSoup(html, "html.parser")
h1 = item_detail_soup.find("h1")
The parsed h1 output is:
<h1>
<div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>
〔NEW〕 iPhone12 256GB </h1>
I'm trying to remove this div, which has the class name brand.
My desired output:
<h1> (NEW) iPhone12 256GB </h1>
I tried extract() followed by replace(), but it failed.
h1 = item_detail_soup.find("h1")
h1 = h1.replace(item_detail_soup.find("h1").div.extract(),'')
How can I get the desired output?
Solution
Good news, you were on the right track. To reach your goal you can use .extract(), .replace_with(), or .decompose().
What is the difference between extract and decompose?
While .extract() removes a tag or string from the tree and returns it (keeping it as a separate parse tree), .decompose() removes the tag from the tree and completely destroys it and its contents.
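To make the difference concrete, here is a minimal sketch that runs both calls on a shortened copy of the question's markup. The variable names (removed, result, after_extract, after_decompose) are only illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '<h1><div class="brand">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'

# extract(): the tag leaves the tree, but it is returned and stays usable.
soup = BeautifulSoup(html, 'html.parser')
removed = soup.select_one('div.brand').extract()
extracted_text = removed.get_text()
after_extract = str(soup.h1)

# decompose(): the tag is removed and destroyed; the call itself returns None.
soup = BeautifulSoup(html, 'html.parser')
result = soup.select_one('div.brand').decompose()
after_decompose = str(soup.h1)

print(extracted_text)                    # text of the removed <div>
print(result)                            # return value of decompose()
print(after_extract == after_decompose)  # both leave the same <h1> behind
```

If you still need the brand text afterwards, .extract() is the more convenient choice. If you only want the tag gone, .decompose() is enough.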
What went wrong?
The reason you do not get the expected result is that you assign the return value of the removal call to your h1 variable, so it will always be None (.decompose()) or will contain the extracted tag (.extract()) instead of the cleaned <h1>.
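The pitfall can be reproduced in a few lines. This is only a sketch on a shortened copy of the question's markup, and the names wrong_name, right_name and cleaned are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<h1><div class="brand">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'
soup = BeautifulSoup(html, 'html.parser')

# Pitfall: h1 is bound to the return value of extract(), i.e. the removed <div>.
h1 = soup.find('h1').div.extract()
wrong_name = h1.name

# The tree itself was already modified, so selecting <h1> again gives the cleaned tag.
h1 = soup.find('h1')
right_name = h1.name
cleaned = str(h1)

print(wrong_name, right_name)
print(cleaned)
```

The removal changes the soup in place, so there is nothing to assign back: call the removal method for its side effect and read the <h1> from the soup afterwards.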
How to fix?
First select the tag you want to remove from the tree, remove it, and then select your <h1> to see the result.
Example (there is exactly one div with class brand in item_detail_soup)
from bs4 import BeautifulSoup
html = '''<h1><div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'''
item_detail_soup = BeautifulSoup(html, 'html.parser')
item_detail_soup.select_one('div.brand').extract()
h1 = item_detail_soup.find('h1')
Example (there is more than one div with class brand in item_detail_soup)
Be aware that you can use only one option at a time, so I commented the others out.
from bs4 import BeautifulSoup
html = '''<h1><div class="brand" style="display: block; font-size: 0.75rem;">Apple(#34567)</div>〔NEW〕 iPhone12 256GB </h1>'''
item_detail_soup = BeautifulSoup(html, 'lxml')
for item in item_detail_soup.select('div.brand'):
    item.extract()
    #item.decompose()
    #item.replace_with('')
item_detail_soup.h1
Output
<h1>〔NEW〕 iPhone12 256GB </h1>
Answered By - HedgeHog