Issue
I want to find all tags that have attribute values equal to "ATTR1" and "ATTR2" without knowing the corresponding attribute names.
Let's assume I have the following:
page_content = '''<a href="ATTR1">text1</a>
<div class="random_value" type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span class="random_value" id="ATTR2">text5</span>'''
I would like to have a script that retrieves only the third element, which has an attribute equal to "ATTR1" AND an attribute equal to "ATTR2". That is, I need the following:
<script class="ATTR1" id="ATTR2">text3</script>
I know I can pass a function as an argument to find_all(), but I need help understanding how to write a function that returns True if these conditions are met.
Solution
If you know the attribute names, simply chain your conditions, e.g. with a CSS selector:
soup.select('#ATTR2.ATTR1')
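As a small sketch of what that selector means: `#ATTR2` matches on `id` and `.ATTR1` matches on `class`, so the same condition can also be spelled with explicit attribute selectors. The two-element `html` string and the variable names below are made up for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical two-element sample, reduced from the question's markup.
html = (
    '<script class="ATTR1" id="ATTR2">text3</script>'
    '<span class="random_value" id="ATTR2">text5</span>'
)
soup = BeautifulSoup(html, "html.parser")

# Shorthand: id "ATTR2" combined with class "ATTR1".
by_shorthand = soup.select('#ATTR2.ATTR1')

# Same condition with explicit attribute selectors;
# ~= matches one whitespace-separated token of the class value.
by_attribute = soup.select('[id="ATTR2"][class~="ATTR1"]')

print(by_shorthand)
print(by_attribute)
```

Both forms only work when you already know which attribute holds which value.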
Or, without knowing the attribute names, collect every attribute value of each tag and check the wanted values against that list:
for e in soup():
    # flatten the values, since multi-valued attributes such as class are lists
    attr_list = [v for i in list(e.attrs.values()) for v in (i if isinstance(i,list) else [i])]
    if all(x in attr_list for x in ['ATTR1','ATTR2']):
        print(e)
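Since the question asks for a function that can be passed to find_all(), the same check could be wrapped in a predicate along these lines. This is only a sketch: the name `has_all_values` and its `wanted` parameter are invented here and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def has_all_values(tag, wanted=("ATTR1", "ATTR2")):
    """Return True if every wanted string is a value of some attribute of tag.

    Hypothetical helper, not a BeautifulSoup API.
    """
    values = set()
    for value in tag.attrs.values():
        # Multi-valued attributes such as class are parsed into lists.
        if isinstance(value, list):
            values.update(value)
        else:
            values.add(value)
    return all(w in values for w in wanted)

page_content = '''<a href="ATTR1">text1</a>
<div class="random_value" type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span class="random_value" id="ATTR2">text5</span>'''

soup = BeautifulSoup(page_content, "html.parser")

# find_all() calls the function once per tag and keeps the tags it accepts.
matches = soup.find_all(has_all_values)
print(matches)
```

With the sample from the question, only the script element satisfies both conditions.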
Example
from bs4 import BeautifulSoup
html = '''
<a href="ATTR1">text1</a>
<div class="random_value" type="ATTR2">text2</div>
<script class="ATTR1" id="ATTR2">text3</script>
<span class="random_value" id="ATTR2">text5</span>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('#ATTR2.ATTR1'))
for e in soup():
    attr_list = [v for i in list(e.attrs.values()) for v in (i if isinstance(i,list) else [i])]
    if all(x in attr_list for x in ['ATTR1','ATTR2']):
        print(e)
Output
[<script class="ATTR1" id="ATTR2">text3</script>]
<script class="ATTR1" id="ATTR2">text3</script>
Answered By - HedgeHog