Issue
This question must be a duplicate, but for the life of me, I can't find it anywhere.
html = """
<html>
<head>
</head>
<body>
<div id="7471292"></div>
<div id="5235252"></div>
<div href="/some/link/"></div>
<div id="7567327"></div>
<div id="1231312"></div>
<div class="card d-inline-block iteml_card elems3 section1 featured0 wished0"></div>
<div id="2342424"></div>
</body>
</html>
"""
# Create soup from html (naming the parser explicitly avoids a bs4 warning)
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
I want the following output:
[<div id="7471292"></div>,
<div id="5235252"></div>,
<div id="7567327"></div>,
<div id="1231312"></div>,
<div id="2342424"></div>]
We can do something like:
soup.find_all("div")
but this will return all divs. If we want to specify an id attribute, we have to fill in a concrete value as well, seemingly rendering it useless:
soup.find_all('div', {'id': ""})
Solution
What happens?
You are close to your goal, but soup.find_all('div', {'id': ""}) is interpreted as matching an empty or non-existent id attribute, which is why you won't get your expected ResultSet.
How to fix?
There is not much to change, and you do not really need a regex in your case. Just use keyword arguments and set the attribute to True:
soup.find_all('div', id=True)
With dict syntax:
soup.find_all('div', {'id':True})
Or the equivalent CSS selector:
soup.select('div[id]')
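As a quick check that the three forms above select the same tags, here is a minimal sketch on a smaller, made-up snippet (the markup and the ids "a1" and "b2" are hypothetical, not from the question). It also shows a fourth variant, a function filter built on Tag.has_attr, for cases where the condition needs more logic than "attribute exists". It assumes bs4 is installed and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two divs carry an id, two do not.
html = """
<div id="a1"></div>
<div href="/some/link/"></div>
<div class="card"></div>
<div id="b2"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword argument: True means "the attribute exists, whatever its value".
by_keyword = soup.find_all("div", id=True)

# Same filter written with a dict of attributes.
by_dict = soup.find_all("div", {"id": True})

# CSS attribute-presence selector.
by_css = soup.select("div[id]")

# Function filter: called once per tag, keeps the tag when it returns True.
by_function = soup.find_all(lambda tag: tag.name == "div" and tag.has_attr("id"))

ids = [tag["id"] for tag in by_keyword]
print(ids)  # ['a1', 'b2']
```

The function form is more verbose, so it is only worth reaching for when the plain id=True filter is not expressive enough.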
Example
html = """
<html>
<head>
</head>
<body>
<div id="7471292"></div>
<div id="5235252"></div>
<div href="/some/link/"></div>
<div id="7567327"></div>
<div id="1231312"></div>
<div class="card d-inline-block iteml_card elems3 section1 featured0 wished0"></div>
<div id="2342424"></div>
</body>
</html>
"""
# Create soup from html (naming the parser explicitly avoids a bs4 warning)
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
soup.find_all('div', {'id':True})
Output
[<div id="7471292"></div>,
<div id="5235252"></div>,
<div id="7567327"></div>,
<div id="1231312"></div>,
<div id="2342424"></div>]
Answered By - HedgeHog