Issue
I'm using bs4 to scrape a document with a format like the one below, and I only want the a tag elements that appear above text2. How can I do this?
<h1>text1</h1>
<a href="link">link</a>
<h1>text2</h1>
<a href="link"></a>
If I turn the soup into a string and split it, I'm not sure I can turn it back into soup, and I need to call soup.find_all('a') afterwards.
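For reference, the string-split route the question worries about does work, because any string can be parsed back into a new soup. Below is a minimal sketch that assumes the marker heading is serialized exactly as `<h1>text2</h1>`. The names `marker` and `head_soup` are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<h1>text1</h1>
<a href="link">link</a>
<h1>text2</h1>
<a href="link"></a>"""

soup = BeautifulSoup(html, "html.parser")

# Serialize the tree and cut it at the (assumed) exact markup of the marker heading.
marker = "<h1>text2</h1>"
before, _, _ = str(soup).partition(marker)

# Parse the part before the marker into a new, independent soup.
head_soup = BeautifulSoup(before, "html.parser")

# The re-parsed fragment is an ordinary BeautifulSoup object, so find_all works again.
links = head_soup.find_all("a")
print(links)  # [<a href="link">link</a>]
```

This approach is fragile. It depends on the exact serialized markup, and cutting inside a nested element leaves an unbalanced fragment. The tree-based answer below is usually the sturdier choice.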
Solution
Try find_all_previous(), called on the text2 heading rather than on the soup itself:
from bs4 import BeautifulSoup
soup = BeautifulSoup("""
<h1>text1</h1>
<a href="link">link</a>
<h1>text2</h1>
<a href="link"></a>""", "html.parser")
# Find the text2 heading, then collect only the <a> tags that come before it
print(soup.find("h1", string="text2").find_all_previous("a"))
[<a href="link">link</a>]
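find_all_previous() walks backwards from the heading, so its matches come back nearest-first. The sketch below wraps the lookup in a small function and reverses the result to restore document order. `links_before` is a hypothetical helper name, and it assumes the heading's only content is its text, which is what `find(..., string=...)` requires in order to match.

```python
from bs4 import BeautifulSoup

def links_before(soup, heading_text, heading_tag="h1"):
    """Return the <a> tags that appear before the given heading, in document order.

    Hypothetical helper: assumes the heading contains only heading_text.
    Returns an empty list if the heading is not found.
    """
    marker = soup.find(heading_tag, string=heading_text)
    if marker is None:
        return []  # marker heading not found
    # find_all_previous yields nearest-first; reverse to get document order.
    return list(reversed(marker.find_all_previous("a")))

html = """
<h1>text1</h1>
<a href="first">first</a>
<a href="second">second</a>
<h1>text2</h1>
<a href="third">third</a>"""

soup = BeautifulSoup(html, "html.parser")
links = links_before(soup, "text2")
print([a["href"] for a in links])  # ['first', 'second']
```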
Answered By - sushanth