Issue
I need to scrape data from a webpage with Selenium, and I need to find these elements:
<div class="content-left">
<ul></ul>
<ul></ul>
<p></p>
<ul></ul>
<p></p>
<ul></ul>
<p></p>
<ul>
<li></li>
<li></li>
</ul>
<p></p>
</div>
As you can see, the <p> and <ul> tags have no classes, and I don't know how to get them in order.
I used Beautiful Soup before:
allP = bs.find('div', attrs={"class":"content-left"})
txt = ""
for p in allP.find_all(['p', 'li']):
But it no longer works (requests gets a 403 error), so I need to find these elements with Selenium.
Solution
To extract the text from only the <p> and <li> tags, you can use Beautiful Soup as follows:
from bs4 import BeautifulSoup
html_text = '''
<div class="content-left">
<ul>1</ul>
<ul>2</ul>
<p>3</p>
<ul>4</ul>
<p>5</p>
<ul>6</ul>
<p>7</p>
<ul>
<li>8</li>
<li>9</li>
</ul>
<p>10</p>
</div>
'''
soup = BeautifulSoup(html_text, 'html.parser')
parent_element = soup.find("div", {"class": "content-left"})
for element in parent_element.find_all(['p', 'li']):
    print(element.text)
Console output:
3
5
7
8
9
10
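Since the question mentions that requests is rejected with a 403, one possible workaround is to let Selenium load the page and pass its HTML to the same Beautiful Soup logic. The following is only a minimal sketch: the helper name extract_texts is illustrative and not part of the original answer, and it assumes a Selenium driver has already loaded the page.

```python
from bs4 import BeautifulSoup


def extract_texts(html: str) -> list[str]:
    """Return the text of every <p> and <li> inside div.content-left, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    parent = soup.find("div", {"class": "content-left"})
    if parent is None:
        # Container not found, e.g. a different page was served.
        return []
    # find_all with a list of tag names yields matches in document order.
    return [el.get_text(strip=True) for el in parent.find_all(["p", "li"])]


# Hypothetical usage, assuming `driver` is a Selenium WebDriver
# that has already navigated to the target page:
#     texts = extract_texts(driver.page_source)
```

This keeps the parsing code unchanged and only swaps where the HTML comes from.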
Using Selenium
With Selenium, you can use a list comprehension as follows:
Using CSS_SELECTOR:
from selenium.webdriver.common.by import By
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.content-left p, div.content-left li")])
Answered By - undetected Selenium