Issue
I am trying to scrape some information from a website. For most of the divs this works fine, but I have a problem reading some specific divs. At first I only tried it with a plain requests/bs4 request, then also with Selenium, but I still get no data back...
Below you can find my full code. It returns data fine with this search:
tmpDiv = soup.find ("div", {"id": "financial-strength"})
But it is not working with this div:
tmpDiv = soup.find ("div", {"id": "analyst-estimate"})
It outputs only
<div class="children" data-v-39722e0c="" id="analyst-estimate" style="min-height:200px;display:block;">
</div>
Below you can find the full (not working) code
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import sys, os
from selenium.webdriver.chrome.options import Options
link = "https://www.gurufocus.com/stock/AAPL/summary"
path = os.path.abspath (os.path.dirname (sys.argv[0]))
options = Options ()
options.add_argument ('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
cd = '/chromedriver.exe'
driver = webdriver.Chrome (path + cd, options=options)
driver.get (link)
soup = BeautifulSoup (driver.page_source, 'html.parser')
time.sleep (2)
page = requests.get (link)
soup = BeautifulSoup (page.content, 'html.parser')
# tmpDiv = soup.find ("div", {"id": "financial-strength"})
tmpDiv = soup.find ("div", {"id": "analyst-estimate"})
print(tmpDiv.prettify())
I heard this is probably a "lazy loading" website, but shouldn't the Selenium access wait until the full site is loaded with all its content?
Solution
What happens?
There are two major things why you won't get the result:

1. After requesting the website with selenium, you request it again with requests and assign that response to soup, so the page source Selenium rendered is overwritten.
2. Data won't be loaded if it is not needed - that is what you already figured out --> "lazy loading website".
How to fix that?
1. Remove all requests-specific lines.
2. Scroll the element you need into view, so that its data is loaded (an alternative using explicit waits is sketched below the snippet):

element = driver.find_element_by_id("analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)
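If you would rather not rely on fixed time.sleep() calls, an explicit wait can do the same job. This is only a minimal sketch, assuming the table is rendered inside the #analyst-estimate div once it has been scrolled into view; the 15-second timeout is an arbitrary choice:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.gurufocus.com/stock/AAPL/summary"
driver = webdriver.Chrome()  # adjust the driver setup to your environment
driver.get(link)

wait = WebDriverWait(driver, 15)  # arbitrary 15-second timeout

# wait for the target div, scroll it into view, then wait for its table to render
element = wait.until(EC.presence_of_element_located((By.ID, "analyst-estimate")))
driver.execute_script("arguments[0].scrollIntoView();", element)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#analyst-estimate table")))

print(element.get_attribute("innerHTML"))
driver.quit()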
Example
Be aware that I used my own webdriver path, so you will have to edit it.
import time
from bs4 import BeautifulSoup
from selenium import webdriver

link = "https://www.gurufocus.com/stock/AAPL/summary"

# adjust the chromedriver path to your own environment
driver = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver.get(link)
time.sleep(2.5)  # give the page time to finish its initial load

# scroll the lazily loaded section into view so its data gets rendered
# (note: this targets Selenium 3; on Selenium 4+ use driver.find_element(By.ID, "analyst-estimate"))
element = driver.find_element_by_id("analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)
time.sleep(1)  # short pause so the data can load after scrolling

soup = BeautifulSoup(driver.page_source, 'html.parser')
# tmpDiv = soup.find("div", {"id": "financial-strength"})
tmpDiv = soup.find("div", {"id": "analyst-estimate"})
print(tmpDiv.prettify())
Output
<div class="children" data-v-39722e0c="" id="analyst-estimate" style="">
<div class="capture-area">
<h2 class="fs-large fc-primary fw-bolder">
Analyst Estimate
</h2>
<table class="normal-table-mobile financial-strength-table">
<tbody>
<tr>
<td>
</td>
<td>
Sep 2021
</td>
<td>
Sep 2022
</td>
<td>
Sep 2023
</td>
</tr>
<tr>
<td>
Revenue (Mil $)
</td>
<td>
<span>
313003.40
</span>
</td>
<td>
<span>
328872.10
</span>
</td>
<td>
<span>
341577.60
</span>
</td>
</tr>
<tr>
<td>
EBIT (Mil $)
</td>
<td>
<span>
76803.87
</span>
</td>
<td>
<span>
81038.89
</span>
</td>
<td>
<span>
84830.53
</span>
</td>
</tr>
<tr>
<td>
EBITDA (Mil $)
</td>
<td>
<span>
88706.60
</span>
</td>
<td>
<span>
92604.88
</span>
</td>
<td>
<span>
94034.53
</span>
</td>
</tr>
<tr>
<td>
EPS ($)
</td>
<td>
<span>
3.94
</span>
</td>
<td>
<span>
4.28
</span>
</td>
<td>
<span>
4.55
</span>
</td>
</tr>
<tr>
<td>
EPS without NRI ($)
</td>
<td>
<span>
3.97
</span>
</td>
<td>
<span>
4.27
</span>
</td>
<td>
<span>
4.55
</span>
</td>
</tr>
<tr>
<td>
EPS Growth Rate (%)
</td>
<td>
<span>
10.04
</span>
</td>
<td>
<!-- -->
</td>
<td>
<!-- -->
</td>
</tr>
<tr>
<td>
Dividends per Share ($)
</td>
<td>
<span>
0.74
</span>
</td>
<td>
<span>
0.82
</span>
</td>
<td>
<span>
1.15
</span>
</td>
</tr>
</tbody>
</table>
</div>
</div>
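Once the div has been rendered, the table can also be read into a Python structure instead of just being printed. This is a minimal sketch, assuming tmpDiv holds the parsed "analyst-estimate" div from the example above (the estimates variable name is just illustrative):

# build a dict keyed by row label from the rendered table,
# assuming tmpDiv is the parsed "analyst-estimate" div from above
rows = tmpDiv.find("table").find_all("tr")
periods = [td.get_text(strip=True) for td in rows[0].find_all("td")][1:]  # ['Sep 2021', 'Sep 2022', 'Sep 2023']

estimates = {}
for row in rows[1:]:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    estimates[cells[0]] = dict(zip(periods, cells[1:]))

print(estimates["Revenue (Mil $)"])
# {'Sep 2021': '313003.40', 'Sep 2022': '328872.10', 'Sep 2023': '341577.60'}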
Answered By - HedgeHog