Issue
I am using Scrapy with Selenium to scrape content from this page: https://nikmikk.itch.io/door-knocker
In it, there is a table under the div with class .game_info_panel_widget, where the first row, "Published 62 days ago", seems to be loaded dynamically.
I have tried fetching the page as Scrapy sees it, but I cannot find that row in the HTML:
scrapy fetch --nolog https://nikmikk.itch.io/door-knocker > test.html
Here is what I see in test.html: the first table row is Status, not the Published row I see when I view the page source directly in Chrome.
<div class="game_info_panel_widget">
<table>
<tbody>
<tr>
<td>Status</td>
<td>Prototype</td>
...
</tr>
...
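To confirm which rows the server sends before any JavaScript runs, one option is to parse the saved test.html and list the labels in the first column of the panel table. The sketch below uses only the standard library's html.parser; the names PanelLabelParser and panel_labels are illustrative, not part of Scrapy, and it assumes the labels sit in the first <td> of each row, as in the excerpt above.

```python
from html.parser import HTMLParser


class PanelLabelParser(HTMLParser):
    """Collect the first <td> text of each row inside div.game_info_panel_widget."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # > 0 while inside the panel div (counts nested divs)
        self.in_row = False
        self.cell_index = -1    # index of the current <td> within its row
        self.in_cell = False    # True only while inside a row's first <td>
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or "game_info_panel_widget" in classes:
                self.div_depth += 1
        elif self.div_depth and tag == "tr":
            self.in_row, self.cell_index = True, -1
        elif self.div_depth and self.in_row and tag == "td":
            self.cell_index += 1
            self.in_cell = self.cell_index == 0
            if self.in_cell:
                self.labels.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "tr":
            self.in_row = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text inside the label cell, including text of nested tags such as <a>
        if self.in_cell:
            self.labels[-1] += data.strip()


def panel_labels(html):
    """Return the row labels of the info panel table found in `html`."""
    parser = PanelLabelParser()
    parser.feed(html)
    return parser.labels
```

For example, print(panel_labels(open("test.html", encoding="utf-8").read())) on the file produced by scrapy fetch shows whether "Published" is among the labels the server returned.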
In my class SpiderDownloaderMiddleware, I have included Selenium:
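from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--window-size=1200,600')
driver = webdriver.Chrome(options=options)

class SpiderDownloaderMiddleware(object):
    # Omitted other codes
    def process_request(self, request, spider):
        driver.get(request.url)
        # Wait until the info panel container is present in the DOM
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".game_info_panel_widget"))
        )
        body = driver.page_source
        # Returning a response here skips Scrapy's own download for this request
        return HtmlResponse(driver.current_url, body=body, encoding='utf-8', request=request)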
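Since test.html shows that the .game_info_panel_widget div is already in the server-rendered HTML, waiting for its presence can succeed before the extra rows appear. One option is to wait for the row you actually need. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value, so a small condition function is enough. The name panel_row_present is hypothetical, it assumes the label contains no single quote, and the string "xpath" is the value of Selenium's By.XPATH, so the sketch itself does not need to import Selenium.

```python
def panel_row_present(label):
    """Build a wait condition that is met once the info panel has a <td> reading `label`.

    Hypothetical helper for WebDriverWait(driver, timeout).until(...).
    Assumes `label` contains no single quote.
    """
    xpath = (
        "//div[contains(concat(' ', normalize-space(@class), ' '), "
        "' game_info_panel_widget ')]"
        f"//td[normalize-space()='{label}']"
    )

    def _condition(driver):
        # find_elements returns an empty list (no exception) when nothing matches,
        # so WebDriverWait keeps polling until the row exists or the timeout expires.
        matches = driver.find_elements("xpath", xpath)
        return matches[0] if matches else False

    return _condition
```

In process_request this would replace the container wait with WebDriverWait(driver, 10).until(panel_row_present("Published")), which raises TimeoutException if the row never appears. The same condition could stand in for a fixed time.sleep() after click(). Selenium's documentation advises against mixing implicit and explicit waits, so it is best used without implicitly_wait().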
How can I find out how that row is loaded, and how can I scrape that information?
Update: I followed @Yosuva A's answer below and got output like this:
9 days ago
In development
Platforms
Windows
Rating
(9)
Author
David Clark
Genre
Survival, Puzzle
Tags
3D, Creepy, First-Person, Horror, Psychological Horror, Short, Singleplayer, Spooky, Unity
Average session
A few seconds
Languages
English
But the output is inconsistent: sometimes it gives the desired result and sometimes it doesn't. I guess this is because Selenium waits for the generic td element, which every row has:
"//div[@class='game_info_panel_widget']//table//tr//td"
I tried changing it to td[@text='Published'], but Selenium times out.
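A likely reason for the timeout: in XPath, @text selects an attribute named text, and these <td> elements have no such attribute, so td[@text='Published'] never matches. Matching the cell's text content needs td[text()='Published'] (or td[normalize-space()='Published']) in the full XPath 1.0 that Selenium uses. Python's xml.etree.ElementTree supports only a subset of XPath, where the text test is written [.='Published'], but the attribute-versus-text distinction is the same. The sketch below runs both forms against an assumed, trimmed copy of the table modeled on the output above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the info panel table (assumed markup).
TABLE = """
<table><tbody>
  <tr><td>Published</td><td>9 days ago</td></tr>
  <tr><td>Status</td><td>In development</td></tr>
</tbody></table>
""".strip()

root = ET.fromstring(TABLE)

# [@text='...'] tests an attribute called "text", which these cells do not have.
by_attribute = root.findall(".//td[@text='Published']")

# [.='...'] tests the element's text content (ElementTree's XPath subset;
# in Selenium the equivalent predicate is td[text()='Published']).
by_text = root.findall(".//td[.='Published']")

print(len(by_attribute), len(by_text))
```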
My code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# The Service argument is optional; if omitted, Selenium locates chromedriver itself
driver = webdriver.Chrome(service=Service('chromedriver'))
driver.implicitly_wait(15)
driver.get("https://thehive.itch.io/promnesia")
# Open the info panel
driver.find_element(By.XPATH, "//a[@class='toggle_info_btn']").click()
# Wait for specific element
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='game_info_panel_widget']//table//tr//td")))
table_rows = driver.find_elements(By.XPATH, "//div[@class='game_info_panel_widget']//table//tr//td")
for rows in table_rows:
    print(rows.text)
driver.quit()
Is there any other way?
Conclusion:
It works if I add time.sleep(2) after click(), as suggested by Yosuva A.
Solution
Please let me know whether this helps.
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# The Service argument is optional; if omitted, Selenium locates chromedriver itself
driver = webdriver.Chrome(service=Service('/usr/local/bin/chromedriver'))
driver.implicitly_wait(15)
driver.get("https://thehive.itch.io/promnesia")
# Open the info panel, then give its rows time to render
driver.find_element(By.XPATH, "//a[@class='toggle_info_btn']").click()
time.sleep(2)
# Wait for specific element
WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, "//div[@class='game_info_panel_widget']/table//tr//td")))
table_rows = driver.find_elements(By.XPATH, "//div[@class='game_info_panel_widget']/table//tr//td")
for rows in table_rows:
    print(rows.text)
driver.quit()
Output
Updated
1 day ago
Published
9 days ago
Status
In development
Platforms
Windows
Rating
(9)
Author
David Clark
Genre
Survival, Puzzle
Tags
3D, Creepy, First-Person, Horror, Psychological Horror, Short, Singleplayer, Spooky, Unity
Average session
A few seconds
Languages
English
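Because the loop prints label cells and value cells alternately, the flat list of cell texts can be folded into a dictionary keyed by label, which also makes it easy to check whether "Published" was captured on a given run. A minimal sketch, assuming every row has exactly two cells as in the output above; cells_to_info is a hypothetical name, and in the Selenium code the input would be [cell.text for cell in table_rows].

```python
def cells_to_info(cells):
    """Pair alternating [label, value, label, value, ...] cell texts into a dict."""
    if len(cells) % 2:
        raise ValueError("expected an even number of cells (label/value pairs)")
    return dict(zip(cells[0::2], cells[1::2]))


# Cell texts taken from the output above
cells = ["Updated", "1 day ago", "Published", "9 days ago",
         "Status", "In development", "Author", "David Clark"]
info = cells_to_info(cells)
print(info["Published"])
```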
Answered By - Yosuva Arulanthu