Issue
I'm using this code to scrape some data from the link https://website.grader.com/results/www.dubizzle.com
The code is below:
#!/usr/bin/python
import urllib
from bs4 import BeautifulSoup
from dateutil.parser import parse
from datetime import timedelta
import MySQLdb
import re
import pdb
import sys
import string

def getting_urls_of_all_pages():
    url_rent_flat='https://website.grader.com/results/dubizzle.com'
    every_property_in_a_page_data_extraction(url_rent_flat)

def every_property_in_a_page_data_extraction(url):
    htmlfile=urllib.urlopen(url).read()
    soup=BeautifulSoup(htmlfile)
    print soup
    Sizeofweb=""
    try:
        # .text already returns a string, so it can be encoded directly
        Sizeofweb= soup.find('span', {'data-reactid': ".0.0.3.0.0.3.$0.1.1.0"}).text
        print Sizeofweb.encode("utf-8")
    except StandardError as e:
        error="Error was {0}".format(e)
        print error

getting_urls_of_all_pages()
The part of the HTML that I am extracting is shown below.
Snap: https://www.dropbox.com/s/7dwbaiyizwa36m6/5.PNG?dl=0
Code:
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
    <span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
    <span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
Problem: The website takes around 10-15 seconds to load the HTML source that contains the tags I want to extract, as mentioned in the code.
When the code uses the line htmlfile=urllib.urlopen(url).read()
to load the HTML of the page, I think it gets the preload HTML that the link serves during those first 10-15 seconds.
How can I pause the code so that it loads the data after 15+ seconds, once the right HTML with the tags I want to extract is available?
Solution
Using Selenium WebDriver will solve your problem. It drives a real browser, so the page's JavaScript runs, and it can wait up to a specified number of seconds for a condition to hold before proceeding. Something like the following should work.
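from selenium import webdriver
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

baseurl = 'https://website.grader.com/results/dubizzle.com'  # URL from the question

driver = webdriver.Firefox()
driver.get(baseurl)
try:
    # Poll for up to 60 seconds until the condition holds.
    wait = WebDriverWait(driver, 60)
    element = wait.until(
        # Replace ... with a locator tuple for the element you need.
        ec.element_to_be_clickable(...)
    )
finally:
    driver.quit()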
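To make clearer what an explicit wait does, here is a rough standard-library sketch of the polling idea behind wait.until. It is a simplified illustration, not Selenium's actual implementation: the names wait_until, WaitTimeout and fake_lookup are hypothetical, and the 0.5 second default only mirrors WebDriverWait's default poll frequency. The real WebDriverWait also ignores NoSuchElementException between polls by default, which this sketch leaves out.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=60.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)


# Stand-in for a page whose value only appears on the third check.
attempts = []


def fake_lookup():
    attempts.append(1)
    return "1.1" if len(attempts) >= 3 else None


size = wait_until(fake_lookup, timeout=5, poll_interval=0.01)
```

The point of polling rather than a fixed time.sleep(15) is that the script continues as soon as the element shows up and only fails once the full timeout has elapsed.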
Answered By - user6399774