Issue
I am trying to scrape Google Maps. The phone and hours variables are not returning any data. The XPath is correct, so I am not sure what the issue is.
Here is the LINK
The other selectors, such as name, address, title and website, return data fine, but phone and hours return nothing.
Hoping for some answers.
```python
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from scrapy.selector import Selector
import csv
from tqdm import tqdm

# matches the result cards via their "Directions" / "Website" buttons
RESULTS_XPATH = '//div[contains(text(), "Directions")] | //div[contains(text(), "Website")]'

driver = webdriver.Firefox()

with open("links.txt", "r") as links_file:
    # strip trailing newlines and skip blank lines
    all_links = [line.strip() for line in links_file if line.strip()]

# open the CSV once and append one row per business
with open("data.csv", "a+", newline="") as out_file:
    writer = csv.writer(out_file)
    for link in tqdm(all_links):
        try:
            driver.get(link)
        except Exception:
            print('Something went wrong with the URL:', link)
            continue

        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.XPATH, RESULTS_XPATH))
        )
        results = driver.find_elements(By.XPATH, RESULTS_XPATH)
        for result in results:
            business = driver.find_element(By.XPATH, '//div[@role="heading"]/div')
            business.click()
            # waiting for the details panel to load
            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.XPATH, '//div[@class="immersive-container"]'))
            )
            # parsing the rendered page with a Scrapy selector
            response = Selector(text=driver.page_source)
            name = response.xpath('//h2[@data-attrid="title"]/span/text()').get()
            title = response.xpath('(//span[contains(text(), "Google reviews")])/parent::a/parent::span/parent::span/parent::div/parent::div/parent::div/following-sibling::div/div/span/span/text()').get()
            address = response.xpath('//a[contains(text(), "Address")]/parent::span/following-sibling::span/text()').get()
            website = response.xpath('(//a[contains(text(), "Website")])/@href').get()
            phone = response.xpath('//a[contains(text(), "Phone")]/parent::span/following-sibling::span/a/span/text()').get()
            hours = response.xpath('//a[contains(text(), "Hours")]/parent::span/following-sibling::div/label/span//btext()').get()
            total_reviews = response.xpath('(//span[contains(text(), "Google reviews")])[1]/text()').get()
            total_rating = response.xpath('(//span[contains(text(), "Google reviews")])/parent::a/parent::span/parent::span/parent::div/span/text()').get()
            input('Check: ')  # pause to inspect the browser before writing the row
            writer.writerow([name, title, address, website, phone, hours, total_reviews, total_rating])

driver.quit()
```
Solution
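Try using JavaScript's outerHTML instead of page_source. Note that the HTML has to be passed as the text keyword argument, because the first positional argument of Selector is a response object:

```python
response = Selector(text=driver.execute_script("return document.documentElement.outerHTML"))
```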
Also, there is an error in the Hours XPath: //btext() is missing a slash and should be //b/text():

```python
hours = response.xpath('//a[contains(text(), "Hours")]/parent::span/following-sibling::div/label/span//b/text()').get()
```
Answered By - rahul rai