Monday, November 22, 2021

[FIXED] unable to find the correct CSS selector for scraping title

beautifulsoup, css-selectors, python, selenium, web-scraping

Issue

Hi folks, I am trying to get the titles of Zomato restaurants, but I can't figure out why my CSS selector (in the for loop in the last block) is not working. I have also left the XPath I tried as a comment. Please help. https://www.zomato.com/pune/order-food-online?delivery_subzone=1165

import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from urllib.parse import urljoin

##### Web scraper for infinite scrolling page #####
driver = webdriver.Chrome(executable_path='./chromedriver.exe')

driver.get("https://www.zomato.com/pune/delivery-in-budhwar-peth")

time.sleep(10)  # Allow 10 seconds for the web page to open
driver.find_element_by_xpath("//div[contains(text(),'Rating: 4.0+')]").click()
scroll_pause_time = 1 # You can set your own pause time. My laptop is a bit slow so I use 1 sec
screen_height = driver.execute_script("return window.screen.height;")   # get the height of the screen
i = 1
count=0

while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update the scroll height after each scroll, since it can change as more content loads
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break

soup = BeautifulSoup(driver.page_source, "html.parser")

for i in soup.select('a.sc-jHZirH intUsQ>div>p'):
    print(i.text)
#print('Count of all rests is',count)
#"//a[@class='sc-jHZirH intUsQ']/div/p"

Solution

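Your selector is close; however, you need to replace the space between sc-jHZirH and intUsQ with ., since both are class names on the same a element. A space in a CSS selector is the descendant combinator, so the original selector searches for an <intUsQ> tag inside the link and matches nothing. The corrected loop: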

for i in soup.select('a.sc-jHZirH.intUsQ>div>p'):
    print(i.text)
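
To see the difference without launching a browser, here is a minimal sketch that runs both selectors against a hand-written snippet. The HTML, the href and the restaurant name are made up for illustration; only the class names sc-jHZirH and intUsQ come from the question, and the real Zomato markup may differ or change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one restaurant card; only the class
# names sc-jHZirH and intUsQ are taken from the question.
html = """
<a class="sc-jHZirH intUsQ" href="/pune/example-restaurant">
  <div><p>Example Restaurant</p></div>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# The space is a descendant combinator, so this looks for an <intUsQ>
# element somewhere inside the <a>, which does not exist.
broken = soup.select("a.sc-jHZirH intUsQ>div>p")

# Chained dots require both classes on the same <a> element.
fixed = soup.select("a.sc-jHZirH.intUsQ>div>p")

titles = [p.get_text(strip=True) for p in fixed]
print(len(broken), titles)
```

The first selector returns an empty list, while the second one finds the p holding the title.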

Answered By - Ajax1234

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
