Issue
I was trying to fetch all Uber Eats cities using BeautifulSoup. I tried the following code; it runs without any errors, but I get an output of 0 instead of the 51 cities I expected. Can anyone please tell me how to get the expected output?
# Fetch all Uber Eats cities
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import re

URL = "https://www.ubereats.com/location"
req = Request(URL, headers={'User-Agent': 'Chrome/103.0.5060.114', 'Accept': 'text/html,application/xhtml+xml,application/xml'})
webpage = urlopen(req).read()
# print(webpage)

soup = BeautifulSoup(webpage, 'html.parser')

import pandas as pd
df = pd.DataFrame(columns=['city', 'state', 'url'])

# Fetching URL of all cities
states = soup.find_all("div", class_="av ax ay")
states = [state for state in states if state.text != "All Countries"]
len(states)

i = 0
for state in states:
    print(f"Processing: {state.text}")
    cities = state.parent.parent.findAll("a")
    for city in cities:
        df.loc[i] = [
            city.text,
            state.text,
            "https://www.ubereats.com" + city['href']
        ]
        i = i + 1
Solution
When I open the page in a browser, I can't find any div with the class "av ax ay". This page likely uses randomly generated class names, and they can change on every request, so you should use a different method to identify the divs with city names.
For example, for states:

states = soup.select("h2 a")

and for cities:

cities = state.parent.parent.parent.select("a span")
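To see why structural selectors survive obfuscated class names, here is a small self-contained sketch. The HTML fragment is a made-up assumption that only mimics the nesting the selectors above rely on, not Uber Eats' actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment (an assumption for illustration): each state is an
# <h2><a> heading, and the city links live under the same grandparent element.
html = """
<div>
  <div><h2><a href="/region/alabama">Alabama</a></h2></div>
  <ul>
    <li><a href="/city/adamsville-al"><span>Adamsville</span></a></li>
    <li><a href="/city/alabaster-al"><span>Alabaster</span></a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Structural selector: no dependence on generated class names
states = soup.select("h2 a")
print([s.text for s in states])            # ['Alabama']

# From the state link, climb to the common ancestor, then grab city spans
state = states[0]
cities = state.parent.parent.parent.select("a span")
print([c.text for c in cities])            # ['Adamsville', 'Alabaster']
print([c.parent['href'] for c in cities])  # ['/city/adamsville-al', '/city/alabaster-al']
```

Because the selectors key on tag structure (h2 headings, links, spans) rather than class attributes, they keep working even when the site regenerates its class names.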
Minimal working code:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

URL = "https://www.ubereats.com/location"
req = Request(URL, headers={'User-Agent': 'Chrome/103.0.5060.114', 'Accept': 'text/html,application/xhtml+xml,application/xml'})
webpage = urlopen(req).read()
# print(webpage)

soup = BeautifulSoup(webpage, 'html.parser')

# Fetching URL of all cities
states = soup.select("h2 a")
states = [state for state in states if state.text != "All Countries"]
print('states:', len(states))

rows = []
for state in states:
    print("Processing:", state.text)
    cities = state.parent.parent.parent.select("a span")
    print('cities:', len(cities))
    for city in cities:
        rows.append([
            city.text,
            state.text,
            "https://www.ubereats.com" + city.parent['href']
        ])

# ---

import pandas as pd
df = pd.DataFrame(rows, columns=['city', 'state', 'url'])
print(df)
Result:
city state url
0 Adamsville Alabama https://www.ubereats.com/city/adamsville-al
1 Alabaster Alabama https://www.ubereats.com/city/alabaster-al
2 Albertville Alabama https://www.ubereats.com/city/albertville-al
3 Alexander City Alabama https://www.ubereats.com/city/alexander-city-al
4 Alexandria Alabama https://www.ubereats.com/city/alexandria-al
... ... ... ...
8664 Saratoga Wyoming https://www.ubereats.com/city/saratoga-wy
8665 Sheridan Wyoming https://www.ubereats.com/city/sheridan-wy
8666 Warren AFB Wyoming https://www.ubereats.com/city/warren-afb-wy
8667 Wheatland Wyoming https://www.ubereats.com/city/wheatland-wy
8668 Wilson Wyoming https://www.ubereats.com/city/wilson-wy
[8669 rows x 3 columns]
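Once the DataFrame is built, it can be filtered or exported as usual. A minimal sketch using two rows taken from the result above (the filename is an arbitrary choice):

```python
import pandas as pd

# Two sample rows in the same [city, state, url] shape the scraper produces
rows = [
    ["Adamsville", "Alabama", "https://www.ubereats.com/city/adamsville-al"],
    ["Saratoga", "Wyoming", "https://www.ubereats.com/city/saratoga-wy"],
]
df = pd.DataFrame(rows, columns=["city", "state", "url"])

# Filter one state, then save everything to CSV
wyoming = df[df["state"] == "Wyoming"]
print(len(wyoming))  # 1
df.to_csv("uber_eats_cities.csv", index=False)
```

Collecting rows in a plain list and building the DataFrame once at the end (as in the code above) is also much faster than appending row by row with df.loc[i].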
Answered By - furas