Issue
I was trying to fetch all Uber Eats cities using BeautifulSoup. I tried the following code; it runs without any errors, but I get an output of 0 instead of the 51 cities I expected. Can anyone please tell me how to get the expected output?
# Fetch all Uber Eats cities
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import re

URL = "https://www.ubereats.com/location"
req = Request(URL, headers={'User-Agent': 'Chrome/103.0.5060.114', 'Accept': 'text/html,application/xhtml+xml,application/xml'})
webpage = urlopen(req).read()
# print(webpage)

soup = BeautifulSoup(webpage, 'html.parser')

import pandas as pd
df = pd.DataFrame(columns=['city', 'state', 'url'])

# Fetching URL of all cities
states = soup.find_all("div", class_="av ax ay")
states = [state for state in states if state.text != "All Countries"]
len(states)

i = 0
for state in states:
    print(f"Processing: {state.text}")
    cities = state.parent.parent.findAll("a")
    for city in cities:
        df.loc[i] = [
            city.text,
            state.text,
            "https://www.ubereats.com" + city['href']
        ]
        i = i + 1
Solution
When I open the page in a browser, I can't find any div with the class "av ax ay". This page likely uses randomly generated class names, and they can change on every request, so you should use a different method to identify the divs with city names.
For example, for states:

states = soup.select("h2 a")

and for cities:

cities = state.parent.parent.parent.select("a span")
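To see why structural selectors survive obfuscated class names, here is a small self-contained sketch. The HTML fragment is a made-up assumption that only mimics the nesting the selectors above rely on, not Uber Eats' actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment (an assumption for illustration): each state is an
# <h2><a> heading, and the city links live under the same grandparent element.
html = """
<div>
  <div><h2><a href="/region/alabama">Alabama</a></h2></div>
  <ul>
    <li><a href="/city/adamsville-al"><span>Adamsville</span></a></li>
    <li><a href="/city/alabaster-al"><span>Alabaster</span></a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Structural selector: no dependence on generated class names
states = soup.select("h2 a")
print([s.text for s in states])            # ['Alabama']

# From the state link, climb to the common ancestor, then grab city spans
state = states[0]
cities = state.parent.parent.parent.select("a span")
print([c.text for c in cities])            # ['Adamsville', 'Alabaster']
print([c.parent['href'] for c in cities])  # ['/city/adamsville-al', '/city/alabaster-al']
```

Because the selectors key on tag structure (h2 headings, links, spans) rather than class attributes, they keep working even when the site regenerates its class names.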
Minimal working code:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

URL = "https://www.ubereats.com/location"
req = Request(URL, headers={'User-Agent': 'Chrome/103.0.5060.114', 'Accept': 'text/html,application/xhtml+xml,application/xml'})
webpage = urlopen(req).read()
# print(webpage)

soup = BeautifulSoup(webpage, 'html.parser')

# Fetching URL of all cities
states = soup.select("h2 a")
states = [state for state in states if state.text != "All Countries"]
print('states:', len(states))

rows = []
for state in states:
    print("Processing:", state.text)
    cities = state.parent.parent.parent.select("a span")
    print('cities:', len(cities))
    for city in cities:
        rows.append([
            city.text,
            state.text,
            "https://www.ubereats.com" + city.parent['href']
        ])

# ---

import pandas as pd
df = pd.DataFrame(rows, columns=['city', 'state', 'url'])
print(df)
Result:
city state url
0 Adamsville Alabama https://www.ubereats.com/city/adamsville-al
1 Alabaster Alabama https://www.ubereats.com/city/alabaster-al
2 Albertville Alabama https://www.ubereats.com/city/albertville-al
3 Alexander City Alabama https://www.ubereats.com/city/alexander-city-al
4 Alexandria Alabama https://www.ubereats.com/city/alexandria-al
... ... ... ...
8664 Saratoga Wyoming https://www.ubereats.com/city/saratoga-wy
8665 Sheridan Wyoming https://www.ubereats.com/city/sheridan-wy
8666 Warren AFB Wyoming https://www.ubereats.com/city/warren-afb-wy
8667 Wheatland Wyoming https://www.ubereats.com/city/wheatland-wy
8668 Wilson Wyoming https://www.ubereats.com/city/wilson-wy
[8669 rows x 3 columns]
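Once the DataFrame is built, it can be filtered or exported as usual. A minimal sketch using two rows taken from the result above (the filename is an arbitrary choice):

```python
import pandas as pd

# Two sample rows in the same [city, state, url] shape the scraper produces
rows = [
    ["Adamsville", "Alabama", "https://www.ubereats.com/city/adamsville-al"],
    ["Saratoga", "Wyoming", "https://www.ubereats.com/city/saratoga-wy"],
]
df = pd.DataFrame(rows, columns=["city", "state", "url"])

# Filter one state, then save everything to CSV
wyoming = df[df["state"] == "Wyoming"]
print(len(wyoming))  # 1
df.to_csv("uber_eats_cities.csv", index=False)
```

Collecting rows in a plain list and building the DataFrame once at the end (as in the code above) is also much faster than appending row by row with df.loc[i].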
Answered By - furas