Issue
I tried to scrape the outlet data from the website, but it came back empty. Please help me figure out how to get it.
from bs4 import BeautifulSoup
import requests
url = 'https://subway.com.my/find-a-subway'
page_to_scrape = requests.get(url)
soup = BeautifulSoup(page_to_scrape.text, 'html.parser')
loc = soup.find('div', id='fp_locationlist')
for outlet in loc.find('div', class_='fp_ll_holder'):
    name = outlet.find('h4').text
    print(name)
Solution
The data you see on the page is stored inside a <script> element, so BeautifulSoup cannot reach it as HTML elements (the contents of a script are just text to the parser). To extract it you can use the re and json modules, e.g.:
import json
import re
import requests
from bs4 import BeautifulSoup
url = "https://subway.com.my/find-a-subway"
html_text = requests.get(url).text
data = re.search(r'"markerData":(\[.*?\}\]),', html_text).group(1)
data = json.loads(data)
# print names of shops:
for d in data:
    soup = BeautifulSoup(d["infoBox"]["content"], "html.parser")
    print(soup.h4.text)
Prints:
...
Subway Aeon Cheras Selatan
Subway Bandar Baru Jelawat
Subway Lumut Indah
Subway Cahaya Kota Puteri
Subway Kuala Krai
Taman Nusa Bestari
Subway Kuala Berang
Subway Sg Udang
Subway Kiara Bay
Subway Mahkota Square Kuantan
Subway Putra Ampang
Subway Sunggala Gateway
Subway Sunway Carnival
Subway Bangi Seksyen 7
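If you want to try the same idea without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML string, the fp_config variable name, the "zoom" key, and the two outlet names are made up for illustration; only the "markerData" / "infoBox" / "content" keys and the regex come from the answer above. It also swaps BeautifulSoup for the standard library's html.parser so it runs with no third-party packages, and it raises a clear error when the pattern is not found instead of failing on .group(1).

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page: the visible container is
# empty and the outlet data sits in a JSON literal inside a <script> element.
SAMPLE_HTML = r'''<html><body>
<div id="fp_locationlist"></div>
<script>
var fp_config = {"markerData":[{"infoBox":{"content":"<h4>Subway Sample One<\/h4>"}},{"infoBox":{"content":"<h4>Subway Sample Two<\/h4>"}}],"zoom":10};
</script>
</body></html>'''

# Same pattern as in the answer; DOTALL is an added assumption so the match
# still works if the array spans several lines.
MARKER_RE = re.compile(r'"markerData":(\[.*?\}\]),', re.DOTALL)


def extract_marker_data(html_text):
    """Return the markerData array as a list of dicts, or raise ValueError."""
    match = MARKER_RE.search(html_text)
    if match is None:
        raise ValueError("markerData not found; the page layout may have changed")
    return json.loads(match.group(1))


class H4Text(HTMLParser):
    """Collects the text found inside <h4> elements."""

    def __init__(self):
        super().__init__()
        self.in_h4 = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False

    def handle_data(self, data):
        if self.in_h4:
            self.parts.append(data)


def outlet_name(content_html):
    """Extract the <h4> text from one infoBox HTML fragment."""
    parser = H4Text()
    parser.feed(content_html)
    parser.close()
    return "".join(parser.parts).strip()


markers = extract_marker_data(SAMPLE_HTML)
names = [outlet_name(m["infoBox"]["content"]) for m in markers]
print(names)
```

Against the real site you would pass requests.get(url).text to extract_marker_data instead of SAMPLE_HTML. The regex depends on how the page currently embeds its data, so expect to adjust it if the site changes.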
Answered By - Andrej Kesely