Issue
I am trying to extract the text from this:
[<div class="menu__vendor-name" itemprop="name">Beno's Flowers & Gifts</div>,
 <div class="menu__vendor-name" itemprop="name">Bluebird Diner</div>,
 <div class="menu__vendor-name" itemprop="name">Bread Garden Market</div>]
This is my code:
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
restaurant_wrapper = soup.find(class_ = "dd_rest_list")
restaurants = restaurant_wrapper.find_all(class_="menu__vendor-name",
                                          itemprop="name")

def extract_restaurant_data(restaurant):
    results = [
        {
            "title": print(title.text.strip())
        }
        for title in restaurant_details
    ]
    print(results)

results = [extract_restaurant_data(restaurant) for restaurant in restaurants]
Output:
AttributeError: 'tuple' object has no attribute 'text'
I suspect the problem is that each div has an itemprop attribute.
Solution
Assuming your goal is to scrape several details for each restaurant and not only its name, change your strategy: iterate over the restaurant entries one at a time, read each detail from the entry itself, and store the results in a structured way as a list of dicts:
results = []
for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery'+restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
Always check that the element you want to select exists before calling a method on it, because find() returns None when nothing matches:
'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
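The existence check above can also be moved into a small helper so that find() is called only once per field. The following is just a sketch: the helper name safe_text and the inline HTML (including the placeholder address) are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the snippet in the question.
# The second entry deliberately has no address element.
html = """
<div class="dd_rest_list">
  <a href="/r/1/example-one">
    <div class="menu__vendor-name" itemprop="name">Bluebird Diner</div>
    <div itemprop="address"><span>1 Example St</span><span>Example City</span></div>
  </a>
  <a href="/r/2/example-two">
    <div class="menu__vendor-name" itemprop="name">Bread Garden Market</div>
  </a>
</div>
"""


def safe_text(parent, name, attrs, sep="", default=None):
    """Return the stripped text of the first matching child, or default."""
    element = parent.find(name, attrs)
    if element is None:
        # find() returns None when nothing matches, so bail out here
        # instead of raising AttributeError on .get_text().
        return default
    return element.get_text(sep, strip=True)


soup = BeautifulSoup(html, "html.parser")
results = [
    {
        "title": safe_text(a, "div", {"itemprop": "name"}),
        "address": safe_text(a, "div", {"itemprop": "address"}, sep=","),
    }
    for a in soup.select(".dd_rest_list a")
]
print(results)
```

Entries without an address simply get None for that key, and the rest of the loop keeps running.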
Example
import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
           'AppleWebKit/537.36 (KHTML, like Gecko) '\
           'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url,headers=headers)
# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(response.text, 'html.parser')

results = []
# Each <a> inside the list wrapper holds all details of one restaurant
for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery'+restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
results
Output
[{'title': '2 Dogs Pub',
'url': 'https://www.chomp.delivery/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City',
'address': '1705 S 1st Ave Ste Q,Iowa City,IA,52240',
'and': 'so on'},
{'title': 'Alebrije Mexican Restaurant',
'url': 'https://www.chomp.delivery/r/3316/restaurants/delivery/Mexican/Alebrije-Mexican-Restaurant-Iowa-City',
'address': '401 S Linn st,Iowa City,IA,52240',
'and': 'so on'},
{'title': 'Ascended Electronics',
'url': 'https://www.chomp.delivery/r/2521/restaurants/delivery/Retail/Ascended-Electronics-Iowa-City',
'address': '208 Stevens Dr,Iowa City,IA,52240',
'and': 'so on'},
{'title': 'Aspen Leaf Frozen Yogurt',
'url': 'https://www.chomp.delivery/r/522/restaurants/delivery/Ice-Cream-Sweets-Snacks/Aspen-Leaf-Frozen-Yogurt-Iowa-City',
'address': '125 S Dubuque St,Iowa City,IA,52240',
'and': 'so on'},...]
Answered By - HedgeHog