Thursday, April 21, 2022

[FIXED] get text from a div with class and itemprop

beautifulsoup, python, web-scraping

Issue

I am trying to extract the text from this:

[<div class="menu__vendor-name" itemprop="name">Beno's Flowers &amp; Gifts</div>,
 <div class="menu__vendor-name" itemprop="name">Bluebird Diner</div>,
 <div class="menu__vendor-name" itemprop="name">Bread Garden Market</div>]

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) '
           'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

restaurant_wrapper = soup.find(class_="dd_rest_list")
restaurants = restaurant_wrapper.find_all(class_="menu__vendor-name",
                                          itemprop="name")

def extract_restaurant_data(restaurant):
    results = [
        {
            "title": print(title.text.strip())
        }
        for title in restaurant_details
    ]

    print(results)

results = [extract_restaurant_data(restaurant) for restaurant in restaurants]

Output:

 AttributeError: 'tuple' object has no attribute 'text'

I suspect the problem is that each div has an itemprop attribute.

Solution

Assuming your goal is to scrape several details from each restaurant and not only its name, change your strategy: process the data in the same order you read it, and store it in a more structured form as a list of dicts:

results = []

for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery'+restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
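If only the names are needed, the divs from the question can also be read directly. The following is a minimal sketch, not part of the original answer, that runs against the three divs quoted in the question instead of the live page. Filtering on both the class and the itemprop attribute works here, which suggests that the itemprop attribute is not by itself the cause of the error.

```python
from bs4 import BeautifulSoup

# The three divs quoted in the question, used here instead of the live page.
html = """
<div class="menu__vendor-name" itemprop="name">Beno's Flowers &amp; Gifts</div>
<div class="menu__vendor-name" itemprop="name">Bluebird Diner</div>
<div class="menu__vendor-name" itemprop="name">Bread Garden Market</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all accepts the class and any other attribute as keyword filters.
divs = soup.find_all("div", class_="menu__vendor-name", itemprop="name")

# get_text(strip=True) returns the text of each Tag; the parser has already
# decoded the &amp; entity to a plain ampersand.
names = [div.get_text(strip=True) for div in divs]
print(names)
```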

Always check that the element you want to select exists before calling a method on it:

'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
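The existence check above can also be moved into a small helper, so that find() is called only once per field. This is a sketch under assumptions: the helper name text_or_none and the sample markup are hypothetical, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question; the second entry
# deliberately has no address element.
SAMPLE_HTML = """
<div class="dd_rest_list">
  <a href="/r/1/example-diner">
    <div class="menu__vendor-name" itemprop="name">Example Diner</div>
    <div itemprop="address"><span>1 Example St</span><span>Example City</span></div>
  </a>
  <a href="/r/2/example-market">
    <div class="menu__vendor-name" itemprop="name">Example Market</div>
  </a>
</div>
"""

def text_or_none(parent, itemprop, separator=","):
    """Return the text of the first div with this itemprop, or None if absent."""
    element = parent.find("div", {"itemprop": itemprop})
    if element is None:
        return None
    return element.get_text(separator, strip=True)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    {"title": text_or_none(a, "name"), "address": text_or_none(a, "address")}
    for a in soup.select(".dd_rest_list a")
]
print(rows)
```

The second entry gets None for its address instead of raising an AttributeError.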

Example

import requests
from bs4 import BeautifulSoup

url = 'https://www.chomp.delivery/restaurants'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
   'AppleWebKit/537.36 (KHTML, like Gecko) '\
   'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

results = []

for restaurant in soup.select('.dd_rest_list a'):
    results.append({
        'title':restaurant.find('div',{'itemprop':'name'}).text,
        'url':'https://www.chomp.delivery'+restaurant.get('href'),
        'address':restaurant.find('div',{'itemprop':'address'}).get_text(',',strip=True) if restaurant.find('div',{'itemprop':'address'}) else None,
        'and':'so on'
    })
results

Output

[{'title': '2 Dogs Pub',
  'url': 'https://www.chomp.delivery/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City',
  'address': '1705 S 1st Ave Ste Q,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Alebrije Mexican Restaurant',
  'url': 'https://www.chomp.delivery/r/3316/restaurants/delivery/Mexican/Alebrije-Mexican-Restaurant-Iowa-City',
  'address': '401 S Linn st,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Ascended Electronics',
  'url': 'https://www.chomp.delivery/r/2521/restaurants/delivery/Retail/Ascended-Electronics-Iowa-City',
  'address': '208 Stevens Dr,Iowa City,IA,52240',
  'and': 'so on'},
 {'title': 'Aspen Leaf Frozen Yogurt',
  'url': 'https://www.chomp.delivery/r/522/restaurants/delivery/Ice-Cream-Sweets-Snacks/Aspen-Leaf-Frozen-Yogurt-Iowa-City',
  'address': '125 S Dubuque St,Iowa City,IA,52240',
  'and': 'so on'},...]
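The example builds each URL by concatenating the site root with the href. As an alternative sketch, urljoin from the standard library resolves relative links against the page URL and leaves absolute links untouched. The helper name absolute_url is hypothetical, and returning None for a missing href is an assumption about the desired behaviour.

```python
from urllib.parse import urljoin

# The page the links were scraped from, as used in the example above.
BASE_URL = "https://www.chomp.delivery/restaurants"

def absolute_url(href, base=BASE_URL):
    """Resolve a possibly relative href against the page URL; None stays None."""
    if href is None:
        return None
    return urljoin(base, href)

# A root-relative link is resolved against the site root.
print(absolute_url("/r/21/restaurants/delivery/Burgers/2-Dogs-Pub-Iowa-City"))
# An absolute link is returned unchanged.
print(absolute_url("https://example.com/elsewhere"))
# A missing href does not raise, unlike string concatenation with None.
print(absolute_url(None))
```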

Answered By - HedgeHog

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
