Issue
I am very new to web scraping and am trying to get the items under "Amenities and More" for one of my projects. As can be seen below, I want to extract "Health Score Excellent", "Offers Delivery", "Offers Takeout", etc. from a restaurant's Yelp page. I want to do this for several other restaurants' Yelp pages; however, for now I will settle for figuring out this one.
Based on what I gathered from various web pages, I tried the following, with no good result.
import requests
from bs4 import BeautifulSoup
url = 'https://www.yelp.com/biz/ziggis-coffee-longmont'
yelp_page = requests.get(url)
yelp_soup = BeautifulSoup(yelp_page.content, 'lxml')
yelp_soup.find_all("span")
Result: [<span class="offscreen" id="page-content"> </span>]
I chose 'span' because that is the tag I see when I click "Inspect" on "Offers Takeout".
Other things I have tried are:
yelp_soup.find_all("span",{'class': "text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weigt--semibold__373c0__h2l0fe text-size--large__373c0__3t60B"})
And
yelp_soup.find_all("span",{'class': "text__373c0__2Kxyz"})
Result: []
Please suggest how to proceed. Thanks
Solution
I tried scraping this page on my end. It appears that lxml does not pick up the spans you are looking for. I changed lxml to html.parser (Python's built-in HTML parser), and with that soup.find_all() should work fine. Also, remember that when you match on the full class attribute string, it has to match exactly, including any spaces in it. (Note that there is a space at the beginning of the class in your screenshot.) Otherwise, BeautifulSoup will not be able to find the desired elements.
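Since matching the full class string is sensitive to whitespace, it may be easier to match on a single class token instead. The sketch below is only an illustration: SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the Yelp markup, and it assumes the spans are already present in the parsed HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup modelled on the spans in the question.
SAMPLE_HTML = """
<div>
  <span class=" text__373c0__2Kxyz text-size--large__373c0__3t60B">Offers Delivery</span>
  <span class=" text__373c0__2Kxyz text-size--large__373c0__3t60B">Offers Takeout</span>
  <span class="offscreen" id="page-content"> </span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A CSS selector matches on individual class tokens, so the leading space
# and the order of the other classes do not matter.
by_selector = soup.select("span.text__373c0__2Kxyz")

# Passing a single class name to class_ matches any span carrying that class.
by_class = soup.find_all("span", class_="text__373c0__2Kxyz")

amenities = [span.get_text(strip=True) for span in by_selector]
print(amenities)
```

Both lookups skip the "offscreen" span because it does not carry the text__373c0__2Kxyz class.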
Here's the code that worked fine on my end:
from bs4 import BeautifulSoup
import requests
url='https://www.yelp.com/biz/ziggis-coffee-longmont'
yelp_page=requests.get(url)
print(yelp_page.status_code)
yelp_soup=BeautifulSoup(yelp_page.content, 'html.parser')
spans = yelp_soup.find_all("span")
print(yelp_soup.find_all('span', class_=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B"))
Response:
[<span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">Drive-thru</span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">Delivery</span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">Takeout</span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B"><a class=" link__373c0__1G70M link-color--blue-dark__373c0__85-Nu link-size--inherit__373c0__1VFlE" href="/inspections/ziggis-coffee-longmont" name="" rel="" target="">Health Score</a></span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">Offers Delivery</span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">Offers Takeout</span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">Accepts Credit Cards</span>, <span class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B"><a class=" link__373c0__1G70M link-color--blue-dark__373c0__85-Nu link-size--inherit__373c0__1VFlE" href="/questions/Loqh-rc9CJiQEfS_EUE6ow/InzxYWdgCbaIpytQEjqxGQ" name="" rel="" role="link" target="">Answer this question</a></span>]
EDIT: If the above code doesn't work, try:
print(yelp_soup.find_all('span', {"class": " text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B"}))
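To get from the list of tags to plain labels, and to repeat that for several restaurants, something like the following might work. It is a minimal sketch, not a tested scraper: extract_labels and the pages dictionary are hypothetical names, the inline HTML stands in for downloaded pages, and the class token is copied from the output above, so it may stop matching whenever Yelp regenerates its class names.

```python
from bs4 import BeautifulSoup

# Class token copied from the answer's output; Yelp's generated class names
# may change at any time, so treat this value as an assumption.
AMENITY_CLASS = "text-weight--semibold__373c0__2l0fe"


def extract_labels(html):
    """Return the visible text of every span carrying AMENITY_CLASS."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", class_=AMENITY_CLASS)
    # get_text() also covers spans whose label sits inside a nested <a>.
    return [span.get_text(strip=True) for span in spans]


# Hypothetical stand-in for downloaded pages, keyed by business slug.
pages = {
    "ziggis-coffee-longmont": (
        '<span class=" text__373c0__2Kxyz text-weight--semibold__373c0__2l0fe">'
        '<a href="/inspections/ziggis-coffee-longmont">Health Score</a></span>'
        '<span class=" text__373c0__2Kxyz text-weight--semibold__373c0__2l0fe">'
        "Offers Takeout</span>"
    ),
}

results = {slug: extract_labels(html) for slug, html in pages.items()}
print(results)
```

For live pages, the html argument would be yelp_page.content from requests.get(url), fetched once per restaurant URL. As the output above shows, unrelated spans such as "Answer this question" share the same classes, so the labels may still need filtering.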
Answered By - Parzival