Issue
I am scraping Grubhub and I am not able to scrape the full menu.
https://www.grubhub.com/restaurant/buca-di-beppo-1875-s-bascom-ave-campbell/335944
For example, on the page above it only scrapes the appetizers. Scrolling is required to load the rest, but the captcha detects that the browser is automated (with Selenium), and after that I cannot scrape anymore.
Here is what I have:
import time
from bs4 import BeautifulSoup

# driver is an existing Selenium WebDriver; link is the restaurant URL above.
driver.get(link)
time.sleep(2)
page = driver.page_source
soup = BeautifulSoup(page, 'html.parser')
dishes = soup.find_all('div', class_='menuItemNew-name')
descs = soup.find_all('div', class_='padding-y-2')
dishes_ = []
descs_ = []
# Collect every text node inside each matched element.
for items in dishes:
    dishes_ += items.find_all(text=True)
for items in descs:
    descs_ += items.find_all(text=True)
print(dishes_)
print(descs_)
descs holds the descriptions of each dish, which I also want to scrape.
How do I get the full menu (and, if possible, the Google Maps link at the very bottom of the page)?
Solution
To scrape the full menu and the Google Maps link at the very bottom of the page, you need to induce WebDriverWait for visibility_of_element_located(). You can use the following locator strategy:
Code Block:
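from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("start-maximized")
# Both switches go in one list: a second "excludeSwitches" call would replace the first.
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("--disable-blink-features=AutomationControlled")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get('https://www.grubhub.com/restaurant/buca-di-beppo-1875-s-bascom-ave-campbell/335944')
# Wait up to 20 seconds for the Google Maps link near the bottom of the page to become visible.
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a[data-testid='restaurant-about-google-map-link']")))
print(driver.page_source)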
Console Output:
<html lang="en" class=" async-hide"><head><script type="text/javascript" async="" charset="utf-8" id="utag_367" src="//d.impactradius-event.com/A1231534-f0ec-4c6c-b14f-75a55231a9591.js"></script><script src="https://ext.chtbl.com/trackable.js"></script><script type="text/javascript" async="" charset="utf-8" src="https://www.googletagmanager.com/gtag/js?id=G-7YX8989VK2" id="utag_628"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/destination?id=G-7YX8989VK2&l=dataLayer&cx=c"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/js?id=G-7YX8989VK2&l=dataLayer&cx=c"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/destination?id=DC-11687855&l=dataLayer&cx=c"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/js?id=DC-11687855&l=dataLayer&cx=c"></script><script type="text/javascript" async="" charset="utf-8" id="utag_577" src="//js.adsrvr.org/up_loader.1.1.0.js"></script><script type="text/javascript" async="" charset="utf-8" src="//analytics.tiktok.com/i18n/pixel/events.js?sdkid=undefinedttq" id="utag_568"></script><script type="text/javascript" async="" charset="utf-8" id="utag_550" src="//mi.grubhub.com/p/js/1.js"></script><script src="https://www.redditstatic.com/ads/pixel.js" async=""></script><script type="text/javascript" async="" charset="utf-8" src="https://pixel.mathtag.com/event/js?version=1.1&delimiter=%2C&industry=Internet%20Services&event_type=catchall&mt_id=1427886&mt_pp=1&mt_adid=227305" id="utag_430"></script><script async="" src="//px.airpr.com/airpr.js"></script><script type="text/javascript" async="" src="https://www.googletagmanager.com/gtag/js?id=AW-987205382&l=dataLayer&cx=c"></script><script async="" src="https://sc-static.net/scevent.min.js"></script><script type="text/javascript" async="" charset="utf-8" id="utag_566" 
src="https://connect.facebook.net/en_US/fbevents.js"></script><script type="text/javascript" defer="" async="" src="https://collector-21091.us.tvsquared.com/tv2track.js"></script><script type="text/javascript" async="" charset="utf-8" src="//bat.bing.com/bat.js" id="utag_171"></script><script type="text/javascript" async="" charset="utf-8" src="https://www.google-analytics.com/analytics.js" id="tealium-tag-7110"></script><script type="text/javascript" async="" src="https://www.google-analytics.com/plugins/ua/linkid.js"></script><script type="text/javascript" src="https://bam-cell.nr-data.net/1/5923691cbd?a=11156950&sa=1&v=1216.487a282&t=Unnamed%20Transaction&ct=https://www.grubhub.com/restaurant&rst=2434&ck=1&ref=https://www.grubhub.com/restaurant/buca-di-beppo-1875-s-bascom-ave-campbell/335944&be=541&fe=2213&dc=986&af=err,xhr,stn,ins,spa&perf=%7B%22timing%22:%7B%22of%22:1661037628304,%22n%22:0,%22f%22:1,%22dn%22:2,%22dne%22:71,%22c%22:71,%22s%22:100,%22ce%22:166,%22rq%22:166,%22rp%22:479,%22rpe%22:560,%22dl%22:485,%22di%22:987,%22ds%22:987,%22de%22:987,%22dc%22:2213,%22l%22:2213,%22le%22:2218%7D,%22navigation%22:%7B%7D%7D&fp=826&fcp=1572&ja=%7B%22diner_type%22:%22diner_unknown%22,%22umami_app_version%22:%224.2.3852%22,%22ab_testing_status%22:%22optimize%20enabled%22,%22clickstream_browser_id%22:%22dec60c6c-11f2-4a3f-9f08-18cc784d5682%22,%22ad_block_enabled%22:true,%22is_spider_bot%22:false,%22clickstream_session_id%22:%22ae778399-20de-11ed-a9d5-23c0dcc7cb7b%22,%22first-paint%22:826.5,%22first-contentful-paint%22:1572.7999999523163,%22fetchStart%22:1,%22domainLookupStart%22:2,%22domainLookupEnd%22:71,%22connectStart%22:71,%22connectEnd%22:166,%22secureConnectionStart%22:100,%22requestStart%22:166,%22responseStart%22:479,%22responseEnd%22:560,%22domLoading%22:485,%22domInteractive%22:987,%22domContentLoadedEventStart%22:987,%22domContentLoadedEventEnd%22:987,%22domComplete%22:2213,%22loadEventStart%22:2213%7D&jsonp=NREUM.setToken"></script><script 
src="https://js-agent.newrelic.com/nr-spa-1216.min.js"></script><script type="text/javascript" async="" src="https://www.google-analytics.com/gtm/js?id=GTM-58CKX3J&t=teal_grubhublabs_UniversalproductionStandard&cid=1361115206.1661037630"></script><script src="https://cdn.ravenjs.com/3.26.4/raven.min.js"></script><script src="https://assets.grubhub.com/assets/dll/load-uuid-740f2944b2a1abda6733.js"></script> <link rel="manifest" href="https://assets.grubhub.com/manifest.json"> <link rel="search" type="application/opensearchdescription+xml" title="Find food" href="/opensearch.xml"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta charset="utf-8"> . <div class="menuItemNew-price u-rounded--large s-textBox"><cb-icon class="menuItem-loading"><svg class="cb-icon cb-icon-svg cb-icon--sm" aria-hidden="true"><use xlink:href="#clock-back"></use></svg></cb-icon><span class="menuItem-priceAmount h6 s-textBox-title u-margin-bottom-cancel"><span class="" data-testid="menu-item-price" itemprop="price">$39.60</span><span data-testid="menu-item-price-plus" class="menuItem-pricePlus">+</span></span></div></div></button></article></div></div></div></div></div></div></span></div></div></span></div></div></div></div></div></div></main><a name="reviews"></a><div><div data-testid="restaurant-about-reviews-sections" class="s-container-lg u-block u-inset-3"><div class="s-row"><div class="s-col-xs-12"><div id="navSection-about" class="navSection" tabindex="0"><span data-testid="restaurant-about" id="ghs-restaurant-about"><div class="restaurantAbout"><h2 data-testid="restaurantAbout-header">Buca di Beppo Menu Info</h2><div class="restaurantAbout-details"><div data-testid="restaurantAbout-cuisines" class="s-col-xs-12"><a data-testid="restaurantAbout-cuisines--Dinner" class="restaurantAbout-details-cuisines-link u-padding-cancel s-link" href="/delivery/ca-campbell/dinner">Dinner,♂</a><a data-testid="restaurantAbout-cuisines--Lunch Specials" 
class="restaurantAbout-details-cuisines-link u-padding-cancel s-link" href="/delivery/ca-campbell/lunch_specials">Lunch Specials,♂</a><a data-testid="restaurantAbout-cuisines--Pasta" class="restaurantAbout-details-cuisines-link u-padding-cancel s-link" href="/delivery/ca-campbell/pasta">Pasta,♂</a><a data-testid="restaurantAbout-cuisines--Pizza" class="restaurantAbout-details-cuisines-link u-padding-cancel s-link" href="/delivery/ca-campbell/pizza">Pizza</a></div><span class=""><div data-testid="restaurant-price-rating" class="price-scale priceRating" title="$$$"><div data-testid="restaurant-price-rating-base" class="priceRating-base">$$$$$</div><div data-testid="restaurant-price-rating-value" class="priceRating-value" itemprop="priceRange">$$$</div></div></span></div><div class="restaurantAbout-info u-stack-y-4"><div class="restaurantAbout-info-contact"><a data-testid="restaurant-about-google-map-link" href="https://maps.google.com?daddr=1875%20S%20Bascom%20Ave%20Campbell%20CA%2095008" target="_blank" rel="noopener"><span data-testid="static-map" class="restaurantAbout-info-map"></span></a><a target="_blank" rel="noopener" data-testid="restaurant-about-address" href="http://maps.google.com/maps?daddr=1875 S Bascom Ave, Campbell, CA, 95008" class="restaurantAbout-info-address u-line-bottom u-line--thin u-line--light"><div>1875 S Bascom Ave</div>Campbell, CA 95008</a><div class="u-line-bottom u-line--thin u-line--light restaurantAbout-info-phone"><button data-testid="restaurant-phone-button" itemprop="telephone" content="4083777722" class="s-btn s-btn-tertiary u-padding-cancel restaurant-phone-button type"><span class="">(408) 377-7722</span></button></div><a href="/food/buca_di_beppo" data-testid="restaurantAbout-chainUrl"><div class="restaurantAbout-info-bottom restaurantAbout-info-chainLink u-line-bottom u-line--thin u-line--light"><span>View more about </span>Buca di Beppo</div></a></div><div class="restaurant-hours" data-testid="restaurant-hours"><h5 
class="u-background--tinted u-inset-squished-4 u-text-secondary">Hours</h5><div class="u-inset-4 body u-flex u-flex-direction-row u-flex-justify-xs--between copy u-line-bottom u-line--thin u-line--light"><span data-testid="days0">Today</span><div class="u-text-right u-flex u-flex-direction-column"><div class="u-flexbox-order-2 u-text-secondary" data-testid="pickupHours00">Pickup: 10:30am–9:30pm</div><div class="u-flexbox-order-1" data-testid="deliveryHours00">Delivery: 10:30am–9:30pm</div></div></div><button data-testid="show-full-schedule-link" class="s-btn s-btn-tertiary u-inset-squished-4">See the full schedule</button></div></div></div></span><span data-testid="ghs-impression-tracker" style="width: 100%;"><div data-testid="taking-orders-carousel"><span data-testid="restaurant-section-data" type="sponsored" class="restaurant-section-data restaurant-sponsored"><div data-testid="in-view" class=""><span class="r2p"><ghs-restaurant-carousel><div class=" carousel-container s-container"><span class="p2r"><div data-testid="carousel" class="ghsCarousel"><div class="ghsCarousel"><span data-testid="carousel-scroll-wrapper" class="ghsCarousel-content ghsCarousel-content-scroll ghsCarousel-slides promo-carousel"></span></div></div></span></div></ghs-restaurant-carousel></span></div></span></div></span></div><span id="navSection-reviews" class="navSection" data-testid="ghs-impression-tracker"><div id="ghs-restaurant-reviews" class="u-block" data-testid="restaurant-reviews"><div data-testid="in-view" class=""><div class="u-background restaurantReviews clearfix"><div class="u-section-6" data-testid="restaurantReviews-container" id="restaurantPage-reviewHighlights"><div class="clearfix u-unclickable restaurantReviews-heading"><div class="s-row restaurantReviews-heading-content"><div data-testid="facet-header" class="s-col-md-8 s-form-group"><h2> Reviews for Buca di Beppo</h2><div class="u-stack-y-4"><span data-testid="star-rating-id"><div class="" data-testid="starRating"><span 
class="" data-testid="stars"><div class="stars stars--sm" data-testid="stars-static" style="background-position: 0px -168px;"></div></span><span data-testid="star-rating-text" class="u-text-secondary caption u-margin-cancel">208 <span>ratings</span></span></div></span></div><div class="restaurantReviews-ratingFacets u-stack-y-4"><span data-testid="review-section-rating-facets"><div class="ratingsFacets" data-testid="ratingfacets"><div class="" data-testid="ratingsfacet-details"><p class="ratingsFacet-header u-stack-y-4 body" data-testid="ratingsfacet-header">Here's what people are saying:</p><ul data-testid="ratingsfacet-facetlist" class="ratingsFacet-facetList s-row u-gutterless-3"><li class="ratingsFacet-facetList-listItem s-col-xs-4 u-gutter-3"><span class="u-stack-y-1 ratingsFacet-percent h5 u-margin-bottom-cancel">88</span> <span class="ratingsFacet-facetDesc u-text-secondary u-margin-bottom-cancel caption secondary">Food was good</span></li><li class="ratingsFacet-facetList-listItem s-col-xs-4 u-gutter-3"><span class="u-stack-y-1 ratingsFacet-percent h5 u-margin-bottom-cancel">79</span> <span class="ratingsFacet-facetDesc u-text-secondary u-margin-bottom-cancel caption secondary">Delivery was on time</span></li><li class="ratingsFacet-facetList-listItem s-col-xs-4 u-gutter-3"><span class="u-stack-y-1 ratingsFacet-percent h5 u-margin-bottom-cancel">88</span> <span class="ratingsFacet-facetDesc u-text-secondary u-margin-bottom-cancel caption secondary">Order was accurate</span></li></ul></div></div></span></div></div><div class="s-col-md-4 u-stack-y-3"></div></div></div><div class="restaurantReviews-restaurantPagePadding" data-testid="restaurantReviews-body" impressionid="reviewBodyId"><div class="review-container--loading"><div class="" data-testid="allReviews-sortBar"><span class="caption u-text-primary u-margin-bottom-cancel u-flex u-flex-align-xs--center"></span></div></div></div><span></span></div></div></div></div></span><span data-testid="faqs"><div 
data-testid="faqs-container" class="u-background u-inset-squished-3"><div class="u-padding-top-large"><div class="s-row"><div data-testid="faqs-heading" class="s-col-xs-12"><h2 class="u-stack-y-4 h1">FAQs</h2></div><div data-testid="faqs-body-container" itemscope="" itemtype="http://schema.org/FAQPage"><div data-testid="faq-question" class="s-col-xs-12 u-stack-y-4" itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>Does Buca di Beppo (1875 S Bascom Ave) deliver?</h6><div class="faq-answer" data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper0">Yes, Buca di Beppo (1875 S Bascom Ave) delivery is available on Grubhub.</div></span></span></div></div><div data-testid="faq-question" class="s-col-xs-12 u-stack-y-4" itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>Does Buca di Beppo (1875 S Bascom Ave) offer contact-free delivery?</h6><div class="faq-answer" data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper1">Yes, Buca di Beppo (1875 S Bascom Ave) provides contact-free delivery with Grubhub.</div></span></span></div></div><div data-testid="faq-question" class="s-col-xs-12 u-stack-y-4" itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>What type of food is Buca di Beppo (1875 S Bascom Ave)?</h6><div class="faq-answer" data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper2">Buca 
di Beppo (1875 S Bascom Ave) is a Italian restaurant.</div></span></span></div></div><div data-testid="faq-question" class="s-col-xs-12 u-stack-y-4" itemprop="mainEntity" itemscope="" itemtype="http://schema.org/Question"><h6 itemprop="name"><span>Q) </span>Is Buca di Beppo (1875 S Bascom Ave) eligible for Grubhub+ free delivery?</h6><div class="faq-answer" data-testid="faq-answer" itemprop="acceptedAnswer" itemscope="" itemtype="http://schema.org/Answer"><span>A) </span><span itemprop="text"><span data-testid="safe-html"><div xmlns="http://www.w3.org/1999/xhtml" id="safeHtmlWrapper3">Yes, Grubhub offers free delivery for Buca di Beppo (1875 S Bascom Ave) with a <a href="https://www.grubhub.com/plus">Grubhub+</a> membership.</div></span></span></div></div></div></div></div></div></span> . <script type="text/javascript" id="tealium-script" src="https://tags.tiqcdn.com/utag/grubhubseamless/grubhub/prod/utag.js"></script><div><span data-testid="popover-content" id="ghs-popover-content-0"><aside class="ghsPopover rightHAlign floatingCartDropDown floatingCart groupOrder-convertLink-onTop ghsPopover--undefined-theme isClosed fade" role="tooltip" style="inset: -10000px auto auto;"><div class="ghsPopover-spacer"></div><div class="popover-content"><span data-testid="closed-bag-popover" class="u-block" style="min-height: 150px; min-width: 300px;"><aside id="ghs-globalCart-container"><span><span data-testid="global-cart"><div data-testid="global-cart-body" id="global-cart" class="globalCart-panel body" tabindex="-1"><span data-testid="sev-one"></span><section class="globalCart-panel-contents"><div class="cart-error"><div class="globalCart-symbol u-text-secondary"></div><div class="cart-error-emptyCart u-text-center"><h5 class="cart-error-title">Your bag is empty.</h5></div></div></section></div></span></span></aside></span></div><span class="popover-caret popover-caret--undefined-theme"></span></aside></span></div><script type="text/javascript" id="clickstream-tag" 
src="https://assets.grubhub.com/libs/clickstreamjs/2.0.21/clickstream2.min.js"></script><script type="text/javascript" id="perimeter-x-script-tag" src="https://sensor.grubhub.com/O97ybH4J/init.js"></script><script type="text/javascript" id="app-boy-script" src="//assets.grubhub.com/libs/appboy/1.6/appboy.min.js"></script><script type="text/javascript" id="inauth-script-tag" src="https://www.cdn-net.com/cc.js?ts=1661037629801"></script><div><span data-testid="popover-content" id="ghs-popover-content-1"><aside class="ghsPopover centerHAlign ghsPopover--undefined-theme isClosed fade" role="tooltip" style="inset: -10000px auto auto;"><div class="ghsPopover-spacer"></div><div class="popover-content"><div class="ratingsFacet-popover"><span data-testid="review-section-rating-facets"><div class="ratingsFacets ratingsFacets--popover" data-testid="ratingfacets"><div class="u-inset-4 u-text-center" data-testid="ratingsfacet-details"><p class="ratingsFacet-header u-stack-y-4 body" data-testid="ratingsfacet-header">Here's what people are saying:</p><ul data-testid="ratingsfacet-facetlist" class="ratingsFacet-facetList s-row u-gutterless-3"><li class="ratingsFacet-facetList-listItem u-line-right u-line--thin s-col-xs-4 u-gutter-3"><span class="u-stack-y-1 ratingsFacet-percent h5">88</span> <span class="ratingsFacet-facetDesc u-text-secondary u-margin-bottom-cancel caption">Food was good</span></li><li class="ratingsFacet-facetList-listItem u-line-right u-line--thin s-col-xs-4 u-gutter-3"><span class="u-stack-y-1 ratingsFacet-percent h5">79</span> <span class="ratingsFacet-facetDesc u-text-secondary u-margin-bottom-cancel caption">Delivery was on time</span></li><li class="ratingsFacet-facetList-listItem u-line-right u-line--thin s-col-xs-4 u-gutter-3"><span class="u-stack-y-1 ratingsFacet-percent h5">88</span> <span class="ratingsFacet-facetDesc u-text-secondary u-margin-bottom-cancel caption">Order was accurate</span></li></ul></div></div></span></div></div><span 
class="popover-caret popover-caret--undefined-theme"></span></aside></span></div><script src="https://cdn.branch.io/branch-latest.min.js" id="branch loader" async="true"></script><div id="ttdUniversalPixelTag" style="display: none;"></div></body></html>
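Once the page source is available, the Google Maps link and the item prices still have to be pulled out of it. The following is a minimal sketch of one way to do that, assuming the data-testid values seen in the console output above (restaurant-about-google-map-link and menu-item-price) are still present on the page. The names MenuPageParser, parse_page_source and destination_from_map_link are hypothetical helpers introduced only for this illustration. It uses only the standard library; BeautifulSoup, as in the question, would work just as well.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class MenuPageParser(HTMLParser):
    """Hypothetical helper: collects the map link href and the item prices."""

    def __init__(self):
        super().__init__()
        self.map_link = None
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        testid = attrs.get("data-testid")
        if tag == "a" and testid == "restaurant-about-google-map-link":
            # Same element the WebDriverWait locator targets.
            self.map_link = attrs.get("href")
        elif tag == "span" and testid == "menu-item-price":
            self._in_price = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def parse_page_source(html):
    """Return (map_link, prices) found in the given page source."""
    parser = MenuPageParser()
    parser.feed(html)
    parser.close()
    return parser.map_link, parser.prices


def destination_from_map_link(href):
    """Decode the daddr query parameter of the map link, or return None."""
    values = parse_qs(urlparse(href).query).get("daddr")
    return values[0] if values else None


# Fragment copied from the console output above, used here as sample input.
sample = (
    '<span class="" data-testid="menu-item-price" itemprop="price">$39.60</span>'
    '<a data-testid="restaurant-about-google-map-link" '
    'href="https://maps.google.com?daddr=1875%20S%20Bascom%20Ave%20Campbell%20CA%2095008" '
    'target="_blank" rel="noopener"></a>'
)

map_link, prices = parse_page_source(sample)
print(map_link)
print(prices)
print(destination_from_map_link(map_link))
```

In the Selenium script, the sample string would be replaced by driver.page_source after the wait has completed.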
Answered By - undetected Selenium