Issue
I tried to scrape the search results on this page, https://shop.bodybuilding.com/search?q=protein+bar&selected_tab=Products, with Selenium, but I only get the first 4 elements. I am not sure why. Is it because the page is rendered with JavaScript? How can I scrape all the elements on this search page? Here is the code I wrote:
import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome(executable_path='C:/chromedriver')
url = 'https://shop.bodybuilding.com/search?q=protein+bar&selected_tab=Products'
driver.get(url)

soup = BeautifulSoup(driver.page_source, 'html.parser')
all_items = soup.find_all('div', {'class': 'ProductTile ProductTile--flat Animate AnimateOnHover Animate--fade-in Animate--animated'})
for i in range(len(all_items)):
    prices=all_items[i].find('div', {'class': 'Price ProductTile__price'}).text
    names=all_items[i].find('p', {'class': 'ProductTile__title'}).text
    images=all_items[i].find('img')['src']
    url=all_items[i].find('a', {'class': 'Anchor ProductTile__image'})['href']
    print(images)
This is the output for the names on this page. As you can see, only the first 4 elements are scraped:
BSN Protein Crisp Bars
Optimum Nutrition Protein Wafers
Herbaland Vegan Protein Gummies
Battle Bars Full Battle Rattle (FBR) Protein Bar
The same happens for prices, images, and URLs.
Solution
How to fix
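The page only loads more products as you scroll, so you have to scroll down until the page height stops growing. Only then are all items present in the page source: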
import time

# Keep scrolling until the page height stops growing, i.e. nothing new is loaded
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(1)  # give the page time to load the next batch of items
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Parse the page source again now that all items are loaded
soup = BeautifulSoup(driver.page_source, 'html.parser')
all_items = soup.find_all('div', {'class': 'ProductTile ProductTile--flat Animate AnimateOnHover Animate--fade-in Animate--animated'})
for i in all_items:
    # The price element may be missing on a tile, so fall back to None
    prices=i.find('div', {'class': 'Price ProductTile__price'}).text if i.find('div', {'class': 'Price ProductTile__price'}) else None
    names=i.find('p', {'class': 'ProductTile__title'}).text
    images=i.find('img')['src']
    url=i.find('a', {'class': 'Anchor ProductTile__image'})['href']
    print(images)
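The scrolling loop above runs until the page height stops changing, so it would never finish on a page that keeps growing. One possible variation is to wrap it in a helper with an upper bound on the number of scrolls. This is only a sketch: the function name scroll_to_bottom and the parameters pause and max_rounds are hypothetical and not part of the original answer. The helper relies only on the driver's execute_script method, which is the same call used above.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is any object with an `execute_script(script)` method,
    such as a Selenium WebDriver. `pause` and `max_rounds` are
    hypothetical tuning knobs, not part of the original answer.
    Returns the number of scrolls that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom of the page to trigger further loading
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # wait for the next batch of items to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change, so nothing new was loaded
            return rounds
        last_height = new_height
    # Gave up after max_rounds scrolls; the page may still be growing
    return max_rounds
```

With this in place, calling scroll_to_bottom(driver) right after driver.get(url) and before reading driver.page_source would replace the while loop. A fixed pause is still a guess about how long the page needs to load, so a larger value may be safer on a slow connection.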
Answered By - HedgeHog