Issue
Hi, I am trying to do a simple web scrape of this page: https://www.sayurbox.com/p/Swallow%20Tepung%20Agar%20Agar%20Tinggi%20Serat%207%20gram
My code is this:
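import pandas as pd
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# url and dateNow are defined elsewhere in the script

def userAgent(URL):
    # Send the request with a random User-Agent header
    ua = UserAgent()
    USER_AGENT = ua.random
    headers = {"User-Agent": str(USER_AGENT), "Accept-Encoding": "*", "Connection": "keep-alive"}
    resp = requests.get(URL, headers=headers)
    soup = None  # stays None if the request fails
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        print(f'{URL}')
    else:
        # Log the failed URL and the date to a CSV file
        print(f'error {resp.status_code}: {URL}')
        urlError = pd.DataFrame({'url': [URL],
                                 'date': [dateNow]
                                 })
        urlError.to_csv('errorUrl/errorUrl.csv', mode='a', index=False, header=False)
    return soup

soup = userAgent(url)
productTitle = soup.find_all('div', {"class": "InfoProductDetail__shortDesc"})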
However, it is unable to find the element. Is there something wrong with my code? I tried adding time.sleep to wait for the page to load, but it still does not work. Any help will be greatly appreciated.
Solution
Your code is fine, but the page is dynamic: the product data is generated by JavaScript, and requests with BeautifulSoup cannot execute JavaScript. That is why you need a browser automation tool such as Selenium. You can run the code below.
from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.sayurbox.com/p/Swallow%20Tepung%20Agar%20Agar%20Tinggi%20Serat%207%20gram'

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(5)  # give the JavaScript time to render the product details

# Parse the rendered HTML, then shut the browser down
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

title = soup.select_one('.InfoProductDetail__shortDesc').text
price = soup.select_one('span.InfoProductDetail__price').text
print(title)
print(price)
Output:
Swallow Tepung Agar Agar Tinggi Serat 7 gram
7.900
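To see why requests alone finds nothing, it can help to check whether the target class appears in the raw HTML at all. The sketch below uses only the standard library; the names ClassFinder and has_class are hypothetical helpers, and the two HTML strings are made-up stand-ins for what the server sends before and after JavaScript runs, not the real page source.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.found = True


def has_class(html, class_name):
    """Return True if class_name occurs on any tag in the HTML string."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Assumed sample: an empty application shell, as a JavaScript-driven site might serve it
static_html = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

# Assumed sample: the same page after the browser has run the JavaScript
rendered_html = (
    '<html><body><div id="root">'
    '<div class="InfoProductDetail__shortDesc">Swallow Tepung Agar Agar</div>'
    '</div></body></html>'
)

print(has_class(static_html, "InfoProductDetail__shortDesc"))    # False
print(has_class(rendered_html, "InfoProductDetail__shortDesc"))  # True
```

In practice you would pass resp.text from the requests call instead of the sample strings. If the result is False there but True for driver.page_source, the element is being rendered by JavaScript, and no amount of time.sleep after requests.get will make it appear.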
Answered By - F.Hoque