Issue
I'm trying to scrape a website and want to print all the elements with the following class name:
class=product-size-info__main-label
The code is the following:
from bs4 import BeautifulSoup

with open("MadeInItaly.html", "r") as f:
    doc = BeautifulSoup(f, "html.parser")
tags = doc.find_all(class_="product-size-info__main-label")
print(tags)
Result: [XS, XS, S, M, L, XL]
All good here. That run used MadeInItaly.html, which is a copy of the same page saved on my disk, and it works.
Now the same code against the live URL:
from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"}
url = "https://www.zara.com/es/es/vestido-midi-volantes-cinturon-con-lino-p00387075.html?v1=258941747&v2=2184287"
result = requests.get(url,headers=headers)
doc = BeautifulSoup(result.text, "html.parser")
tags = doc.find_all(class_="product-size-info__main-label")
print(tags)
Result: []
I have tried different User-Agent headers. What could be wrong here?
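One way to narrow this down, before changing headers again, is to check whether the class name appears at all in the HTML that requests actually received. The sketch below uses only the standard library. The helper count_class and the two sample strings are hypothetical stand-ins for illustration, not the real Zara markup; in practice you would pass in result.text and the contents of the saved file.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the given HTML text carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical stand-ins: a saved, already-rendered page vs. a raw server response
saved_page = '<ul><li class="product-size-info__main-label">XS</li></ul>'
fetched_page = '<div id="app"></div><script src="bundle.js"></script>'

print(count_class(saved_page, "product-size-info__main-label"))
print(count_class(fetched_page, "product-size-info__main-label"))
```

If the count for result.text is zero while the saved file gives a non-zero count, the elements are most likely not part of the server's initial response, and no User-Agent change will make find_all return them.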
Solution
As already answered, the problem is that the elements with the class you are looking for are loaded by JavaScript, so they are not in the HTML that requests downloads. This post solves the problem with Selenium, and it works with that:
https://stackoverflow.com/a/11238391/21607327
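For reference, the Selenium approach could look roughly like the sketch below. It is not the code from the linked post. It assumes Selenium 4 with a local Chrome installation, and it assumes the size labels appear in the DOM once the page's scripts have run; the site may also block automated browsers or require extra interaction, which this does not handle. The function names fetch_rendered_html and extract_labels and the timeout value are made up for illustration.

```python
from bs4 import BeautifulSoup

CLASS_NAME = "product-size-info__main-label"


def extract_labels(html, class_name=CLASS_NAME):
    """Return the stripped text of every element carrying class_name."""
    doc = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in doc.find_all(class_=class_name)]


def fetch_rendered_html(url, class_name=CLASS_NAME, timeout=15):
    """Load url in Chrome, wait for class_name to appear, return the rendered HTML."""
    # Imported here so extract_labels stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching element exists (raises TimeoutException otherwise).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

With the url from the question, print(extract_labels(fetch_rendered_html(url))) would then parse the rendered page with the same find_all call that already works on the saved file.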
Answered By - Georgis