Issue
I am trying to scrape a value from a span that I locate by its id, but getting it is not as simple as calling find and reading the span's text.
Below is the HTML from the webpage.
I am trying to print "B0C4YKLXPQ".
This gets me the
Below are all attempts that failed.
- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).span["data-asin"]
- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find_all(["data-asin"])
Solution
The following code is likely to work unless Amazon has blocked your IP address, for example after too many scraping attempts:
import requests
from bs4 import BeautifulSoup as bs

# Send a browser-like User-Agent header with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.amazon.com/dp/B002G9UDYG'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
# data-asin is an attribute of the span itself, so read it with .get()
# rather than searching for a tag named "data-asin".
item = soup.select_one('span[id="fitRecommendationsSection"]').get('data-asin')
print(item)
Result in terminal:
B0C4YKLXPQ
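As for why the earlier attempts failed: a plain string passed to find_all is matched against tag names, so find_all("data-asin") looks for a <data-asin> tag and finds nothing. The sketch below shows this offline, with no request to Amazon. The markup is hypothetical: only the two ids and the data-asin value come from this post, the nesting is assumed, and extract_asin is an illustrative helper name rather than part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the ids and the data-asin value are taken from
# the post; the nesting and the inner span are assumed for illustration.
SAMPLE_HTML = """
<div id="twisterContainer">
  <span id="fitRecommendationsSection" data-asin="B0C4YKLXPQ">
    <span>placeholder text</span>
  </span>
</div>
"""


def extract_asin(html):
    """Return the data-asin of the fit-recommendations span, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", {"id": "fitRecommendationsSection"})
    if span is None:
        # The element is missing, e.g. because a block page was served instead.
        return None
    return span.get("data-asin")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = soup.find("div", {"id": "twisterContainer"})

# A string argument is treated as a tag name, so this list is empty.
by_tag_name = container.find_all("data-asin")

# attrs={"data-asin": True} matches any tag that carries the attribute.
by_attribute = container.find_all(attrs={"data-asin": True})

print(by_tag_name)
print([tag["data-asin"] for tag in by_attribute])
print(extract_asin(SAMPLE_HTML))
print(extract_asin("<html></html>"))
```

Under this assumed markup, the attempt that used .span["data-asin"] would also fail, because .span descends to the first nested span, which does not carry the attribute. Checking for None before calling .get() avoids an AttributeError when the page that comes back does not contain the expected span.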
BeautifulSoup documentation can be found here.
Answered By - Barry the Platipus