Issue
I'm building a program that scrapes a website (in this case, eBay) for products so that I can compare them. I have a URL for a results page of Dyson fans, and my code returns all tags with the class 's-item'. I am trying to access the product link inside each tag, but the usual syntax fails: tag['href'] raises an error, and tag.find_all('href') returns nothing. Code:
from bs4 import BeautifulSoup, NavigableString
import requests
# create soup + find product tags
markup = requests.get(
    url='https://www.ebay.com/sch/i.html?_nkw=dyson+tower+fan&LH_ItemCondition=1000&ipg=240&_sop=12').content
soup = BeautifulSoup(markup, 'html.parser')
products = [soup.find_all('li', attrs={'class': 's-item'})][0]
# Attempt to extract link
tag = products[30]
print(f'Tag : {tag}')
try:
    print(f'Link : {tag["href"]}')
except:
    print('No href attribute')
print(f'Using find_all to find href : {tag.find_all("href")}')
Solution
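The href attribute belongs to the <a> tag nested inside each <li class="s-item">, not to the <li> itself, so tag['href'] raises a KeyError. find_all('href') searches for tags named href rather than for an attribute, so it finds nothing. Selecting each item and reading href from its first <a> descendant gives the desired links: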
from bs4 import BeautifulSoup
import requests
url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=Dyson+fans+&_sacat=0&_pgn=1'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')
for link in soup.select('li.s-item'):
    print(link.a.get('href'))
Answered By - F.Hoque
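As an addendum to the answer above, here is a minimal offline sketch of why the original attempts fail and how a slightly more defensive version of the loop might look. The markup, the example.com URLs, the s-item__link class, and the extract_links helper are all made up for illustration; the real eBay page structure may differ and can change over time. It uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for an eBay results page (not real eBay HTML).
SAMPLE_HTML = """
<ul>
  <li class="s-item"><a class="s-item__link" href="https://example.com/itm/1">Fan A</a></li>
  <li class="s-item"><span>No link here</span></li>
  <li class="s-item"><a class="s-item__link" href="https://example.com/itm/2">Fan B</a></li>
</ul>
"""


def extract_links(html):
    """Return the href of the first <a> inside each li.s-item, skipping items without one."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.select("li.s-item"):
        # href=True restricts the match to <a> tags that actually carry an href.
        anchor = item.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_item = soup.select_one("li.s-item")

# The <li> only has a class attribute, so subscripting with 'href' raises KeyError.
try:
    first_item["href"]
    li_has_href = True
except KeyError:
    li_has_href = False

# find_all('href') looks for <href> elements, which do not exist in the markup.
href_tags = first_item.find_all("href")

links = extract_links(SAMPLE_HTML)
print(li_has_href, href_tags, links)
```

The None check matters because item.a (or item.find) returns None when a list item contains no link, and calling .get('href') on None would raise an AttributeError.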