Issue
I'm trying to scrape the unique links from a website, but when I do I get the following error, and I'm not sure what's causing it:

ResultSet object has no attribute 'endswith'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
I tried changing the URL to see if the link itself was the problem, and it didn't work, which didn't surprise me, but I wanted to check.
I looked at the documentation (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#miscellaneous), and if I'm understanding it correctly, it says to use find() instead of find_all(). I tried find() instead, but that didn't pull up anything; even if it had, it wouldn't be what I'm looking for, since I want all of the unique links.
Anyway, here's the code. Any ideas, or places I can look to understand this error better?

import requests
from bs4 import BeautifulSoup
import urllib.request
import urllib.parse
url = "https://www.census.gov/programs-surveys/popest.html"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("a")
for link in links:
    link.get("href")

def unique_links(tags, url):
    cleaned_links = set()
    for link in links:
        link = link.get("href")
        if link is None:
            continue
        if link.endswith('/') or links.endswith('#'):
            link = link [-1]
        actual_url = urllib.parse.urljoin(url, link)
        cleaned_links.add(actual_url)
    return cleaned_links

cleaned_links = unique_links(links, url)
Solution
There is a typo in your code: links.endswith('#') calls .endswith() on the ResultSet links (the whole list returned by find_all) instead of on the string link, which is exactly what the error message is complaining about. It should be:

if link.endswith('/') or link.endswith('#'):

There is a second bug on the next line: link = link [-1] keeps only the last character of the URL. To drop the trailing '/' or '#' you want a slice instead:

    link = link[:-1]
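With both fixes applied, the cleanup logic can be sketched on its own, fed by the href strings pulled out of the ResultSet. The sample hrefs below are made up for illustration; in the real script they would come from [a.get("href") for a in soup.find_all("a")]:

```python
import urllib.parse

def unique_links(hrefs, base_url):
    """Resolve each href against base_url, trimming a trailing '/' or '#'."""
    cleaned_links = set()
    for link in hrefs:
        if link is None:  # <a> tags without an href attribute yield None
            continue
        # Fixed typo: call .endswith() on the string link, not the list links
        if link.endswith('/') or link.endswith('#'):
            link = link[:-1]  # [:-1] drops the last character; [-1] would keep only it
        cleaned_links.add(urllib.parse.urljoin(base_url, link))
    return cleaned_links

# Hypothetical hrefs for demonstration
hrefs = ["/data/", "#", None, "https://example.com/page", "/data"]
print(unique_links(hrefs, "https://www.census.gov/programs-surveys/popest.html"))
```

Because the function returns a set, the two forms of "/data" collapse to a single absolute URL, which is the deduplication the question is after.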
Answered By - Prakhar Jhudele