Friday, September 16, 2022

[FIXED] How to scrape multiple pages at once, avoiding: 'ResultSet' object has no attribute 'find_all'?

Labels: beautifulsoup, loops, python, python-3.x, web-scraping

Issue

I am trying to scrape the text of the span tags, but I get the error "ResultSet object has no attribute 'find_all'". I think I need a second for loop inside the last one, but I can't wrap my head around how to do this.

from bs4 import BeautifulSoup
import requests

urls = []
soups = []
divs = []

for i in range(20):
    i=i+1
    url = "https://www.autoscout24.de/lst?sort=standard&desc=0&cy=D&atype=C&ustate=N%2CU&powertype=ps&ocs_listing=include&adage=1&page=" + str(i)
    urls.append(url)

for url in urls:
    page = requests.get(url)
    soups.append(BeautifulSoup(page.content, "html.parser"))

for soup in range(len(soups)):
    divs.append(soups[soup].find_all("div", class_="VehicleDetailTable_container__mUUbY"))
    
for div in range(len(divs)):
    mileage = divs[div].find_all("span", class_="VehicleDetailTable_item__koEV4")[0].text
    year = divs[div].find_all("span", class_="VehicleDetailTable_item__koEV4")[1].text
    print(mileage)
    print(year)
    print()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_17/4044669827.py in <module>
     21 
     22 for div in range(len(divs)):
---> 23     mileage = divs[div].find_all("span", class_="VehicleDetailTable_item__koEV4")[0].text
     24     year = divs[div].find_all("span", class_="VehicleDetailTable_item__koEV4")[1].text
     25     print(mileage)

/opt/conda/lib/python3.7/site-packages/bs4/element.py in __getattr__(self, key)
   2288         """Raise a helpful exception to explain a common code fix."""
   2289         raise AttributeError(
-> 2290             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2291         )

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Solution

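Try to avoid all these lists and loops: one loop over the pages, with an inner loop over each page's results, is enough and also eliminates the error. find_all() returns a ResultSet, which is a list of tags, so divs ends up holding one ResultSet per page, and calling find_all() on such a list raises the AttributeError. Otherwise you would have to iterate over each ResultSet in an additional nested loop, which would not be good practice.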

for i in range(1,21):
    url = f"https://www.autoscout24.de/lst?sort=standard&desc=0&cy=D&atype=C&ustate=N%2CU&powertype=ps&ocs_listing=include&adage=1&page={i}"

    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    for div in soup.find_all("div", class_="VehicleDetailTable_container__mUUbY"):
        mileage = div.find_all("span", class_="VehicleDetailTable_item__koEV4")[0].text
        year = div.find_all("span", class_="VehicleDetailTable_item__koEV4")[1].text
        print(mileage)
        print(year)
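
For comparison, if the separate lists from the question were kept, each ResultSet would need its own inner loop. The following is only a minimal sketch of that nested iteration; it runs on made-up inline HTML rather than the live site, and the class names "box" and "item" and the sample values are hypothetical stand-ins.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup for two result pages (not the real site)
PAGES = [
    '<div class="box"><span class="item">10 km</span><span class="item">01/2020</span></div>'
    '<div class="box"><span class="item">20 km</span><span class="item">05/2019</span></div>',
    '<div class="box"><span class="item">30 km</span><span class="item">11/2021</span></div>',
]

soups = [BeautifulSoup(html, "html.parser") for html in PAGES]

# Each entry is a ResultSet (a list of tags), one per page
divs = [soup.find_all("div", class_="box") for soup in soups]

rows = []
for result_set in divs:      # outer loop: one ResultSet per page
    for div in result_set:   # inner loop: the individual <div> tags
        spans = div.find_all("span", class_="item")
        rows.append((spans[0].text, spans[1].text))

for mileage, year in rows:
    print(mileage, year)
```

Calling divs[0].find_all(...) here would raise the same AttributeError as in the question, because divs[0] is a ResultSet and not a single tag.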

Be aware: relying on these indexes can produce texts that do not match your expectation if a listing has no mileage or year. It would be better to scrape the detail pages instead.
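
If you stay on the list pages, one way to reduce the risk is to recognise each field by its text pattern instead of its position. The sketch below is an assumption-laden illustration, not a tested scraper: the helper parse_details, the two regular expressions, and the sample texts "45.000 km" and "03/2019" are hypothetical and would need to be checked against the real page content.

```python
import re
from bs4 import BeautifulSoup

# Assumed text formats, not confirmed by the source: "45.000 km" and "03/2019"
MILEAGE_RE = re.compile(r"^\d[\d.,]*\s*km$")
YEAR_RE = re.compile(r"^\d{2}/\d{4}$")

CONTAINER_CLASS = "VehicleDetailTable_container__mUUbY"
ITEM_CLASS = "VehicleDetailTable_item__koEV4"


def parse_details(div):
    """Return (mileage, year); a field stays None when no span matches it."""
    mileage = year = None
    for span in div.find_all("span", class_=ITEM_CLASS):
        text = span.get_text(strip=True)
        if mileage is None and MILEAGE_RE.match(text):
            mileage = text
        elif year is None and YEAR_RE.match(text):
            year = text
    return mileage, year


# Hypothetical markup: the second listing has no mileage span
html = """
<div class="VehicleDetailTable_container__mUUbY">
  <span class="VehicleDetailTable_item__koEV4">45.000 km</span>
  <span class="VehicleDetailTable_item__koEV4">03/2019</span>
</div>
<div class="VehicleDetailTable_container__mUUbY">
  <span class="VehicleDetailTable_item__koEV4">03/2019</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = [parse_details(div) for div in soup.find_all("div", class_=CONTAINER_CLASS)]
print(results)
```

With plain indexing, the second listing would report its year as the mileage and then fail with an IndexError on index 1; here the missing field simply stays None.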

Example

from bs4 import BeautifulSoup
import requests

for i in range(1, 21):
    url = f"https://www.autoscout24.de/lst?sort=standard&desc=0&cy=D&atype=C&ustate=N%2CU&powertype=ps&ocs_listing=include&adage=1&page={i}"

    # Fetch and parse one result page at a time
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    # find_all() returns a ResultSet (a list of tags), so loop over it
    for div in soup.find_all("div", class_="VehicleDetailTable_container__mUUbY"):
        # Query the spans once per listing instead of once per field
        spans = div.find_all("span", class_="VehicleDetailTable_item__koEV4")
        mileage = spans[0].text
        year = spans[1].text
        print(mileage)
        print(year)

Answered By - HedgeHog

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
