Saturday, September 17, 2022

[FIXED] AttributeError: 'NoneType' object has no attribute 'html' Error

Labels: beautifulsoup, htmlsession, python, python-requests-html

Issue

I'm trying to abstract my scraper so I can reuse it for other web pages in a project, but when I run the program below I get an error:

from requests_html import HTMLSession
from abc import ABC, abstractmethod
from bs4 import BeautifulSoup as bs

class AbstractClass(ABC):
    @abstractmethod
    def __init__(self,url):
        self.url = url

    @abstractmethod
    def getSession(self):
        self.session = HTMLSession() 
        self.url_ = self.session.get(self.url)
        self.url_ = self.url_.html.render(timeout=20)
        self.soup = bs(self.url_.html.html, 'lxml')
        print(self.soup.prettify())  # to check the result

class Santander(AbstractClass):
    def __init__(self, url):
        super().__init__(url)

    def getSession(self):
        super().getSession()

santander = Santander('https://banco.santander.cl/beneficios')
santander.getSession()

I think the error comes from using the libraries incorrectly (I use HTMLSession from requests_html because the pages may rely on JavaScript). I have tried moving and changing a few things, but it keeps failing with this traceback:

Traceback (most recent call last):
  File "c:\Users\Felipe\Documents\Scrapper\scraper.py", line 41, in <module>  
    santander.getSession()
  File "c:\Users\Felipe\Documents\Scrapper\scraper.py", line 38, in getSession
    return super().getSession()
  File "c:\Users\Felipe\Documents\Scrapper\scraper.py", line 15, in getSession
    self.soup = bs(self.url_.html.html, 'lxml')
AttributeError: 'NoneType' object has no attribute 'html'

This is my initial code, written before I started abstracting it, and it works fine:

from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession

session = HTMLSession()

url_ = 'https://banco.santander.cl/beneficios'

url= session.get(url_)
url.html.render(timeout=20)

soup = bs(url.html.html, 'lxml')
#print(soup.prettify())
page_santander = soup.find("section", id="section-promotions")
container =  page_santander.find("div", class_="container")
grid = container.find_all("div", class_="row mini")[0].find_all("div",class_="d-block h-100 cursor-pointer")
#print(len(grid))
for i in range(0, len(grid)):
    title = grid[i].find("h2").get_text()
    summary = grid[i].find("p").get_text()
    #discountUrl = grid[i].find("a").get('href')
print(title)
print(summary)

Solution

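render() updates the response's HTML in place and returns None when no script argument is passed, so assigning its result back to self.url_ replaces the response with None, and the next line fails on self.url_.html. Call render() without keeping its return value. Try changing: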

self.url_ = self.session.get(self.url)
self.url_ = self.url_.html.render(timeout=20)
self.soup = bs(self.url_.html.html, 'lxml')

Into:

current_url = self.session.get(self.url)
current_url.html.render(timeout=20)
self.soup = bs(current_url.html.html, 'lxml')
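
The difference between the two versions can be reproduced offline. The sketch below is only an illustration: FakeHTML and FakeResponse are hypothetical stand-ins (not part of requests_html) that assume render() behaves as described above, changing the markup in place and returning None.

```python
# Hypothetical stand-ins for the requests_html response objects,
# so the sketch runs without a network connection or a browser.
class FakeHTML:
    def __init__(self, raw):
        self.html = raw

    def render(self, timeout=8.0):
        # Assumed behaviour: update the markup in place, return None.
        self.html = self.html.replace("<!--js-->", "<p>rendered</p>")
        return None


class FakeResponse:
    def __init__(self, raw):
        self.html = FakeHTML(raw)


def broken_fetch(response):
    # Mirrors the question: the return value of render() replaces the response.
    response = response.html.render(timeout=20)
    return response.html.html  # fails here, because response is now None


def fixed_fetch(response):
    # Mirrors the answer: call render() only for its side effect.
    response.html.render(timeout=20)
    return response.html.html


error_message = None
try:
    broken_fetch(FakeResponse("<body><!--js--></body>"))
except AttributeError as exc:
    error_message = str(exc)

fixed_markup = fixed_fetch(FakeResponse("<body><!--js--></body>"))
print(error_message)
print(fixed_markup)
```

With the stand-ins, the first call raises the same AttributeError as in the traceback, while the second returns the rendered markup that would then be passed to BeautifulSoup.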

Answered By - Wutong

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
