Issue
I'm trying to get a table using BeautifulSoup, but I'm getting an error when using the find method.
I want to get the headers of a table from here: https://stooq.pl/t/?i=513&v=1&l=1
The id of the table I'm interested in is fth1, and the HTML looks like this:
<table class="fth1" id="fth1" width="100%" cellspacing="0" cellpadding="3" border="0">
  <thead style="background-color:e9e9e9">
    <tr align="center">
      <th id="f13">
        <a href="t/?i=513&v=1&o=1">Symbol</a>
      </th>
      <th id="f13">
        <a href="t/?i=513&v=1&o=2">Nazwa</a>
      </th>
      ...
My Python script:
from selenium import webdriver
import requests
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
page = requests.get('https://stooq.pl/t/?i=513&v=1&l=1')
soup = BeautifulSoup(page.text, 'lxml')
table1 = soup.find('table', {'id': "fth1"})
headers = []
for i in table1.find_all('th'):
    title = i.text
    headers.append(title)
print(headers)
I got the error:
Traceback (most recent call last):
  File "/home/.../script.py", line 25, in <module>
    for i in table1.find_all('th'):
AttributeError: 'NoneType' object has no attribute 'find_all'
I found that the variable table1 is None.
I've tried using html.parser and html5lib instead of lxml, but with no success.
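A quick way to narrow it down seems to be checking the raw response itself, something like this (the exact output depends on what the server sends back):
import requests
page = requests.get('https://stooq.pl/t/?i=513&v=1&l=1')
print(page.status_code)        # the request itself may come back as 200...
print('fth1' in page.text)     # ...yet the table id can still be missing from the returned HTML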
What am I doing wrong to get this error?
Solution
You can still scrape the site; to do so, you need to copy your cookies/headers from your browser and inject them into the request. If you go to the Network tab in your browser's developer tools, find the HTML document and inspect it, or right-click it and copy as cURL, you can then convert that to Python.
Your request would then look something like this (but with your cookies):
import requests
from bs4 import BeautifulSoup
cookies = {
    'cookie_uu': '',
    'privacy': '',
    'PHPSESSID': '',
    'uid': '',
    'cookie_user': '',
    '_ga': '',
    '_gid': '',
    '__gads': '',
    'FCCDCF': '',
    'FCNEC': '',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-GB,en;q=0.5',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
}
params = {
    'i': '513',
    'v': '1',
    'l': '1',
}
response = requests.get('https://stooq.pl/t/', params=params, cookies=cookies, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
table = soup.find('table', {'id': "fth1"})
headers = [i.text for i in table.find_all('th')]
print(headers)
This returns:
['Symbol', 'Nazwa', 'Otwarcie', 'Max', 'Min', 'Kurs', 'Zmiana', 'Wolumen', 'Obrót', 'Data', '']
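If you also want the data rows and not just the headers, you can keep going from the same table element. A rough sketch, assuming the data rows use plain <td> cells and pairing each cell with the headers scraped above:
rows = []
for tr in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:  # the header row only contains <th> cells, so it is skipped here
        rows.append(dict(zip(headers, cells)))
print(rows[:3])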
Answered By - Sam