Monday, February 21, 2022

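[FIXED] AttributeError: 'NoneType' object has no attribute 'tbody' - Spyder 3.3.1 / beautifulsoup4 / python 3.6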


Issue

Hey, this is my setup: Spyder 3.3.1 / beautifulsoup4 / python 3.6

The code below is from a Medium article about web scraping with Python and BeautifulSoup. It was supposed to be a quick read, but two days later I still cannot get the code to run in Spyder and keep getting:

File "/Users/xxxxxxx/Documents/testdir/swiftScrape.py", line 9, in table_to_df
    return pd.DataFrame([[td.text for td in row.findAll('td')] for row in table.tbody.findAll('tr')])

AttributeError: 'NoneType' object has no attribute 'tbody'

I'm not sure what is going wrong; it seems to be an implementation error. Can anyone shed some light on this issue?

Thanks in advance.

import os
import bs4
import requests
import pandas as pd

PATH = os.path.join("C:\\","Users","xxxxx","Documents","tesdir")

def table_to_df(table):
    return pd.DataFrame([[td.text for td in row.findAll('td')] for row in table.tbody.findAll('tr')])

def next_page(soup):
    return "http:" + soup.find('a', attrs={'rel':'next'}).get('href')

res = pd.DataFrame()
url = "http://bank-code.net/country/FRANCE-%28FR%29/"
counter = 0

while True:
    print(counter)
    page = requests.get(url)
    soup = bs4.BeautifulSoup(page.content, 'lxml')
    table = soup.find(name='table', attrs={'id':'tableID'})
    res = res.append(table_to_df(table))
    res.to_csv(os.path.join(PATH,"BIC","table.csv"), index=None, sep=';', encoding='iso-8859-1')
    url = next_page(soup)
    counter += 1
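
The traceback says that `table` was `None` when `table.tbody` was evaluated, which means `soup.find` did not find an element with id `tableID` in the HTML that `requests` received. Why the table was absent from that response is not something the traceback shows. The sketch below reproduces the situation offline with made-up HTML snippets and placeholder cell values (`CODE1`, `Bank One` and so on are illustrative, as is the helper name `table_to_rows`), and shows one way to guard against the missing table. It iterates over `table.find_all("tr")` rather than `table.tbody`, since the `lxml` and `html.parser` parsers do not add a `<tbody>` that is not in the markup.

```python
import bs4

# Made-up HTML: one page with the expected table, one without it.
HTML_WITH_TABLE = """
<table id="tableID">
  <tbody>
    <tr><td>CODE1</td><td>Bank One</td></tr>
    <tr><td>CODE2</td><td>Bank Two</td></tr>
  </tbody>
</table>
"""
HTML_WITHOUT_TABLE = "<html><body><p>No table here</p></body></html>"


def table_to_rows(soup, table_id="tableID"):
    """Return the table cells as a list of rows, or None if the table is missing."""
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        # soup.find returns None when nothing matches, so check before using it.
        return None
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
    ]


soup_ok = bs4.BeautifulSoup(HTML_WITH_TABLE, "html.parser")
soup_missing = bs4.BeautifulSoup(HTML_WITHOUT_TABLE, "html.parser")

rows_found = table_to_rows(soup_ok)
rows_missing = table_to_rows(soup_missing)

# Reproduce the original failure: attribute access on the None returned by find.
try:
    soup_missing.find("table", attrs={"id": "tableID"}).tbody
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

print(rows_found)
print(rows_missing)
print(error_message)
```

When the guard triggers, printing `page.content` (or the first few hundred characters of it) is usually the quickest way to see what the server actually returned.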

Solution

Thanks @bruno desthuilliers for your pointers. Much appreciated.

This is the rewritten code that worked for me, using Selenium and webdriver rather than requests:

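import os
import bs4
import pandas as pd
from selenium import webdriver

PATH = os.path.join('/','Users','benmorris','documents','testdir')

def table_to_df(table):
    # One DataFrame row per <tr>, one column per <td>, read from the table passed in.
    return pd.DataFrame([[td.text for td in row.find_all('td')] for row in table.find_all('tr')])

def next_page(soup):
    # Return the next page's URL, or None when there is no rel="next" link.
    link = soup.find('a', attrs={'rel':'next'})
    return "http:" + link.get('href') if link is not None else None

res = pd.DataFrame()
url = "http://bank-code.net/country/FRANCE-%28FR%29/"
counter = 0
driver = webdriver.Chrome()

while True:
    print(counter)
    driver.get(url)  # returns None; the rendered HTML is in driver.page_source
    soup = bs4.BeautifulSoup(driver.page_source, 'lxml')
    # Look the table up in the parsed page so that a missing table gives None.
    table = soup.find('table', attrs={'id':'tableID'})
    if table is None:
        print("no table 'tableID' found for url {}".format(url))
        print("html content:\n{}\n".format(driver.page_source))
        break  # retrying the same url would loop forever
    # DataFrame.append was removed in pandas 2.0; pd.concat also works in older versions.
    res = pd.concat([res, table_to_df(table)])
    res.to_csv(os.path.join(PATH,"BIC","table.csv"), index=False, sep=',', encoding='iso-8859-1')
    url = next_page(soup)
    if url is None:
        break  # no further page to scrape
    counter += 1

driver.quit()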
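
The `next_page` helper builds the next URL by prefixing `"http:"` to the link's `href`, which only works if the site emits protocol-relative links (`//host/path`). A possibly more robust variant, sketched here as an assumption rather than as what the site requires, resolves the `href` against the current page URL with `urllib.parse.urljoin` and returns `None` when no `rel="next"` link exists. The `example.com` URLs and the extra `current_url` parameter are hypothetical and used only for illustration.

```python
from urllib.parse import urljoin

import bs4


def next_page(soup, current_url):
    """Return the absolute URL of the rel="next" link, or None if there is none."""
    link = soup.find("a", attrs={"rel": "next"})
    if link is None or not link.get("href"):
        return None
    # urljoin handles absolute, protocol-relative and relative hrefs alike.
    return urljoin(current_url, link["href"])


# Hypothetical pages covering the three kinds of href plus a last page.
base = "http://example.com/list/1"
protocol_relative = bs4.BeautifulSoup(
    '<a rel="next" href="//example.com/list/2">next</a>', "html.parser")
relative = bs4.BeautifulSoup('<a rel="next" href="2">next</a>', "html.parser")
last_page = bs4.BeautifulSoup('<a rel="prev" href="0">prev</a>', "html.parser")

url_from_protocol_relative = next_page(protocol_relative, base)
url_from_relative = next_page(relative, base)
url_from_last_page = next_page(last_page, base)

print(url_from_protocol_relative)
print(url_from_relative)
print(url_from_last_page)
```

In a scraping loop, a `None` return value is a natural signal to stop instead of letting the loop end with an `AttributeError` on the last page.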

Answered By - Morb

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
