Friday, August 19, 2022

[FIXED] How to select elements without a class using beautifulsoup

Labels: beautifulsoup, css-selectors, data-extraction, python-3.x, web-scraping

Issue

I am scraping the FBref website to get specific player info that I can use for further analysis. I have selected the table I want to scrape. The information I want is in <tr> tags without any class attribute, but the table also contains many header rows in <tr> tags that do have a class name.

import requests
from bs4 import BeautifulSoup
from time import sleep
url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

response = requests.get(url).text.replace('<!--', '').replace('-->', '')

soup = BeautifulSoup(response, "html.parser")

I have selected the desired table I want to scrape. I want to select <tr> tags that don't have any class attribute because that's where the information I want is located.

players_table = soup.select("table#stats_standard tbody tr", class_ =None)

I have then looped through the players_table so that I can get each player's info like name, country, position, etc.

for player in players_table:
    player_name = player.find("td", attrs={"data-stat": "player"}).a.text
    print(player_name)
    sleep(2)

The problem is that my code loops through the whole table, and when it reaches a <tr class="thead"> row, it looks for the player <td> and then for the text of its <a> tag. This header row has no such <a> tag: find() returns None, and my code fails with the error 'NoneType' object has no attribute 'a'.

My code prints the players' names until it reaches this <tr class="thead"> row, and then it fails. I have even tried to decompose or clear this row, but that still doesn't work.

player.find(".thead").decompose()

So my question is: how can I select only the <tr> tags that don't have any class, so that my loop simply skips the header rows? I have tried doing that by passing class_=None when selecting the rows:

players_table = soup.select("table#stats_standard tbody tr", class_ =None)

But this didn't solve anything. I need your help on this, please.

Solution

If you only want to exclude the subheader rows, adjust your selector so that it selects only the <tr> elements without the class thead:

soup.select('table#stats_standard tbody tr:not(.thead)')

or, closer to the title of your question, only the <tr> elements that have no class attribute at all:

soup.select('table#stats_standard tbody tr:not([class])')
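The difference between the two selectors can be seen in a small offline sketch. The HTML below is a made-up miniature of the table (the player names, the "highlight" class and the player_names helper are hypothetical, not taken from FBref). Since select() works from a CSS selector string, the class filter has to live in the selector itself; the class_ keyword belongs to find_all()-style searches and, as the question reports, does not filter anything here.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table, NOT the real FBref markup
SAMPLE_HTML = """
<table id="stats_standard">
  <tbody>
    <tr><td data-stat="player"><a href="#">Player One</a></td></tr>
    <tr class="thead"><th data-stat="player">Player</th></tr>
    <tr class="highlight"><td data-stat="player"><a href="#">Player Two</a></td></tr>
    <tr><td data-stat="player"><a href="#">Player Three</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def player_names(rows):
    """Return the link text of the player cell for each row (hypothetical helper)."""
    return [
        row.select_one('td[data-stat="player"] a').get_text(strip=True)
        for row in rows
    ]


# Excludes only rows carrying the class "thead"; other classes still match
names_not_thead = player_names(
    soup.select("table#stats_standard tbody tr:not(.thead)")
)

# Excludes every row that has a class attribute at all
names_no_class = player_names(
    soup.select("table#stats_standard tbody tr:not([class])")
)

print(names_not_thead)
print(names_no_class)
```

In this sample, tr:not(.thead) keeps the row with the "highlight" class while tr:not([class]) drops it, so which selector fits depends on whether data rows in the real table ever carry a class of their own.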

Example

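import requests
from bs4 import BeautifulSoup

url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

# Strip the HTML comment markers so that markup wrapped in comments gets parsed too
response = requests.get(url).text.replace('<!--', '').replace('-->', '')

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response, "html.parser")

# Select only rows without a class attribute, which skips the repeated header rows
for player in soup.select('table#stats_standard tbody tr:not([class])'):
    player_name = player.find("td", attrs={"data-stat": "player"}).a.text
    print(player_name)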
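If some classless rows might still lack a player link, a slightly more defensive variant can skip them instead of raising the 'NoneType' error. This is only a sketch on made-up HTML: the sample rows and the extract_player_names function are hypothetical, and whether FBref actually contains such rows is an assumption, not something the answer states.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second classless row has a player cell but no <a> tag
SAMPLE_HTML = """
<table id="stats_standard">
  <tbody>
    <tr><td data-stat="player"><a href="#">Player One</a></td></tr>
    <tr class="thead"><th data-stat="player">Player</th></tr>
    <tr><td data-stat="player"></td></tr>
    <tr><td data-stat="player"><a href="#">Player Two</a></td></tr>
  </tbody>
</table>
"""


def extract_player_names(soup):
    """Collect player names, skipping rows that have no player link."""
    names = []
    for row in soup.select("table#stats_standard tbody tr:not([class])"):
        # select_one returns None when nothing matches, so check before using it
        link = row.select_one('td[data-stat="player"] a')
        if link is None:
            continue
        names.append(link.get_text(strip=True))
    return names


names = extract_player_names(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(names)
```

Checking for None costs one extra line per lookup and keeps a single malformed row from aborting the whole loop.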

Answered By - HedgeHog

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
