Issue
I found this code on someone else's GitHub and am slightly adapting it. The notebook is meant to scrape data from a website. I've passed the contents of the website through an HTML parser. Here's the code with the loop:
players = soup.find_all("td", class_="player")
player_list = []
for player in players[0:52]:
    player_list.append(player.a.text)
I just don't understand why adding the .a.text changes player_list compared to just having append(player) in the for loop. I can't find anything about
Example of output from just append(player):
<td class="player"><span style="display:none">Smith</span>
<a href="https://www.example.com">Brian Smith</a>
</td>,
Output from append(player.a.text):
'Brian Smith',
Solution
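find_all returns a list of tags (a ResultSet, which behaves like a Python list), so append(player) stores each whole <td> tag, markup included. Since find_all already returns a list, there's really no reason to append to another one in a loop.
The dot operator does not iterate: player.a returns the first <a> tag found inside that tag (the same as player.find("a")), and .text returns the plain text inside the anchor (a) tag, with the HTML stripped.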
Therefore, you get a list of Python strings, which you could also build with a list comprehension:
player_list = [player.a.text for player in soup.find_all("td", class_="player")]
first_52_players = player_list[:52]
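To see what each step returns, here is a minimal sketch that parses the markup from the question. It assumes BeautifulSoup 4 is installed and uses the built-in "html.parser"; the surrounding <table> and <tr> wrapper and the "if td.a is not None" guard are illustrative additions, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Markup copied from the question's output; the <table>/<tr> wrapper is an
# assumption added so the snippet is a plausible fragment of a page.
html = """
<table><tr>
<td class="player"><span style="display:none">Smith</span>
<a href="https://www.example.com">Brian Smith</a>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
player = soup.find_all("td", class_="player")[0]

print(type(player).__name__)  # the whole <td> element is a Tag object
print(player.a)               # the first <a> tag inside the <td>
print(player.a.text)          # only the text inside that <a> tag
print(repr(player.text))      # text of the whole <td>, hidden <span> included

# Slice before building the list so only the first 52 cells are touched,
# and skip any cell without a link (an assumed guard; player.a is None there).
names = [
    td.a.text
    for td in soup.find_all("td", class_="player")[:52]
    if td.a is not None
]
print(names)
```

Note that player.text would also pick up the hidden "Smith" from the <span>, which is likely why the original code goes through the <a> tag first.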
Answered By - OneCricketeer