Issue
I'm trying to scrape soccer results from a website. I can find the results in the HTML, but when I try to strip the tags with .text I get strange output. I use .parent to get the parent HTML element that holds the whole score line.
The scraper script:
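import requests
from bs4 import BeautifulSoup

# url: address of the results page (not given in the question)
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
# Find every <strong> tag whose text is the chosen team
results = html_soup.find_all('strong', text="East Wall Rovers")
chosen_team_results = []
for result in results:
    # The parent <p> holds both teams and both scores
    chosen_team_results.append(result.parent.text)
print(chosen_team_results)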
HTML:
<p class="zeta"><strong>
Killester Donnycarney FC</strong>
1
<strong>Cherry Orchard</strong>
2
</p>
<p class="zeta"><strong>
Ballymun United</strong>
2
<strong>Bluebell United</strong>
1
</p>
OUTPUT:
'\r\n\t\t\tValeview Shankill\r\n\t\t\t1\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t', '\r\n\t\t\tMarks Celtic FC\r\n\t\t\t0\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t', '\r\n\t\t\tBlessington FC\r\n\t\t\t0\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t', '\r\n\t\t\tParkvale FC\r\n\t\t\t2\r\n\t\t\tEast Wall Rovers\r\n\t\t\t1\r\n\t\t\t\t\t\t', '\r\n\t\t\tBoyne Rovers\r\n\t\t\t1\r\n\t\t\tEast Wall Rovers\r\n\t\t\t1\r\n\t\t\t\t\t\t'
I expect the results as plain text: just the teams and the scores.
Solution
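To get rid of the extra whitespace, split each string on whitespace and join the pieces back together with single spaces. Joining with an empty string would also remove the spaces inside the team names.
for result in results:
    chosen_team_results.append(' '.join(result.parent.text.split()))
print(chosen_team_results)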
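If you want the teams and scores as separate values rather than one cleaned-up string, something like the following might work. It is only a minimal sketch: `parse_result` is a hypothetical helper name, and it assumes every result has each team name and each score on its own line, as in the OUTPUT shown in the question. The sample strings are copied from that output so the snippet runs without the website.

```python
def parse_result(raw):
    """Split one scraped result string into (home, home_score, away, away_score)."""
    # Keep only the non-empty lines, trimmed of tabs, spaces and line breaks.
    parts = [line.strip() for line in raw.splitlines() if line.strip()]
    # Assumption: exactly four pieces per result (team, score, team, score).
    if len(parts) != 4:
        raise ValueError(f"unexpected result layout: {parts!r}")
    home, home_score, away, away_score = parts
    return home, int(home_score), away, int(away_score)


# Two of the raw strings from the question's OUTPUT.
raw_results = [
    '\r\n\t\t\tValeview Shankill\r\n\t\t\t1\r\n\t\t\tEast Wall Rovers\r\n\t\t\t5\r\n\t\t\t\t\t\t',
    '\r\n\t\t\tParkvale FC\r\n\t\t\t2\r\n\t\t\tEast Wall Rovers\r\n\t\t\t1\r\n\t\t\t\t\t\t',
]

for raw in raw_results:
    home, home_score, away, away_score = parse_result(raw)
    print(f"{home} {home_score} - {away_score} {away}")
```

In the scraper itself you would pass result.parent.text to the helper instead of the sample strings. Beautiful Soup's stripped_strings generator on the parent tag is another way to get the same trimmed pieces.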
Answered By - Arnav Chawla