Sunday, October 30, 2022

[FIXED] How to scrape football results from livescores?

October 30, 2022 beautifulsoup, python, python-3.4, urllib, web-scraping

Issue

I have a project I'm working on in Python 3.4 on Windows 10 (64-bit). I want to scrape livescore.com for football results, e.g. getting all of the day's scores (England 2-2 Norway, France 2-1 Italy, etc.).

I have tried two approaches. This is the first one:

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('http://www.livescore.com/').read()
soup = bs.BeautifulSoup(sauce,'lxml')

for div in soup.find_all('div', class_='container'):
    print(div.text)

When I run this code, a dialog box pops up saying:

IDLE's subprocess didn't make connection. Either IDLE can't start a subprocess or firewall software is blocking the connection.

I then wrote a second version. This is the code:

# Import Modules
import urllib.request
import re

# Downloading Live Score XML Code From Website and reading also
xml_data = urllib.request.urlopen('http://static.cricinfo.com/rss/livescores.xml').read()

# Pattern For Searching Score and link
pattern = "<item>(.*?)</item>"

# Finding Matches
for i in re.findall(pattern, xml_data, re.DOTALL):
    result = re.split('<.+?>',i)
    print (result[1], result[3]) # Print Score

And I got this error:

Traceback (most recent call last):
  File "C:\Users\Bright\Desktop\live_score.py", line 12, in <module>
    for i in re.findall(pattern, xml_data, re.DOTALL):
  File "C:\Python34\lib\re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
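The traceback comes from mixing types: the pattern is a str while read() returns bytes. A minimal sketch that reproduces it offline, using a hypothetical hard-coded byte string (with an example.com link) in place of the real feed, and shows two ways around it:

```python
import re

# Hypothetical stand-in for urllib.request.urlopen(...).read(), which returns bytes
xml_data = b"<item><title>England 2-2 Norway</title><link>http://example.com/1</link></item>"
pattern = "<item>(.*?)</item>"  # a str pattern, as in the question

# Mixing a str pattern with bytes data raises the TypeError from the traceback
raised = False
try:
    re.findall(pattern, xml_data, re.DOTALL)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Fix 1: decode the bytes to str first (assuming the payload is UTF-8)
text_items = re.findall(pattern, xml_data.decode("utf-8"), re.DOTALL)
print(text_items)

# Fix 2: keep everything as bytes by matching with a bytes pattern
byte_items = re.findall(pattern.encode("utf-8"), xml_data, re.DOTALL)
print(byte_items)
```

Decoding is usually the more convenient choice, since everything downstream (printing, splitting, comparing) then works on ordinary strings.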

Solution

For your first example: the site loads its content with heavy JavaScript, so urllib only receives the initial HTML, without the scores. I suggest using Selenium to fetch the page source after the browser has rendered it.

Your code should look like this:

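import bs4 as bs
from selenium import webdriver

url = 'http://www.livescore.com/'
browser = webdriver.Chrome()  # requires Chrome and a chromedriver that Selenium can find
try:
    browser.get(url)
    # page_source is the current DOM, including content JavaScript has already added
    sauce = browser.page_source
finally:
    browser.quit()  # close the browser even if loading the page fails

soup = bs.BeautifulSoup(sauce, 'lxml')

# find() returns None if no div carries data-type="container"
container = soup.find('div', attrs={'data-type': 'container'})
if container is not None:
    for div in container.find_all('div'):
        print(div.text)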

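For the second example, the regular expression engine raises an error because read() on the object returned by urlopen gives you bytes, while your pattern is a str, and re will not match a str pattern against bytes. So you just have to convert xml_data to str first. Decode it rather than calling str() on it: str(xml_data) returns the b'...' representation, with newlines and non-ASCII characters shown as escape sequences.

This is the modified code (assuming the feed is UTF-8 encoded):

for i in re.findall(pattern, xml_data.decode('utf-8'), re.DOTALL):
    result = re.split('<.+?>', i)
    print(result[1], result[3])  # Print score and link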
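Regular expressions are brittle for XML. As an alternative (not what the answer above uses), the standard library's xml.etree.ElementTree can parse the feed once it has been downloaded. A minimal sketch, assuming the feed follows the usual RSS 2.0 layout of item elements with title and link children; the sample feed and the parse_items helper are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual RSS 2.0 shape; in practice this would be
# the bytes returned by urllib.request.urlopen(...).read()
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Live scores</title>
    <item><title>England 2-2 Norway</title><link>http://example.com/1</link></item>
    <item><title>France 2-1 Italy</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_items(xml_bytes):
    """Return (title, link) pairs for every <item> in an RSS document."""
    # fromstring accepts bytes and uses the encoding from the XML declaration
    root = ET.fromstring(xml_bytes)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in parse_items(SAMPLE_FEED):
    print(title, link)
```

Unlike the regex approach, this does not depend on the exact order or spacing of tags inside each item, and findtext returns None for a missing child instead of causing an IndexError.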

Answered By - chad

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
