Issue
I am trying to scrape the player statistics for this match: "https://siege.gg/matches/5694-invitational-intl-faze-clan-vs-team-liquid", but my code does not seem to retrieve all of the HTML. Can someone help me, please?
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
url = "https://siege.gg/matches/5694-invitational-intl-faze-clan-vs-team-liquid"
match_page = requests.get(url, headers=headers)
match_soup = BeautifulSoup(match_page.content, features="lxml")
all_stats_soup = match_soup.find(id="DataTables_Table_0_wrapper")
This part of the HTML does not appear in match_soup, so match_soup.find() returns None.
Solution
The data is stored inside a JavaScript variable rather than in the static HTML. You can use the re module to extract it.
This example parses the table data into a pandas DataFrame:
import re
import requests
import pandas as pd
from io import StringIO
url = "https://siege.gg/matches/5694-invitational-intl-faze-clan-vs-team-liquid"
html_doc = requests.get(url).text
df = pd.read_html(StringIO(re.search(r"var a = `(.*)`", html_doc).group(1)))[0]
print(df)
Prints:
   Unnamed: 0  Rating    K-D (+/-)  Entry (+/-)  KOST   KPR  SRV  1vX  Plant  HS%       Atk     Def  Team
0   cameram4n    0.74  16-27 (-11)     1-4 (-3)   56%  0.44  25%    1      0  47%      Iana    Mute    50
1     muringa    0.83   15-20 (-5)     1-3 (-2)   58%  0.42  44%    0      1  67%  Thatcher   Smoke    19
2       Astro    1.03   24-23 (+1)     2-3 (-1)   56%  0.67  36%    2      3  50%       Ace    Kaid    50
3     NESKWGA    1.20  35-25 (+10)     5-5 (+0)   58%  0.97  31%    0      1  56%    Hibana   Jager    19
4     Bullet1    0.84   22-29 (-7)     5-7 (-2)   53%  0.61  19%    0      1  32%       Ash   Jager    50
5        psk1    0.83   16-23 (-7)     2-6 (-4)   61%  0.44  36%    0      1  31%     Nomad    Mute    19
6   xS3xyCake    1.13   27-23 (+4)     5-1 (+4)   78%  0.75  36%    0      3  50%  Maverick    Echo    19
7       Cyber    0.90   25-28 (-3)     4-4 (+0)   56%  0.69  22%    0      0  36%    Sledge   Smoke    50
8       Paluh    1.47  42-21 (+21)     6-2 (+4)   72%  1.17  42%    3      0  72%    Sledge  Melusi    19
9      soulz1    0.88   24-29 (-5)     5-1 (+4)   58%  0.67  19%    0      1  52%  Maverick    Echo    50
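One caveat with the one-liner above: re.search returns None when the pattern is not found, so calling .group(1) on it raises an AttributeError if the page layout changes. A minimal sketch of a guarded version follows. It assumes the table is still stored in a backtick-delimited JavaScript variable named a, as in the answer. The helper name extract_table_html and the sample page are hypothetical, and the non-greedy match with re.DOTALL is an added assumption, not something the original code needed.

```python
import re

# Pattern from the answer. The non-greedy quantifier and re.DOTALL are
# assumptions: they stop at the first closing backtick and allow the
# table markup to span several lines.
TABLE_PATTERN = re.compile(r"var a = `(.*?)`", re.DOTALL)


def extract_table_html(html_doc):
    """Return the HTML stored in the JavaScript variable, or None if absent."""
    match = TABLE_PATTERN.search(html_doc)
    return match.group(1) if match else None


# Stand-in for the downloaded page (made-up markup, not the real siege.gg HTML).
sample_page = """
<script>
var a = `<table>
<tr><th>Player</th><th>Rating</th></tr>
<tr><td>Paluh</td><td>1.47</td></tr>
</table>`;
</script>
"""

table_html = extract_table_html(sample_page)
missing = extract_table_html("<html><body>no table here</body></html>")
```

When the result is not None, it can be passed to pd.read_html(StringIO(table_html)) exactly as in the answer; a None result signals that the page no longer matches the pattern.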
Or with bs4:
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    re.search(r"var a = `(.*)`", html_doc).group(1), "html.parser"
)
for tr in soup.select("tr"):
    print(*tr.get_text(strip=True, separator="|").split("|"), sep="\t")
Prints:
Rating K-D (+/-) Entry (+/-) KOST KPR SRV 1vX Plant HS% Atk Def Team
cameram4n 0.74 16-27 (-11) 1-4 (-3) 56% 0.44 25% 1 0 47% Iana Mute 50
muringa 0.83 15-20 (-5) 1-3 (-2) 58% 0.42 44% 0 1 67% Thatcher Smoke 19
Astro 1.03 24-23 (+1) 2-3 (-1) 56% 0.67 36% 2 3 50% Ace Kaid 50
NESKWGA 1.20 35-25 (+10) 5-5 (+0) 58% 0.97 31% 0 1 56% Hibana Jager 19
Bullet1 0.84 22-29 (-7) 5-7 (-2) 53% 0.61 19% 0 1 32% Ash Jager 50
psk1 0.83 16-23 (-7) 2-6 (-4) 61% 0.44 36% 0 1 31% Nomad Mute 19
xS3xyCake 1.13 27-23 (+4) 5-1 (+4) 78% 0.75 36% 0 3 50% Maverick Echo 19
Cyber 0.90 25-28 (-3) 4-4 (+0) 56% 0.69 22% 0 0 36% Sledge Smoke 50
Paluh 1.47 42-21 (+21) 6-2 (+4) 72% 1.17 42% 3 0 72% Sledge Melusi 19
soulz1 0.88 24-29 (-5) 5-1 (+4) 58% 0.67 19% 0 1 52% Maverick Echo 50
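Note that the header row printed above has one column fewer than the data rows: the first header cell appears to be empty, and get_text with strip=True drops empty cells. If the rows need to be processed further rather than printed, reading each cell individually keeps the columns aligned. The following is a rough sketch under that assumption; the function name table_to_records, the "Player" label for the empty header, and the sample table are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up miniature table with an empty first header cell, mimicking the
# shape suggested by the printed output (not the real siege.gg markup).
sample_table = """
<table>
<tr><th></th><th>Rating</th><th>KOST</th></tr>
<tr><td>Paluh</td><td>1.47</td><td>72%</td></tr>
<tr><td>Astro</td><td>1.03</td><td>56%</td></tr>
</table>
"""


def table_to_records(table_html):
    """Turn each data row into a dict keyed by the header cells."""
    soup = BeautifulSoup(table_html, "html.parser")
    rows = soup.select("tr")
    if not rows:
        return []
    # An empty header cell gets the placeholder name "Player" (an assumption).
    header = [cell.get_text(strip=True) or "Player"
              for cell in rows[0].select("th, td")]
    records = []
    for tr in rows[1:]:
        cells = [td.get_text(strip=True) for td in tr.select("td")]
        records.append(dict(zip(header, cells)))
    return records


records = table_to_records(sample_table)
```

Each entry in records is then a plain dictionary such as one with the keys Player, Rating and KOST, which is easier to filter or export than tab-separated text.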
Answered By - Andrej Kesely