Issue
I am new to Beautiful Soup and am trying to pull some tables into a Python notebook. There are multiple tables on the page, but if I can get one working I think I can figure out the others. I have followed a few tutorials and think I must be missing something, maybe because of the collapsible class?
Here is my most recent attempt:
from bs4 import BeautifulSoup
import requests as r
import pandas as pd
uswnt_wiki_request = r.get("https://en.wikipedia.org/wiki/United_States_women%27s_national_soccer_team_results")
uswnt_wiki_text = uswnt_wiki_request.text
uswnt_soup = BeautifulSoup(uswnt_wiki_text, 'html.parser')
table = uswnt_soup.find('table', class_="wikitable collapsible collapsed")
df = pd.read_html(str(table))
df = pd.concat(df)
print(df)
Can somebody help nudge me in the right direction?
Solution
I have made several updates to your code.
- You need to pass headers to get a proper response from Wikipedia.
- You are trying to read all the tables, but you are using find instead of find_all.
- You need to clean up or add the column headers. The first three tables have a header row and the last two don't, and all of them have unnecessary double headers (year range and color coding).
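To illustrate the second point, here is a minimal sketch on a made-up HTML snippet (the markup below is hypothetical, not taken from the Wikipedia page). It shows that find returns only the first matching table, while find_all returns every match:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: two tables share the same class string.
html = """
<table class="wikitable collapsible collapsed"><tr><td>first</td></tr></table>
<table class="wikitable collapsible collapsed"><tr><td>second</td></tr></table>
<table class="wikitable"><tr><td>other</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
cls = "wikitable collapsible collapsed"

# find returns a single Tag (or None if nothing matches)
first_match = soup.find("table", class_=cls)
# find_all returns a list-like ResultSet with every matching Tag
all_matches = soup.find_all("table", class_=cls)

first_text = first_match.get_text(strip=True)
all_texts = [t.get_text(strip=True) for t in all_matches]
print(first_text)
print(all_texts)
```

Passing the full string to class_ matches the class attribute exactly as written, which is why the third table is left out.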
This is the final code.
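from bs4 import BeautifulSoup as bs
import pandas as pd
import requests

headers = {
    'host': 'en.wikipedia.org'
}
# Pass the headers with the request and keep the page HTML as text
response = requests.get("https://en.wikipedia.org/wiki/United_States_women%27s_national_soccer_team_results", headers=headers).text
# find_all returns every matching table, not just the first one
tables = bs(response, 'html.parser').find_all('table', class_="wikitable collapsible collapsed")
dfs = []
for table in tables:
    # Skip the two extra header rows (year range and color coding)
    df = pd.read_html(str(table), skiprows=2)[0]
    # In tables without a header row, the first data row ends up as the column labels
    if 'Opponent' not in df.columns:
        df.loc[-1] = df.columns  # put that row back as data
        df.columns = ['M', 'Opponent', 'Date', 'Result', 'Event']
    dfs.append(df)
combined = pd.concat(dfs)
print(combined)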
Sample Output:
      M             Opponent             Date  Result       Event
0     1                Italy  August 18, 1985     0–1  Mundialito
1     2              Denmark  August 21, 1985     2–2  Mundialito
2     3              England  August 23, 1985     1–3  Mundialito
3     4              Denmark  August 24, 1985     0–1  Mundialito
4     5               Canada     July 7, 1986     2–0    Friendly
..  ...                  ...              ...     ...         ...
57  721  Republic of Ireland   April 11, 2023     1–0    Friendly
58  722                Wales     July 9, 2023       –    Friendly
59  723              Vietnam    July 21, 2023       –   World Cup
60  724          Netherlands    July 26, 2023       –   World Cup
61  725             Portugal   August 1, 2023       –   World Cup

[725 rows x 5 columns]
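The header clean-up inside the loop may be easier to follow in isolation. The sketch below uses a small hand-built frame (hypothetical data, standing in for what read_html might return for a table with no header row), where the first match has been swallowed as the column labels. The final sort_index/reset_index step is an optional addition that is not part of the code above; it only moves the restored row back to the top.

```python
import pandas as pd

expected_columns = ["M", "Opponent", "Date", "Result", "Event"]

# Hypothetical frame for a header-less table: the first match row was
# consumed as column labels, so only the second row remains as data.
df = pd.DataFrame(
    [["2", "Denmark", "August 21, 1985", "2–2", "Mundialito"]],
    columns=["1", "Italy", "August 18, 1985", "0–1", "Mundialito"],
)

if "Opponent" not in df.columns:
    # Put the swallowed row back as data under the index label -1
    df.loc[-1] = list(df.columns)
    # Then assign the real column names
    df.columns = expected_columns
    # Optional (assumption, not in the answer's code): restore row order
    df = df.sort_index().reset_index(drop=True)

print(df)
```

Without the optional last step, the restored row stays at the bottom of the frame under the label -1, which is harmless for the combined output but leaves that one match out of order.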
Answered By - Zero