Thursday, August 25, 2022

[FIXED] What is the easiest way to append data to Pandas DataFrame?

Labels: beautifulsoup, dataframe, pandas, python, web-scraping

Issue

I am trying to append scraped data to a dataframe:

import pandas as pd
from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/List_of_German_football_champions"
page = requests.get(url).content
soup = BeautifulSoup(page, "html.parser")

# One list per column, filled row by row
seasons = []
first_places = []
runner_ups = []
third_places = []
top_scorers = []

tbody = soup.find_all("tbody")[7]
trs = tbody.find_all("tr")
for tr in trs:
    season = tr.find_all("a")[0].text
    first_place = tr.find_all("a")[1].text
    runner_up = tr.find_all("a")[2].text
    third_place = tr.find_all("a")[3].text
    top_scorer = tr.find_all("a")[4].text
    seasons.append(season)
    first_places.append(first_place)
    runner_ups.append(runner_up)
    third_places.append(third_place)
    top_scorers.append(top_scorer)

# Zip the column lists into row tuples and build the DataFrame
tuples = list(zip(seasons, first_places, runner_ups, third_places, top_scorers))
df = pd.DataFrame(tuples, columns=["Season", "FirstPlace", "RunnerUp", "ThirdPlace", "TopScorer"])
df

Is there an easier way to append data directly to an empty dataframe without creating lists and then zipping them?

Solution

Since you are already using pandas, the simplest way to create your DataFrame is pandas.read_html(), which parses the tables on the page into a list of DataFrames:

import pandas as pd

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_German_football_champions')[7]
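A hard-coded position such as [7] stops pointing at the right table as soon as tables are added to or removed from the page. As a hedged sketch, read_html also takes a match argument that keeps only the tables whose text matches a string or regex. The inline HTML below is made-up sample markup standing in for the Wikipedia page, and it assumes an HTML parser that read_html can use is installed (lxml, or BeautifulSoup with html5lib):

```python
from io import StringIO

import pandas as pd

# Made-up sample markup standing in for the real page (assumption).
html = """
<table>
  <tr><th>Club</th><th>Note</th></tr>
  <tr><td>Example FC</td><td>unrelated table</td></tr>
</table>
<table>
  <tr><th>Season</th><th>Champions</th><th>Top scorer(s)</th></tr>
  <tr><td>1963–64</td><td>1. FC Köln</td><td>Uwe Seeler</td></tr>
  <tr><td>1964–65</td><td>Werder Bremen</td><td>Rudi Brunnenmeier</td></tr>
</table>
"""

# read_html returns a list of DataFrames; match keeps only the tables
# whose text contains "Top scorer".
tables = pd.read_html(StringIO(html), match="Top scorer")
df_seasons = tables[0]
print(df_seasons)
```

Selecting by content rather than by position is a design choice, not a requirement; the positional index works just as well while the page layout stays the same.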

To rename the columns and get rid of the "[7]" footnote marker in the scraped header:

df.columns = ['Season', 'Champions', 'Runners-up', 'Third place',
              'Top scorer(s)', 'Goals']

Output:

	Season	Champions	Runners-up	Third place	Top scorer(s)	Goals
0	1963–64	1. FC Köln (2)	Meidericher SV	Eintracht Frankfurt	Uwe Seeler	30
1	1964–65	Werder Bremen (1)	1. FC Köln	Borussia Dortmund	Rudi Brunnenmeier	24
2	1965–66	TSV 1860 Munich (1)	Borussia Dortmund	Bayern Munich	Friedhelm Konietzka	26
3	1966–67	Eintracht Braunschweig (1)	TSV 1860 Munich	Borussia Dortmund	Lothar Emmerich, Gerd Müller	28
4	1967–68	1. FC Nürnberg (9)	Werder Bremen	Borussia Mönchengladbach	Hannes Löhr	27

...
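Instead of retyping every column name, footnote markers can also be stripped with a regular expression. This is a minimal sketch, assuming the scraped headers look like the hypothetical ones below; the exact header text on the live page is not shown in the answer:

```python
import pandas as pd

# Hypothetical headers as they might come back from read_html (assumption).
df = pd.DataFrame(
    [["1963–64", "1. FC Köln (2)", "Uwe Seeler", 30]],
    columns=["Season", "Champions", "Top scorer(s)[7]", "Goals[7]"],
)

# Remove bracketed numeric footnote markers such as "[7]" from each header.
df.columns = df.columns.str.replace(r"\[\d+\]", "", regex=True).str.strip()
print(list(df.columns))
```

This keeps working if a column is added later, whereas assigning a fixed list of names fails when the number of columns changes.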

An alternative that avoids all these separate lists, keeps the process cleaner, and still uses BeautifulSoup directly is to create more structured data: a single list of dicts.

data = []

# Only rows without <th> cells, so header rows are skipped
for tr in soup.select('table:nth-of-type(8) tr:not(:has(th))'):
    links = tr.find_all("a")  # look up the row's links once
    data.append({
        'season': links[0].text,
        'first_place': links[1].text,
        'runner_up': links[2].text,
        'third_place': links[3].text,
        'top_scorer': links[4].text,
    })

pd.DataFrame(data)

Example

import pandas as pd
from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/List_of_German_football_champions"
page = requests.get(url).content
soup = BeautifulSoup(page, "html.parser")

data = []

# Only rows without <th> cells, so header rows are skipped
for tr in soup.select('table:nth-of-type(8) tr:not(:has(th))'):
    links = tr.find_all("a")  # look up the row's links once
    data.append({
        'season': links[0].text,
        'first_place': links[1].text,
        'runner_up': links[2].text,
        'third_place': links[3].text,
        'top_scorer': links[4].text,
    })

pd.DataFrame(data)
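Indexing tr.find_all("a")[4] raises an IndexError on any row that has fewer than five links. The following is a hedged sketch of one way to guard against that, run against made-up inline markup rather than the live page. The helper name rows_to_records and the KEYS tuple are illustrative and not part of the original answer:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up markup standing in for the Wikipedia table (assumption).
html = """
<table>
  <tr><th>Season</th><th>Champions</th><th>Runners-up</th></tr>
  <tr><td><a>1963–64</a></td><td><a>1. FC Köln</a></td><td><a>Meidericher SV</a></td></tr>
  <tr><td><a>1964–65</a></td><td><a>Werder Bremen</a></td><td>unlinked</td></tr>
</table>
"""

KEYS = ("season", "first_place", "runner_up")


def rows_to_records(soup, keys=KEYS):
    """Return one dict per data row, skipping rows with too few links."""
    records = []
    for tr in soup.select("tr:not(:has(th))"):
        links = tr.find_all("a")
        if len(links) < len(keys):
            continue  # incomplete row: skip it instead of raising IndexError
        records.append(
            {key: a.get_text(strip=True) for key, a in zip(keys, links)}
        )
    return records


soup = BeautifulSoup(html, "html.parser")
df = pd.DataFrame(rows_to_records(soup))
print(df)
```

Skipping incomplete rows silently is only one option; logging them or filling the missing fields with None may suit a real scrape better.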

Answered By - HedgeHog

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
