Sunday, October 16, 2022

[FIXED] Scraping one table from a Wikipedia page with multiple tables

Labels: beautifulsoup, html, python, web-scraping

Issue

I am trying to scrape a table from this Wikipedia page. The page has 5 different tables, and my target is the first one shown. The table has little to identify it; its only attribute is

class="wikitable sortable jquery-tablesorter"

which the other tables share. Some sources say I should select a table by its id, but this table has no id.

This is how I currently scrape it:

My_table = soup.find('table',{'class':'wikitable sortable'})

Question

How can I select only that table when it has no id?

Solution

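You can select the table by its position among all <table> elements on the page. Indexing is zero-based, so soup.find_all('table')[1] is the second <table> in the page source, which is where the target table sits on this page.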

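from bs4 import BeautifulSoup
import requests

url = "https://id.wikipedia.org/wiki/Demografi_Indonesia#Jumlah_penduduk_menurut_provinsi"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# find_all returns every <table> in document order; index 1 is the second one
table = soup.find_all('table')[1]
rows = table.find_all('tr')
row_list = []

for tr in rows:
    # Only <td> cells are collected, so a header row made of <th> cells yields []
    td = tr.find_all('td')
    row = [i.text for i in td]
    row_list.append(row)

# Skip the first row, which is the header row
print(row_list[1:])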
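
Selecting by position breaks if the page gains or loses a table above the target. As a rough sketch of two alternatives, the example below first narrows the search to tables carrying the wikitable class and then, separately, looks a table up by one of its column headers. The HTML string, the header name "Province", the cell values, and the helper names table_rows and find_table_by_header are all made up for illustration and are not taken from the Wikipedia page. Note that jquery-tablesorter is normally added to the class attribute by JavaScript in the browser, so it is unlikely to appear in the HTML that requests downloads.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a page with several similar tables (placeholder data).
HTML = """
<table class="infobox"><tr><td>sidebar</td></tr></table>
<table class="wikitable sortable">
  <tr><th>Province</th><th>Population</th></tr>
  <tr><td>Aceh</td><td>100</td></tr>
  <tr><td>Bali</td><td>200</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Year</th><th>Total</th></tr>
  <tr><td>2020</td><td>300</td></tr>
</table>
"""


def table_rows(table):
    """Return the text of each data row, skipping rows without <td> cells."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


def find_table_by_header(soup, header_text):
    """Return the first wikitable with a <th> equal to header_text, or None."""
    for table in soup.select("table.wikitable"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None


soup = BeautifulSoup(HTML, "html.parser")

# Option 1: position among tables that carry the wikitable class,
# which ignores unrelated tables such as the infobox above.
first_wikitable = soup.select("table.wikitable")[0]

# Option 2: look the table up by one of its column headers.
by_header = find_table_by_header(soup, "Province")

print(table_rows(first_wikitable))
print(table_rows(by_header))
```

The header lookup does not depend on table order, but it does depend on the header text staying the same, so which option is more stable depends on how the page tends to change.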

Answered By - jwjhdev

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
