Issue
I need to scrape a table from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and store the data in a pandas DataFrame. I have pulled the table but am unable to pick out the columns (Postcode, Borough, Neighbourhood).
My table looks like this:
<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
...
My code:
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
table = soup.find('table', {'class': 'wikitable sortable'})
df = []
for row in table.find_all('tr'):
    columns = row.find_all('td')
    Postcode = row.columns[1].get_text()
    Borough = row.columns[2].get_text()
    Neighbourhood = row.column[3].get_text()
    df.append([Postcode, Borough, Neighbourhood])
With the above code I get TypeError: 'NoneType' object is not subscriptable.
I googled it and learned that I cannot do Postcode = row.columns[1].get_text() because of an inline property of the function.
I also tried something else but got an IndexError.
What I need is simple: traverse the rows, pick the three columns from each row, and store them in a list. But I am not able to write this in code.
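Both errors can be reproduced in a few lines. The TypeError is not caused by an inline property: on a BeautifulSoup Tag, attribute access such as row.columns is shorthand for row.find("columns"), which looks for a <columns> tag, finds none, and returns None, so None[1] fails. An IndexError is what you would expect once the columns list is indexed directly: the header row holds <th> cells rather than <td>, so row.find_all('td') returns an empty list there, and index 3 is past the end of a three-cell list. A minimal sketch, assuming BeautifulSoup 4 with the built-in html.parser and a two-row table trimmed from the question's HTML:

```python
from bs4 import BeautifulSoup

# Two-row sample trimmed from the question's table (assumed representative).
html = (
    "<table><tr><th>Postcode</th></tr>"
    "<tr><td>M1A</td><td>Not assigned</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
header_row, data_row = soup.find_all("tr")

# Attribute access on a Tag means find(): data_row.columns is
# data_row.find("columns"), and there is no <columns> tag, so it is None.
print(data_row.columns)  # None, so data_row.columns[1] raises TypeError

# The cells live in the list returned by find_all("td"), indexed from 0.
cells = data_row.find_all("td")
print(cells[0].get_text())  # M1A

# The header row has no <td> cells, so indexing its empty list raises IndexError.
print(header_row.find_all("td"))  # []
```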
The expected output is:
Postcode  Borough       Neighbourhood
M1A       Not assigned  Not assigned
M2A       Not assigned  Not assigned
M3A       North York    Parkwoods
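The row-by-row approach described in the question can be sketched with BeautifulSoup by indexing the columns list (not row.columns) from 0 and skipping rows that have no <td> cells. A minimal sketch, assuming BeautifulSoup 4 and pandas are installed; it parses the sample HTML from the question so it runs offline, and for the live page you would build soup from requests.get(url).text as in the question's code. The names html and rows are illustrative.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Sample HTML from the question (closing tags added); offline stand-in for the page.
html = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

rows = []
for row in table.find_all("tr"):
    columns = row.find_all("td")  # list of <td> Tags in this row
    if len(columns) < 3:
        # Header row: only <th> cells, so the list is empty; skip it.
        continue
    # Indices are 0-based; strip=True drops the trailing newline in the last cell.
    postcode = columns[0].get_text(strip=True)
    borough = columns[1].get_text(strip=True)
    neighbourhood = columns[2].get_text(strip=True)
    rows.append([postcode, borough, neighbourhood])

df = pd.DataFrame(rows, columns=["Postcode", "Borough", "Neighbourhood"])
print(df)
```

Skipping short rows rather than wrapping the indexing in try/except keeps the header check explicit; get_text(strip=True) also returns only the link text for the cells that wrap their value in an <a> tag.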
Solution
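If you want to scrape a table from the web, you can use the pandas library. pandas.read_html parses every <table> element on a page and returns a list of DataFrames, so you select the one you need by index.
import pandas as pd
url = 'valid_url'  # the URL of the page containing the table
tables = pd.read_html(url)  # list of DataFrames, one per <table> on the page
print(tables[0].head())  # first five rows of the first table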
Answered By - Siddhi Kiran Bajracharya
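When a page contains several tables, read_html's match parameter keeps only the tables whose text matches a given string or regular expression, which avoids guessing the list index. A minimal sketch, assuming lxml (or BeautifulSoup plus html5lib) is installed as a read_html parser backend; it reads the question's sample HTML through StringIO so it runs offline, and on the live page you would pass the URL instead.

```python
from io import StringIO

import pandas as pd

# Sample table from the question; read_html also accepts a URL directly.
html = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M2A</td><td>Not assigned</td><td>Not assigned
</td></tr>
<tr><td>M3A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Parkwoods">Parkwoods</a>
</td></tr>
<tr><td>M4A</td><td><a href="/wiki/North_York">North York</a></td><td><a href="/wiki/Victoria_Village">Victoria Village</a>
</td></tr>
</tbody></table>
"""

# match keeps only tables whose text contains "Postcode"; the result is
# still a list, even when a single table matches.
tables = pd.read_html(StringIO(html), match="Postcode")
df = tables[0]
print(df)
```

Because the table has no <thead>, read_html promotes the leading all-<th> row to the column header, which is why the columns come out as Postcode, Borough and Neighbourhood.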