Issue
I am trying to scrape a webpage to download its data. The code is as follows:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
table=soup.select_one('table:has(caption:-soup-contains("Residential property price indexes"))')
df=pd.read_html(str(table))[0]
header=[th.text for th in table.thead.select("th")]
print(*header,sep='\t')
for row in table.tbody.select("tr"):
    tds=[td.text for td in row.select("td")]
    print(*tds,'\t')
This prints the following result:
RPPI (a) HPI ADPI Sep Qtr 21 to Dec Qtr 21 Sep Qtr 21 to Dec Qtr 21 Sep Qtr 21 to Dec Qtr 21 % change % change % change
4.7 5.3 3.2
Sydney 4.1 4.5 3.3
Melbourne 3.9 4.2 3.1
Brisbane 9.6 10.8 3.9
Adelaide 6.8 7.7 3.1
Perth 2.9 2.9 2.3
Hobart 6.5 6.7 4.9
Darwin 1.5 1.3 2.1
Canberra 6.4 7.0 4.3
How can I split the header into three rows and prettify the data? I would like to fix the header, but I don't know how, because all the header cells use the same tag on the website. Thank you.
Solution
Try:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release"
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
table = soup.select_one(
    'table:has(caption:-soup-contains("Residential property price indexes"))'
)
df = pd.read_html(str(table))[0]
df = df.rename(
    columns={
        "Unnamed: 0_level_0": "",
        "Unnamed: 0_level_1": "",
        "Unnamed: 0_level_2": "",
    }
)
print(df)
Prints:
                                                             RPPI (a)                       HPI                      ADPI
                                             Sep Qtr 21 to Dec Qtr 21  Sep Qtr 21 to Dec Qtr 21  Sep Qtr 21 to Dec Qtr 21
                                                             % change                  % change                  % change
0  Weighted average of eight capital cities                       4.7                       5.3                       3.2
1                                    Sydney                       4.1                       4.5                       3.3
2                                 Melbourne                       3.9                       4.2                       3.1
3                                  Brisbane                       9.6                      10.8                       3.9
4                                  Adelaide                       6.8                       7.7                       3.1
5                                     Perth                       2.9                       2.9                       2.3
6                                    Hobart                       6.5                       6.7                       4.9
7                                    Darwin                       1.5                       1.3                       2.1
8                                  Canberra                       6.4                       7.0                       4.3
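The rename works because pd.read_html turns the three header rows into a three-level column MultiIndex and fills the empty top-left cells with placeholder labels starting with "Unnamed:". The sketch below shows the same idea without a network request. It assumes a small hand-built frame standing in for the scraped table, and the helper name blank_unnamed is made up for illustration. Passing a function instead of a dict avoids listing every placeholder label by hand.

```python
import pandas as pd

# Hand-built stand-in for the scraped table (assumption: it mirrors the
# three-level header that pd.read_html produces for the ABS page).
columns = pd.MultiIndex.from_tuples(
    [
        ("Unnamed: 0_level_0", "Unnamed: 0_level_1", "Unnamed: 0_level_2"),
        ("RPPI (a)", "Sep Qtr 21 to Dec Qtr 21", "% change"),
        ("HPI", "Sep Qtr 21 to Dec Qtr 21", "% change"),
        ("ADPI", "Sep Qtr 21 to Dec Qtr 21", "% change"),
    ]
)
rows = [
    ["Sydney", 4.1, 4.5, 3.3],
    ["Melbourne", 3.9, 4.2, 3.1],
    ["Brisbane", 9.6, 10.8, 3.9],
]
df = pd.DataFrame(rows, columns=columns)


def blank_unnamed(label):
    """Hypothetical helper: blank out pandas' placeholder header labels."""
    return "" if str(label).startswith("Unnamed:") else label


# With MultiIndex columns, rename applies the function to the labels of
# every level, so all three placeholder labels are replaced in one call.
cleaned = df.rename(columns=blank_unnamed)
print(cleaned)
```

If the city names are more useful as row labels, cleaned.set_index(cleaned.columns[0]) moves that first column into the index.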
Answered By - Andrej Kesely