Issue
After running:
soup.select('tr:nth-child(1)')
I got:
[<tr>
<th bgcolor="#5ac05a" colspan="2">Date</th>
<th bgcolor="#a3c35a">T<br/>(C)</th>
<th bgcolor="#c0a35a">Td<br/>(C)</th>
<th bgcolor="#a3c35a">Tmax<br/>(C)</th>
<th bgcolor="#a3c35a">Tmin<br/>(C)</th>
...
</tr>]
How can I get the list of strings (Date, T, Td, ...) without selecting each element manually, like soup.select('tr:nth-child(1) > th:nth-child(5)')[0].text? Doing it that way is very slow, and different pages have different numbers of <th> elements.
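The usual way around per-cell selectors is to select the row once and iterate over all of its <th> cells, so the number of columns never has to be known in advance. Below is a minimal sketch. It runs on a hypothetical inline copy of part of the header row shown above rather than on the live page. get_text(strip=True) joins the pieces on either side of <br/>, so "T<br/>(C)" becomes "T(C)".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: part of the header row printed above
# (colspan="2" on Date, <br/> inside the other cells).
html = """
<table>
<tr>
<th bgcolor="#5ac05a" colspan="2">Date</th>
<th bgcolor="#a3c35a">T<br/>(C)</th>
<th bgcolor="#c0a35a">Td<br/>(C)</th>
<th bgcolor="#a3c35a">Tmax<br/>(C)</th>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Select the row once, then walk all of its <th> cells, however many there are.
first_row = soup.select_one("tr:nth-child(1)")
names = [th.get_text(strip=True) for th in first_row.select("th")]
print(names)  # ['Date', 'T(C)', 'Td(C)', 'Tmax(C)']
```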
Solution
To load the table into a pandas DataFrame, you can use this example:
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.ogimet.com/cgi-bin/gsynres?ind=28698&lang=en&decoded=yes&ndays=31&ano=2021&mes=1&day=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# First row of <thead>: the column names (Date, T(C), Td(C), ...)
header = [
    th.get_text(strip=True) for th in soup.thead.select("tr")[0].select("th")
]

all_data = []
# Each following row is one observation
for row in soup.thead.select("tr")[1:]:
    # Text of every cell except the last three (WW, W1, W2)
    tds = [td.get_text(strip=True) for td in row.select("td")[:-3]]
    # Date and time are two cells under the colspan="2" Date header: merge them
    tds.insert(0, tds.pop(0) + " " + tds.pop(0))
    # The last three cells may hold an image; take the first single-quoted
    # string from its onmouseover attribute, or "-" if there is no image
    for td in row.select("td")[-3:]:
        img = td.select_one("img[onmouseover]")
        if img:
            tds.append(re.search(r"'([^']+)'", img["onmouseover"]).group(1))
        else:
            tds.append("-")
    all_data.append(tds)

df = pd.DataFrame(all_data, columns=header)
print(df)
df.to_csv("data.csv", index=False)
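The row loop can be checked without touching the network. The sketch below wraps the same logic in a hypothetical parse_row helper and runs it on a small hand-written table. The show('...') onmouseover value is an assumed stand-in for the real page's markup; only its single-quoted string matters to the regex. n_icon_cols replaces the hard-coded 3 (WW, W1, W2), so the number of trailing image columns becomes a parameter rather than a constant.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical table: one header row plus two data rows, kept inside <thead>
# like the page the solution scrapes. The onmouseover markup is assumed.
html = """
<table><thead>
<tr><th colspan="2">Date</th><th>T<br/>(C)</th><th>WW</th></tr>
<tr><td>01/01/2021</td><td>06:00</td><td>-30.6</td>
<td><img src="ww.gif" onmouseover="show('Diamond dust (with or without fog)')"></td></tr>
<tr><td>01/01/2021</td><td>03:00</td><td>-30.7</td><td></td></tr>
</thead></table>
"""


def parse_row(row, n_icon_cols):
    """Turn one <tr> into a list of strings (hypothetical helper)."""
    cells = row.select("td")
    split = len(cells) - n_icon_cols
    # Plain-text cells: same get_text(strip=True) call as for the header.
    values = [td.get_text(strip=True) for td in cells[:split]]
    # Date and time sit in two cells under one colspan="2" header: merge them.
    values.insert(0, values.pop(0) + " " + values.pop(0))
    # Image cells: first single-quoted string in onmouseover, else "-".
    for td in cells[split:]:
        img = td.select_one("img[onmouseover]")
        match = re.search(r"'([^']+)'", img["onmouseover"]) if img else None
        values.append(match.group(1) if match else "-")
    return values


soup = BeautifulSoup(html, "html.parser")
rows = soup.thead.select("tr")
header = [th.get_text(strip=True) for th in rows[0].select("th")]
data = [parse_row(tr, n_icon_cols=1) for tr in rows[1:]]
print(header)  # ['Date', 'T(C)', 'WW']
print(data)
```

Checking `match` before calling `.group(1)` also avoids an AttributeError if some onmouseover value contains no single-quoted text, which the one-line version in the solution would raise.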
Prints:
Date T(C) Td(C) Tmax(C) Tmin(C) ddd ffkmh Gustkmh P0hPa P seahPa PTnd Prec(mm) Nt Nh InsoD-1 Viskm Snow(cm) WW W1 W2
0 01/01/2021 06:00 -30.6 -33.7 ----- -31.1 NNW 7.2 ---- 1027.8 1045.5 +1.5 ---- 0 - --- 20.0 ---- Diamond dust (with or without fog) Snow, or rain and snow mixed Cloud covering more than 1/2 of the sky during...
1 01/01/2021 03:00 -30.7 -33.7 ----- -30.7 NNW 7.2 ---- 1026.2 1044.0 +1.0 Tr/12h 8 8 3.7 10.0 23 Diamond dust (with or without fog) Snow, or rain and snow mixed Cloud covering more than 1/2 of the sky throug...
2 01/01/2021 00:00 -30.1 -33.1 ----- ----- NNW 7.2 ---- 1025.3 1043.0 +0.6 ---- 8 0 --- 10.0 ---- Diamond dust (with or without fog) Snow, or rain and snow mixed Cloud covering more than 1/2 of the sky during...
3 12/31/2020 21:00 -30.5 -33.5 ----- ----- NNW 3.6 ---- 1024.7 1042.4 +0.6 ---- 0 - --- 10.0 ---- Diamond dust (with or without fog) Snow, or rain and snow mixed Cloud covering 1/2 or less of the sky througho...
...and so on
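Every cell in this DataFrame is still a string, and missing readings show up as runs of dashes. If you need numbers and real timestamps, one option is pd.to_numeric with errors="coerce" together with pd.to_datetime. Below is a minimal sketch on a few hypothetical values shaped like the output above. Two points are assumptions: that the dashes mean "no reading", and that the dates are month/day/year (values like 12/31/2020 point that way).

```python
import pandas as pd

# Hypothetical sample shaped like the printed output: every cell is a string
# and missing readings appear as runs of dashes.
df = pd.DataFrame(
    {
        "Date": ["01/01/2021 06:00", "01/01/2021 00:00"],
        "Tmax(C)": ["-----", "-----"],
        "Tmin(C)": ["-31.1", "-----"],
        "P0hPa": ["1027.8", "1025.3"],
    }
)

# Parse the timestamp; month/day order is assumed from values like 12/31/2020.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")

# errors="coerce" turns anything that is not a number (e.g. "-----") into NaN.
numeric_cols = ["Tmax(C)", "Tmin(C)", "P0hPa"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```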
The script also saves the table to data.csv (the original answer included a screenshot of this file opened in LibreOffice).
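The index=False argument in the solution stops pandas from writing its 0, 1, 2, ... row labels as an extra first column. Below is a small in-memory sketch that compares both settings. It writes to io.StringIO instead of a real file.

```python
import io

import pandas as pd

df = pd.DataFrame({"Date": ["01/01/2021 06:00"], "T(C)": ["-30.6"]})

# index=False: write only the scraped columns, not the row labels.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

# Default (index=True) for comparison: the labels become an unnamed column.
with_index = io.StringIO()
df.to_csv(with_index)

print(without_index.getvalue())
print(pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist())
```

Reading the second file back shows an extra "Unnamed: 0" column, which is what index=False avoids.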
Answered By - Andrej Kesely