Issue
I have been trying to pull the data table from this website, but I cannot seem to get it: https://www.wunderground.com/history/daily/us/nv/north-las-vegas/KVGT/date/2021-8-26
I first tried calling pd.read_html(url), where the url variable holds the link above. This raises a "No tables found" error.
I then tried to access the website using urllib3 and parsing with bs4, like so:
import urllib3
from bs4 import BeautifulSoup
url = 'https://www.wunderground.com/history/daily/us/nv/north-las-vegas/KVGT/date/2021-8-26'
http = urllib3.PoolManager()
r = http.request('GET', url)
soup = BeautifulSoup(r.data, 'html.parser')
list_of_tables = soup.find_all('table')
but list_of_tables is an empty list. Can anyone help me retrieve the table with all the hourly weather data? I am not sure where to go from here.
Solution
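The data on that page is loaded dynamically by JavaScript from an API, so it is not in the initial HTML that pd.read_html or urllib3 receives. You can find the API call in the Network tab of your browser's developer tools. One way of getting the table from that page is to call the API directly: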
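import requests
import pandas as pd
# The page's hourly table is filled in from this API call (visible in the Network tab)
r = requests.get('https://api.weather.com/v1/location/KVGT:9:US/observations/historical.json?apiKey=e1f10a1e78da46f5b10a1e78da96f525&units=e&startDate=20210826&endDate=20210826')
r.raise_for_status()  # fail loudly on HTTP errors, e.g. if the key is rejected
df = pd.DataFrame(r.json()['observations'])
df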
This returns a DataFrame with the historical observations (first rows shown):
key class expire_time_gmt obs_id obs_name valid_time_gmt day_ind temp wx_icon icon_extd wx_phrase pressure_tend pressure_desc dewPt heat_index rh pressure vis wc wdir wdir_cardinal gust wspd max_temp min_temp precip_total precip_hrly snow_hrly uv_desc feels_like uv_index qualifier qualifier_svrty blunt_phrase terse_phrase clds water_temp primary_wave_period primary_wave_height primary_swell_period primary_swell_height primary_swell_direction secondary_swell_period secondary_swell_height secondary_swell_direction
0 KVGT observation 1629971580 KVGT Las Vegas 1629964380 N 93 33 3300 Fair NaN None 40 89 16 27.56 10 93 190.0 S NaN 12 107.0 76.0 None 0 None Low 89 0 None None None None CLR None None None None None None None None None
1 KVGT observation 1629975180 KVGT Las Vegas 1629967980 N 93 33 3300 Fair 2.0 Falling Rapidly 41 89 16 27.55 10 93 260.0 W NaN 8 NaN NaN None 0 None Low 89 0 None None None None CLR None None None None None None None None None
2 KVGT observation 1629978780 KVGT Las Vegas 1629971580 N 90 33 3300 Fair NaN None 43 86 19 27.55 10 90 210.0 SSW NaN 6 NaN NaN None 0 None Low 86 0 None None None None CLR None None None None None None None None None
3 KVGT observation 1629982380 KVGT Las Vegas 1629975180 N 86 33 3300 Fair NaN None 41 83 20 27.56 10 86 310.0 NW NaN 3 NaN NaN None 0 None Low 83 0 None None None None CLR None None None None None None None None None
[....]
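The time columns look like Unix epoch seconds in UTC (for example, 1629964380 falls on 2021-08-26 UTC), so they can be turned into readable local times with pd.to_datetime. A minimal sketch, assuming that interpretation of valid_time_gmt and using a few rows copied from the output above in place of the live API response; the names hourly and valid_time are illustrative only:

```python
import pandas as pd

# A few rows copied from the output above; in practice, use the df built from the API response.
df = pd.DataFrame({
    "valid_time_gmt": [1629964380, 1629967980, 1629971580, 1629975180],
    "temp": [93, 93, 90, 86],
    "wx_phrase": ["Fair", "Fair", "Fair", "Fair"],
})

# Assumption: valid_time_gmt is Unix epoch seconds in UTC.
df["valid_time"] = (
    pd.to_datetime(df["valid_time_gmt"], unit="s", utc=True)
    .dt.tz_convert("America/Los_Angeles")  # North Las Vegas is on Pacific time
)

# Index by observation time and keep only the columns of interest.
hourly = df.set_index("valid_time")[["temp", "wx_phrase"]]
print(hourly)
```

With the real response, the same two steps apply unchanged to the full df; pick whichever of the columns listed above you need.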
For daily observations data, the URL to request instead is https://api.weather.com/v1/location/KVGT:9:US/almanac/daily.json?apiKey=e1f10a1e78da46f5b10a1e78da96f525&units=e&start=0826
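To query another date or station without editing the URL by hand, the query string can be built from its parts. A minimal sketch using only the standard library; the helper names historical_url and almanac_url are hypothetical, and the parameters (apiKey, units, startDate/endDate as YYYYMMDD, start as MMDD) are only the ones visible in the two URLs above. The key is the site's own public key and may stop working at any time.

```python
from datetime import date
from urllib.parse import urlencode

# Key copied from the URLs above; the site may rotate it.
API_KEY = "e1f10a1e78da46f5b10a1e78da96f525"
BASE = "https://api.weather.com/v1/location/{station}"


def historical_url(station: str, day: date, units: str = "e") -> str:
    """Hypothetical helper: hourly observations for one day (dates as YYYYMMDD)."""
    stamp = day.strftime("%Y%m%d")
    query = urlencode({"apiKey": API_KEY, "units": units,
                       "startDate": stamp, "endDate": stamp})
    return f"{BASE.format(station=station)}/observations/historical.json?{query}"


def almanac_url(station: str, day: date, units: str = "e") -> str:
    """Hypothetical helper: daily almanac data (start as MMDD, no year)."""
    query = urlencode({"apiKey": API_KEY, "units": units,
                       "start": day.strftime("%m%d")})
    return f"{BASE.format(station=station)}/almanac/daily.json?{query}"


print(historical_url("KVGT:9:US", date(2021, 8, 26)))
print(almanac_url("KVGT:9:US", date(2021, 8, 26)))
```

The resulting string can be passed to requests.get as in the answer; alternatively, requests can build the query string itself from a dict passed as its params argument.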
You can install requests with pip install requests, and pandas with pip install pandas.
Answered By - platipus_on_fire