Issue
I'm triying to get the href attributes from a table in this web. I have this code to get all of the links but i want to filter so i only access to the href for 'Automaticas' not the 'Manuales'
# Fetch URL
url = 'http://meteo.navarra.es/estaciones/descargardatos.cfm'
request = urllib2.Request(url)
request.add_header('Accept-Encoding', 'utf-8')
# Response has UTF-8 charset header, and HTML body which is UTF-8 encoded
response = urllib2.urlopen(request)
# Parse with BeautifulSoup
soup = BeautifulSoup(response,'html.parser')
for a in soup.find_all('a',{'href': re.compile(r'descargardatos_estacion.*')}):
estacion = 'http://meteo.navarra.es/estaciones/' + a.attrs.get('href')
print(estacion)
# descarga_csvs(estacion)
The src above for 'Automaticas' and 'Manuales' are different but i don't know how to filter them.
Solution
By preference I would use css selectors for speed, and simply filter on img with src containing automatica
. Then move to the adjacent a
tag, with an adjacent sibling combinator (+), and extract the href
.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://meteo.navarra.es/estaciones/descargardatos.cfm')
soup = bs(r.content, 'lxml')
automaticas = ['http://meteo.navarra.es/estaciones/' + i['href'] for i in soup.select('img[src*=automatica] + a')]
Answered By - QHarr
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.