Issue
I'm a beginner with Python and was trying to automate a process that involves going to a site, http://www.wildcad.net/WildCADWeb.asp, clicking every single dispatch center (for example, http://www.wildcad.net/WCAZ-ADC.htm), and from there loading a KML file. I noticed that each page follows a similar format, so I thought I could use a select-all function. I wrote my code as:
from bs4 import BeautifulSoup
import requests
urls = ('http://www.wildcad.net/WCAZ-ADC.htm', 'http://www.wildcad.net/WCALAIC.htm',
        'http://www.wildcad.net/WCAR-AOC.htm','http://www.wildcad.net/WCAZ-ADC.htm'
        'http://www.wildcad.net/WCAZ-FDC.htm', 'http://www.wildcad.net/WCAZ-PDC.htm'
        'http://www.wildcad.net/WCAZ-PHC.htm', 'http://www.wildcad.net/WCAZ-SDC.htm'
        'http://www.wildcad.net/WCAZ-TDC.htm', 'http://www.wildcad.net/WCAZ-WDC.htm')
result = requests.get(urls)
doc = BeautifulSoup(result.text, 'html.parser')
print(doc.prettify())
for i in enumerate(soup.findAll('a')):
    _KML = urls + link.get('href')
    if _KML.endswith('.kml'):
        urls.append(_KML)
        open(_KML)
However, it doesn't seem to pull the files, and I keep getting an error message on line '65'. Any direction or example of how to remedy this would be very much appreciated!
Solution
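Working code. It loads the main WildCAD page in a Selenium-driven Chrome browser, parses the rendered HTML with BeautifulSoup, and prints every link that ends in .kml.
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

url = 'http://www.wildcad.net/WildCADWeb.asp'

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
time.sleep(2)
driver.get(url)
time.sleep(5)  # give the page time to finish loading

soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # the browser is no longer needed once the HTML is captured

# Skip the first link in the centered table, then check the rest
for link in soup.select('table[align="center"] tbody tr td a')[1:]:
    url = link.get('href')
    #print(url)
    # Some <a> tags may have no href, so guard against None
    if url and url.endswith('.kml'):
        kml_url = url
        print(kml_url)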
Output:
http://www.wildcad.net/WAearth.kml
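For the per-dispatch-center pages listed in the question, a few things in the original attempt stand out: requests.get() accepts one URL at a time rather than a tuple, a tuple has no append() method, the names soup and link are never defined, the missing commas in the tuple silently join neighbouring strings into one, and open() reads local files rather than URLs. The following is only a minimal sketch of one way to handle those pages, not part of the answer above. It assumes the KML links appear as plain <a href="..."> tags in the static HTML (which may not hold if a page builds its links with JavaScript), and it uses only the standard library so that it runs without extra installs. The names extract_kml_links and fetch_kml_links are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_kml_links(html, base_url):
    """Return absolute URLs of all links in `html` that end in .kml."""
    collector = _LinkCollector()
    collector.feed(html)
    return [
        urljoin(base_url, href)  # resolves relative links against the page URL
        for href in collector.hrefs
        if href.lower().endswith(".kml")
    ]


def fetch_kml_links(page_urls):
    """Request each page separately and gather its .kml links."""
    found = []
    for page_url in page_urls:
        with urlopen(page_url, timeout=30) as response:
            html = response.read().decode("utf-8", errors="replace")
        found.extend(extract_kml_links(html, page_url))
    return found
```

Calling fetch_kml_links(['http://www.wildcad.net/WCAZ-ADC.htm', 'http://www.wildcad.net/WCALAIC.htm']) would request each page in turn and return a list of absolute .kml URLs, which could then be downloaded one by one. Keeping the parsing step in its own function means it can be checked against a small HTML string without touching the network.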
Answered By - F.Hoque