Issue
I'm learning Python and am trying to automate a process: going to the site wildcad.net, clicking through every dispatch center, and loading a KML file from each one. I noticed that each page follows a similar format:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
<head>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
<meta content="WildCAD (Brian Booher)" name="GENERATOR"/>
<title>
WCAZ-ADC
</title>
</head>
<frameset rows="64,*">
<frame name="banner" noresize="" scrolling="no" src="WCAZ-ADCtop.htm"/>
<frameset cols="150,*">
<frame name="contents" src="WCAZ-ADCleft.htm"/>
<frame name="main" src="WCAZ-ADCright.htm"/>
</frameset>
<noframes>
<body>
<p>
<a href="http://www.wildcadmap.net/WildCAD_AZ-FDC.kml" target="map"><font size="1">Incident Map (Google Earth)</font></a>
</p>
</body>
</noframes>
</frameset>
I figured I could use BeautifulSoup to select every link. I listed each dispatch center's URL and then searched for the 'a' tags and their 'href' attributes, since these shouldn't change. The code I wrote is below. However, it doesn't seem to identify the KML link as its own variable. I'm not sure where I went wrong, and I'm a bit lost on how to troubleshoot further. Any pointers in the right direction would be a great help!
from bs4 import BeautifulSoup
import requests
urls = ('http://www.wildcad.net/WCAZ-ADC.htm', 'http://www.wildcad.net/WCALAIC.htm',
'http://www.wildcad.net/WCAR-AOC.htm','http://www.wildcad.net/WCAZ-ADC.htm'
'http://www.wildcad.net/WCAZ-FDC.htm', 'http://www.wildcad.net/WCAZ-PDC.htm'
'http://www.wildcad.net/WCAZ-PHC.htm', 'http://www.wildcad.net/WCAZ-SDC.htm'
'http://www.wildcad.net/WCAZ-TDC.htm', 'http://www.wildcad.net/WCAZ-WDC.htm'
'http://www.wildcad.net/WCBLMNOC.htm', 'http://www.wildcad.net/WCCA-ANF.htm'
'http://www.wildcad.net/WCCA-ANF.htm', 'http://www.wildcad.net/WCCA-CNF.htm'
'http://www.wildcad.net/WCCA-FICC.htm', 'http://www.wildcad.net/WCCA-GVCC.htm'
'http://www.wildcad.net/WCCA-MICC.htm', 'http://www.wildcad.net/WCCA-ONCC.htm'
'http://www.wildcad.net/WCCA-OVICC.htm', 'http://www.wildcad.net/WCCA-PNF.htm'
'http://www.wildcad.net/WCCA-SNF.htm' , 'http://www.wildcad.net/WCCA-STF.htm'
'http://www.wildcad.net/WCCA-YICC.htm', 'http://www.wildcad.net/WCCA-YNP.htm'
'http://www.wildcad.net/WCCALPF.htm' , 'http://www.wildcad.net/WCCAMNF.htm'
'http://www.wildcad.net/WCCANCIC.htm' , 'http://www.wildcad.net/WCCARICC.htm'
'http://www.wildcad.net/WCCASQCC.htm', 'http://www.wildcad.net/WCCCICC.htm'
'http://www.wildcad.net/WCCO-CRC.htm' , 'http://www.wildcad.net/WCCO-FTC.htm'
'http://www.wildcad.net/WCCO-GJC.htm' , 'http://www.wildcad.net/WCCO-MTC.htm'
'http://www.wildcad.net/WCCODRC.htm' , 'http://www.wildcad.net/WCCOPBC.htm'
'http://www.wildcad.net/WCFL-FIC.htm' , 'http://www.wildcad.net/WCGAGIC.htm'
'http://www.wildcad.net/WCID-CDC.htm' , 'http://www.wildcad.net/WCID-GVC.htm'
'http://www.wildcad.net/WCID-SCC.htm', 'http://www.wildcad.net/WCIDBDC.htm'
'http://www.wildcad.net/WCIDCIC.htm', 'http://www.wildcad.net/WCIDEIC.htm'
'http://www.wildcad.net/WCIDPAC.htm' , 'http://www.wildcad.net/WCILILC.htm'
'http://www.wildcad.net/WCIN-IIC.htm', 'http://www.wildcad.net/WCKY-KIC.htm'
'http://www.wildcad.net/WCLALIC.htm', 'http://www.wildcad.net/WCMI-MIDC.htm'
'http://www.wildcad.net/WCMN-MNCC.htm', 'http://www.wildcad.net/WCMOMOC.htm'
'http://www.wildcad.net/WCMSMIC.htm', 'http://www.wildcad.net/WCMT-BRC.htm'
'http://www.wildcad.net/WCMT-BZC.htm', 'http://www.wildcad.net/WCMT-DDC.htm'
'http://www.wildcad.net/WCMT-GDC.htm' 'http://www.wildcad.net/WCMT-HDC.htm'
'http://www.wildcad.net/WCMT-KDC.htm', 'http://www.wildcad.net/WCMT-KIC.htm'
'http://www.wildcad.net/WCMT-LEC.htm' , 'http://www.wildcad.net/WCMT-MCC.htm'
'http://www.wildcad.net/WCMT-MDC.htm', 'http://www.wildcad.net/WCNC-NCC.htm'
'http://www.wildcad.net/WCNDNDC.htm' , 'http://www.wildcad.net/WCNH-NEC.htm'
'http://www.wildcad.net/WCNM-ABC.htm' , 'http://www.wildcad.net/WCNM-ADC.htm'
'http://www.wildcad.net/WCNM-SDC.htm', 'http://www.wildcad.net/?WildWeb=NM-SFC'
'http://www.wildcad.net/WCNMTDC.htm', 'http://www.wildcad.net/WCNMTDC.htm'
'http://www.wildcad.net/WCNVCNC.htm' , 'http://www.wildcad.net/WCNVECC.htm'
'http://www.wildcad.net/WCNVEIC.htm' , 'http://www.wildcad.net/WCNVLIC.htm'
'http://www.wildcad.net/WCNVSFC.htm', 'http://www.wildcad.net/WCOR-BIC.htm'
'http://www.wildcad.net/WCOR-COC.htm', 'http://www.wildcad.net/WCOR-EIC.htm'
'http://www.wildcad.net/WCOR-JDCC.htm', 'http://www.wildcad.net/WCOR-RICC.htm'
'http://www.wildcad.net/WCOR-RVC.htm', 'http://www.wildcad.net/WCOR-VAC.htm'
'http://www.wildcad.net/WCORBMC.htm', 'http://www.wildcad.net/WCORLFC.htm'
'http://www.wildcad.net/WCPA-MACC.htm', 'http://www.wildcad.net/WCSC-SCC.htm'
'http://www.wildcad.net/WCSC-SRF.htm', 'http://www.wildcad.net/WCSD-GPC.htm'
'http://www.wildcad.net/WCTN-TNC.htm', 'http://www.wildcad.net/WCTXTIC.htm'
'http://www.wildcad.net/WCUT-CDC.htm' , 'http://www.wildcad.net/WCUT-MFC.htm'
'http://www.wildcad.net/WCUT-NUC.htm' , 'http://www.wildcad.net/WCUT-RFC.htm'
'http://www.wildcad.net/WCUT-UBC.htm' , 'http://www.wildcad.net/WCVAVIC.htm'
'http://www.wildcad.net/WCWA-CWC.htm', 'http://www.wildcad.net/WCWY-CDC.htm'
'http://www.wildcad.net/WCWY-CPC.htm',
result = requests.get(urls)
doc = BeautifulSoup(result.text, 'html.parser')
print(doc.prettify())
for i in enumerate(soup.findAll('a')):
    _KML = urls + link.get('href')
    if _KML.endswith('.kml'):
        urls.append(_KML)
        open(_KML)
Solution
If I understood the question correctly, here is a working example:
doc='''
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
<head>
<meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
<meta content="WildCAD (Brian Booher)" name="GENERATOR"/>
<title>
WCAZ-ADC
</title>
</head>
<frameset rows="64,*">
<frame name="banner" noresize="" scrolling="no" src="WCAZ-ADCtop.htm"/>
<frameset cols="150,*">
<frame name="contents" src="WCAZ-ADCleft.htm"/>
<frame name="main" src="WCAZ-ADCright.htm"/>
</frameset>
<noframes>
<body>
<p>
<a href="http://www.wildcadmap.net/WildCAD_AZ-FDC.kml" target="map"><font size="1">Incident Map (Google Earth)</font></a>
<a href="http://www.wildcadmap.net/WildCAD_AZ-FDC.htm" target="map"><font size="1">Incident Map (Google Earth)</font></a>
</p>
</body>
</noframes>
</frameset>
'''
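from bs4 import BeautifulSoup

soup = BeautifulSoup(doc, 'html.parser')
# print(soup.prettify())

# Look at every <a> tag and keep only the links that point to a .kml file.
for link in soup.find_all('a'):
    href = link.get('href')
    # Skip anchors without an href so .endswith() is never called on None.
    if href and href.endswith('.kml'):
        kml = href
        print(kml)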
Output:
http://www.wildcadmap.net/WildCAD_AZ-FDC.kml
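The example above parses a single page that is already in memory. One possible way to extend it to the whole list of dispatch center pages is sketched below. This is not part of the original answer. The function names `extract_kml_links`, `collect_kml_links`, and `fetch_with_requests`, the `fetch` parameter, and the 10-second timeout are all assumptions made for illustration. The key point is that `requests.get` takes one URL at a time, so the pages have to be fetched in a loop, and the results should go into a new list rather than being appended to the tuple of page URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_kml_links(html, base_url):
    """Return the absolute URLs of all <a href> values ending in .kml."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        # urljoin leaves absolute links alone and resolves relative ones.
        href = urljoin(base_url, anchor["href"])
        if href.lower().endswith(".kml"):
            links.append(href)
    return links


def collect_kml_links(page_urls, fetch):
    """Call fetch(url) for each page in turn and gather unique KML links.

    `fetch` is a hypothetical callable that returns the page's HTML as text.
    """
    found = []
    for url in page_urls:
        for link in extract_kml_links(fetch(url), url):
            if link not in found:
                found.append(link)
    return found


def fetch_with_requests(url):
    """Download one page; the timeout value here is an arbitrary assumption."""
    import requests  # imported here so the parsing helpers work without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

With a list of page addresses named `urls`, this could be called as `collect_kml_links(urls, fetch_with_requests)`. Every entry in that list needs a comma after it: Python joins adjacent string literals that have no comma between them into one long string, which would produce invalid URLs.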
Answered By - F.Hoque