Issue
I obtained the HTML source of a page and parsed it with BeautifulSoup using the 'html5lib' parser.
I got something like this:
<div class="V0h1Ob-haAclf OPZbO-KE6vqe o0s21d-HiaYvf" jsaction="mouseover:pane.wfvdle40;mouseout:pane.wfvdle40" jsan="7.V0h1Ob-haAclf,7.OPZbO-KE6vqe,7.o0s21d-HiaYvf,0.jsaction" jstcache="824">
<a aria-label="Muzeum Londynu" class="a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd" href="https://www.google.com/maps/place/Muzeum+Londynu/data=!4m5!3m4!1s0x48761b5508c1cbeb:0x407de2c1952a25e4!8m2!3d51.5176183!4d-0.0967782?authuser=0&hl=pl&rclk=1" jsaction="pane.wfvdle40;focus:pane.wfvdle40;blur:pane.wfvdle40;auxclick:pane.wfvdle40;contextmenu:pane.wfvdle40;keydown:pane.wfvdle40;clickmod:pane.wfvdle40" jsan="7.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd,0.aria-label,8.href,0.jsaction" jstcache="825"></a>
<div class="CJY91c-jRmmHf-aVTXAb-haAclf-WFkMr" jstcache="826"></div>
<div aria-label="Muzeum Londynu" class="MVVflb-haAclf V0h1Ob-haAclf-d6wfac MVVflb-haAclf-uxVfW-hSRGPd" jsan="7.MVVflb-haAclf,7.V0h1Ob-haAclf-d6wfac,7.MVVflb-haAclf-uxVfW-hSRGPd,0.aria-label" jstcache="827">
<div class="CJY91c-jRmmHf-aVTXAb-haAclf-bIWrp" jstcache="828"></div>
<div class="lI9IFe">
<div class="CJY91c-jRmmHf-aVTXAb-haAclf-HSrbLb" jstcache="829">
<div class="RnEfrd-jRmmHf-HSrbLb B9Hcub-QFlW2" jsan="t-pdDsP4P8DQQ,7.RnEfrd-jRmmHf-HSrbLb,7.B9Hcub-QFlW2" jstcache="933">
<button jstcache="842" style="display:none"></button>
<div class="Z8fK3b" jsan="7.Z8fK3b,t-MjeqqY5XOdM" jstcache="843">
<div class="CUwbzc-content gm2-body-2"> <div class="qBF1Pd-haAclf">
<div class="qBF1Pd gm2-subtitle-alt-1" jsan="7.qBF1Pd,7.gm2-subtitle-alt-1,t-u3p6PfXaXm4" jstcache="845">
<span jstcache="858">Muzeum Londynu</span>
</div>
<h1 jstcache="846" style="display:none"></h1>
<span class="RnEfrd-jRmmHf-HSrbLb-title-Btuy5e-haAclf"></span>
</div>
<div class="section-subtitle-extension" jstcache="847"></div>
<div class="ZY2y6b-RWgCYc" jsan="7.ZY2y6b-RWgCYc,t-hEqDOx2FFV0" jstcache="848">
<div class="OEvfgc-wcwwM-haAclf">
<span class="RnEfrd-jRmmHf-HSrbLb-wPzPJb-Btuy5e-haAclf" jstcache="860"></span>
<span class="gm2-body-2" jsan="t-CJ3Gw1VPbAA,7.gm2-body-2" jstcache="861">
<span jstcache="868" style="display:none"></span>
<span aria-label=" 4,6-gwiazdkowy Opinie (13 898) " class="ZkP5Je" jsan="7.ZkP5Je,0.aria-label,0.role,t-kqtGnPs-9G0" jstcache="869" role="group">
<span aria-hidden="true" class="MW4etd" jsan="7.MW4etd,0.aria-hidden" jstcache="872">4,6</span>
<div jstcache="873" style="display:none"></div>
<div class="QBUL8c" jsan="7.QBUL8c" jsinstance="0" jstcache="874"></div>
<div class="QBUL8c" jsan="7.QBUL8c" jsinstance="1" jstcache="874"></div>
<div class="QBUL8c" jsan="7.QBUL8c" jsinstance="2" jstcache="874"></div>
<div class="QBUL8c" jsan="7.QBUL8c" jsinstance="3" jstcache="874"></div>
<div class="QBUL8c cXOKEb-S62Q7b" jsan="7.QBUL8c,7.cXOKEb-S62Q7b" jsinstance="*4" jstcache="874"></div>
<span aria-hidden="true" class="UY7F9" jsan="7.UY7F9,0.aria-hidden" jstcache="875">(13 898)</span>
</span>
</span>
<span jstcache="862" style="display:none">
<jsl jstcache="863" style="display:none"></jsl>
</span>
</div>
</div>
<div class="ZY2y6b-RWgCYc">
<span jstcache="849" style="display:none"></span>
<div class="ZY2y6b-RWgCYc" jsinstance="0" jstcache="850">
<span jsinstance="0" jstcache="851">
<jsl jstcache="852"> <span jstcache="884" style="display:none">·</span>
<span jstcache="885">Muzeum</span> <span jstcache="886" style="display:none"></span> </jsl> </span><span jsinstance="*1" jstcache="851"> <jsl jstcache="852"> <span aria-hidden="true" class="bXlT7b-hgDUwe" jsan="7.bXlT7b-hgDUwe,0.aria-hidden" jstcache="884">·</span> <span jstcache="885">150 London Wall</span> <span jstcache="886" style="display:none"></span> </jsl> </span> </div><div class="ZY2y6b-RWgCYc" jsinstance="1" jstcache="850"> <span jsinstance="*0" jstcache="851"> <jsl jstcache="852"> <span jstcache="884" style="display:none">·</span> <span jstcache="885">Historia Londynu od starożytności do dziś</span> <span jstcache="886" style="display:none"></span> </jsl> </span> </div><div class="ZY2y6b-RWgCYc" jsinstance="*2" jstcache="850"> <span jsinstance="*0" jstcache="851"> <jsl jstcache="852"> <span jstcache="884" style="display:none">·</span> <span jstcache="885">Zamknięcie: 17:00</span> <span jstcache="886" style="display:none"></span> </jsl> </span> </div> </div> </div> </div></div></div><div class="CJY91c-jRmmHf-aVTXAb-haAclf-JIbuQc" jstcache="830"></div><div class="CJY91c-jRmmHf-aVTXAb-haAclf-HiaYvf" jstcache="831"><div class="xwpmRb qisNDe" jsan="t-PLs0ILPSy_c,7.xwpmRb,7.qisNDe,5.width,5.height,5.margin-top,5.margin-bottom,5.margin-left,5.margin-right" jstcache="932" style="width: 84px; height: 84px; margin: 0px;"><div class="Vig8jf-haAclf p0Hhde" jsan="7.p0Hhde,7.Vig8jf-haAclf,5.min-width,5.min-height" jstcache="836" style="min-width:84px;min-height:84px"><img aria-hidden="true" decoding="async" src="https://blogger.googleusercontent.com/img/proxy/AVvXsEhWyIpEBRaP50FPDzOfWUSf0nqs_DZQIZeQVN3W6BPEoPfXJDfF4zqSIJnI224CEdIiMgqNDhfhfOWKH_IWxDVDTdw9JVdBzxO-q03XepTVTMFGawhsJRW0o7qAidf83DKl6GCg2EUmuPKY-AuiIWCCdcPdFOof0P9IN2hkIkXial9yPthJsk0O_v8iCrHRzjTqh7k07cAnRl0EMTKlJ3hcQXdjaTY=w138-h92-k-no" style="position: absolute; top: 50%;left: 50%;width: 126px;height: 84px;-webkit-transform: translateY(-50%) translateX(-50%);transform: translateY(-50%) 
translateX(-50%);"/></div><button jstcache="837" style="display:none"></button><div class="badge-container"></div></div></div><div class="CJY91c-jRmmHf-aVTXAb-haAclf-hxbzzc" jstcache="832"></div></div><div class="CJY91c-jRmmHf-aVTXAb-haAclf-IoWfhc" jstcache="833"></div></div></div>
Finally, I ran .find_all('a', href=True), which returned something like this:
[<a aria-label="Muzeum Londynu" class="a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd" href="https://www.google.com/maps/place/Muzeum+Londynu/data=!4m5!3m4!1s0x48761b5508c1cbeb:0x407de2c1952a25e4!8m2!3d51.5176183!4d-0.0967782?authuser=0&hl=pl&rclk=1" jsaction="pane.wfvdle40;focus:pane.wfvdle40;blur:pane.wfvdle40;auxclick:pane.wfvdle40;contextmenu:pane.wfvdle40;keydown:pane.wfvdle40;clickmod:pane.wfvdle40" jsan="7.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd,0.aria-label,8.href,0.jsaction" jstcache="825"></a>]
I am trying to extract the latitude and longitude, i.e. [51.5176183, -0.0967782], which appear in the href.
I tried using .href the same way as .text, but .href returns None. How can I extract those two values from the href?
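For reference, in BeautifulSoup dotted access on a Tag is not an attribute lookup: `link.href` is shorthand for `link.find('href')`, which searches for a child `<href>` element and returns None here. HTML attributes are read with `link['href']` or `link.get('href')`. A minimal sketch using a trimmed copy of the `<a>` element above (the jsaction/jsan/jstcache attributes are omitted); 'html.parser' is used only so the sketch needs no extra install:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <a> element from the question (jsaction/jsan/jstcache omitted)
html = (
    '<a aria-label="Muzeum Londynu" class="a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd" '
    'href="https://www.google.com/maps/place/Muzeum+Londynu/data=!4m5!3m4!1s0x48761b5508c1cbeb:'
    '0x407de2c1952a25e4!8m2!3d51.5176183!4d-0.0967782?authuser=0&hl=pl&rclk=1"></a>'
)

soup = BeautifulSoup(html, 'html.parser')

for link in soup.find_all('a', href=True):
    # Dotted access means link.find('href'): it looks for a child <href> tag and finds none
    print(link.href)
    # Attribute access: both forms return the URL string stored in href
    print(link.get('href'))
    print(link['href'] == link.get('href'))
```

`.get('href')` returns None when the attribute is missing, while `['href']` raises KeyError; with `href=True` in find_all, both are safe.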
Running .text on the HTML returns output like this:
Museum of London 4,6(13 898) · Museum · 150 London Wall · The history of London from antiquity to today · Closing: 17:00
Solution
Based on your question, I used the split() method to get the desired output.
script
html='''
<html>
<head>
</head>
<body>
<a aria-label="Muzeum Londynu" class="a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd" href="https://www.google.com/maps/place/Muzeum+Londynu/data=!4m5!3m4!1s0x48761b5508c1cbeb:0x407de2c1952a25e4!8m2!3d51.5176183!4d-0.0967782?authuser=0&hl=pl&rclk=1" jsaction="pane.wfvdle40;focus:pane.wfvdle40;blur:pane.wfvdle40;auxclick:pane.wfvdle40;contextmenu:pane.wfvdle40;keydown:pane.wfvdle40;clickmod:pane.wfvdle40" jsan="7.a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd,0.aria-label,8.href,0.jsaction" jstcache="825">
</a>
</body>
</html>
'''
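from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html5lib')
#print(soup.prettify())
# Read the attribute with .get('href') (or ['href']); .href would search for a child <href> tag
href = soup.find("a", class_="a4gq8e-aVTXAb-haAclf-jRmmHf-hSRGPd").get('href')
# Take the "data=..." segment, drop the query string, keep the part after the last ':',
# split on '!', skip the first two tokens, strip the '3d'/'4d' prefixes, then split on ','
lat_lan = ','.join(href.split('/')[-1].split('?')[0].split(':')[-1].split('!')[2:]).replace('3d', '').replace('4d', '').split(',')
print(lat_lan)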
Output
['51.5176183', '-0.0967782']
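The split() chain depends on the exact token positions (the hard-coded [2:] slice and the '3d'/'4d' replacements). A possible alternative, assuming the link always carries the coordinates as a `!3d<latitude>!4d<longitude>` pair, as it does in this URL, is to match that pair with a regular expression and convert the values to floats. The function name `extract_lat_lng` is hypothetical:

```python
import re

# Assumption: coordinates appear as "!3d<latitude>!4d<longitude>" in the URL's data= segment
COORDS_RE = re.compile(r'!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)')


def extract_lat_lng(href):
    """Return (latitude, longitude) as floats, or None if the pattern is absent."""
    match = COORDS_RE.search(href)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


href = ("https://www.google.com/maps/place/Muzeum+Londynu/data=!4m5!3m4!1s0x48761b5508c1cbeb:"
        "0x407de2c1952a25e4!8m2!3d51.5176183!4d-0.0967782?authuser=0&hl=pl&rclk=1")
print(extract_lat_lng(href))  # (51.5176183, -0.0967782)
```

Unlike the split() version, this returns numbers rather than strings, and it returns None instead of relying on token positions when a URL has a different layout.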
Answered By - Fazlul