Issue
I have this code and need to extract the latitude and longitude from the script tag.
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
requireExploreMap(loadPoints, mapDetails);
</script>
I can see all HTML content in soup but when I try this way:
def get_textchunk(word1, word2, text):
    if not (word1 in text and word2 in text): return ''
    return text.split(word1)[-1].split(word2)[0]
lat = get_textchunk('latitude":', ',"longitude', soup.get_text(' '))
it doesn't return anything.
How can I fix it?
UPDATE
This is my code
with open('urls.txt', 'r', encoding="utf-8") as inf:
    with open('data2.csv', 'w', encoding="utf-8") as outf:
        outf.write('Titre,add,art,club,tel,\n')
        for row in inf:
            url = row.strip()
            response = requests.get(url)
            if response.ok:
                print("ok")
                soup = BeautifulSoup(response.text, 'html.parser')
                print(soup)
                stag = soup.find("script")
                obj = json.loads(re.search(r"mapDetails\s*= \s*({.*});", str(stag)).group(1))
                lat, lon = obj["point"]["latitude"], obj["point"]["longitude"]
            # Pause between requests
            time.sleep(2)
The problem is that BeautifulSoup finds the first script tag, and the information I need is not in that tag. Thanks a lot for your help.
The page I am trying to scrape: https://worldbeachlist.com/Explore/Australia/Victoria/Bells-Beach
Solution
Try this:
import json
import re
from bs4 import BeautifulSoup
sample_script = """
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay","location":null,"subject":"P","link":"/Explore/South-Africa/Western-Cape/Sandy-Bay"},"bounds":null,"moveMarkerCallback":null,"changeBoundsCallback":null};
requireExploreMap(loadPoints, mapDetails);
</script>
"""
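# .string returns the raw JavaScript source inside the <script> tag
script_text = BeautifulSoup(sample_script, 'html.parser').find('script').string
# Capture everything between "mapDetails = " and the first semicolon
data = json.loads(re.search(r"mapDetails = (.+?);", script_text).group(1))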
print(json.dumps(data, indent=4))
# Access the keys
print(data['point']['latitude'])
print(data['point']['longitude'])
Output:
{
    "point": {
        "latitude": -34.023418,
        "longitude": 18.331407,
        "title": "Sandy Bay",
        "location": null,
        "subject": "P",
        "link": "/Explore/South-Africa/Western-Cape/Sandy-Bay"
    },
    "bounds": null,
    "moveMarkerCallback": null,
    "changeBoundsCallback": null
}
-34.023418
18.331407
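On the real page the data is not in the first script tag, so find("script") alone will not reach it. One possible workaround, sketched below, is to skip the tag lookup and search the raw HTML for the mapDetails assignment. This is a swapped-in technique, not the method used above: it uses only the standard library, the helper name extract_map_details is hypothetical, and the page string is made-up sample markup rather than the actual site. It also assumes the assigned value is valid JSON, as it is in the question.

```python
import json
import re


def extract_map_details(html):
    """Return the object assigned to `var mapDetails` in html, or None."""
    match = re.search(r"var\s+mapDetails\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the semicolon, the rest of the script).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Made-up sample page: the data sits in the second script tag, not the first.
page = """
<html><head><script src="/site.js"></script></head><body>
<script>
var loadPoints = '/Map/Points';
var mapDetails = {"point":{"latitude":-34.023418,"longitude":18.331407,"title":"Sandy Bay"},"bounds":null};
requireExploreMap(loadPoints, mapDetails);
</script>
</body></html>
"""

details = extract_map_details(page)
if details is not None:
    lat = details["point"]["latitude"]
    lon = details["point"]["longitude"]
    print(lat, lon)
```

In the loop from the question, the same helper could be called as extract_map_details(response.text). Checking for None before indexing avoids a crash on pages that have no mapDetails variable.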
Answered By - baduker