Issue
I am currently trying to read out the locations of a company. The location information is embedded as JSON inside a script tag, so I read out the content of the corresponding script tag.
This is my code:
sauce = requests.get('https://www.ep.de/store-finder', verify=False, headers = {'User-Agent':'Mozilla/5.0'})
soup1 = BeautifulSoup(sauce.text, features="html.parser")
all_scripts = soup1.find_all('script')[6]
all_scripts.contents
The output is:
['\n\t\twindow.storeFinderComponent = {"center":{"lat":51.165691,"long":10.451526},"bounds":[[55.655085,5.160441],[46.439648,15.666775]],"stores":[{"code":"1238240","lat":51.411572,"long":10.425264,"name":"EP:Schulze","url":"/schulze-breitenworbis","showAsClosed":false,"isBusinessCard":false,"logoUrl":"https://cdn.prod.team-ec.com/logo/retailer/retailerlogo_epde_1238240.png","address":{"street":"Weststraße 6","zip":"37339","town":"Breitenworbis","phone":"+49 (36074) 31193"},"email":"[email protected]","openingHours":[{"day":"Mo.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Di.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},{"day":"Mi.","openingTime":"09:00","closingTime":"18:00","startPauseTime":"13:00","endPauseTime":"14:30"},...]
I have problems converting the content to a dictionary and reading all lat and long data.
When I try:
data = json.loads(all_scripts.get_text())
all_scripts.get_text() returns an empty string.
So I tried:
data = json.loads(all_scripts.contents)
But then I get a TypeError: the JSON object must be str, bytes or bytearray, not list
I don't know how to convert the result of .contents to JSON:
data = json.loads(str(all_scripts.contents))
JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Can anyone help me?
Solution
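The script content is not pure JSON: it starts with the JavaScript assignment window.storeFinderComponent = , which is why json.loads rejects it. You could use a regular expression to pull out just the JSON object and read that in.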
import requests
import re
import json
html = requests.get('https://www.ep.de/store-finder', verify=False, headers = {'User-Agent':'Mozilla/5.0'}).text
# Raw string so that \. reaches the regex engine unchanged; the group captures the object literal
pattern = re.compile(r'window\.storeFinderComponent = ({.*})')
result = pattern.search(html).group(1)
jsonData = json.loads(result)
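Since the question also asks for all lat and long values, here is a minimal sketch of that last step. It runs offline on a shortened copy of the output shown in the question (SAMPLE_HTML is a stand-in, not the real page), and the helper names extract_store_data and store_coordinates are made up for illustration. It assumes every entry under "stores" has "name", "lat" and "long" keys, as the first entry in the question does.

```python
import json
import re

# Shortened stand-in for the page source; the real page embeds a much longer object.
SAMPLE_HTML = """
<script>
		window.storeFinderComponent = {"center":{"lat":51.165691,"long":10.451526},"stores":[{"code":"1238240","lat":51.411572,"long":10.425264,"name":"EP:Schulze"}]}
</script>
"""

# Same idea as the pattern above: capture the object assigned to the variable.
PATTERN = re.compile(r'window\.storeFinderComponent = (\{.*\})')


def extract_store_data(html):
    """Return the parsed storeFinderComponent object, or None if it is absent."""
    match = PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def store_coordinates(data):
    """Collect (name, lat, long) for every entry under the "stores" key."""
    return [(store["name"], store["lat"], store["long"]) for store in data["stores"]]


data = extract_store_data(SAMPLE_HTML)
coords = store_coordinates(data)
print(coords)
```

With the real page you would pass the html string from the request instead of SAMPLE_HTML. Checking for None before parsing avoids an AttributeError if the site ever renames the variable or changes its markup.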
Answered By - chitown88