Issue
I need some help: for a project, I have to parse information from a real estate website.
I am able to parse almost everything, but the page contains a one-liner that I have never seen before.
The code itself is too large to post, but here is an example snippet:
<div class="d-none" data-listing='{"strippedPhotos":[{"caption":"","description":"","urls":{"1920x1080":"https:\/\/ot.ingatlancdn.com\/d6\/07\/32844921_216401477_hd.jpg","800x600":"https:\/\/ot.ingatlancdn.com\/d6\/07\/32844921_216401477_l.jpg","228x171":"https:\/\/ot.ingatlancdn.com\/d6\/07\/32844921_216401477_m.jpg","80x60":"https:\/\/ot.ingatlancdn.com\/d6\/07
Can you please help me identify this, and maybe suggest how to parse all of the contained information into a pandas DataFrame?
Edit, code added:
other = []
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
hdr = {'User-Agent': 'Mozilla/5.0'}
site= "https://ingatlan.com/xiii-ker/elado+lakas/tegla-epitesu-lakas/32844921"
req = Request(site,headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page)
data = soup.find_all('div', id="listing", class_="d-none", attrs="data-listing")
data
Solution
You could access the value of the attribute and convert the string via json.loads():
data = json.loads(soup.find('div', id="listing", class_="d-none", attrs={'data-listing': True}).get('data-listing'))
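To try this without requesting the site, here is a minimal sketch on a shortened, made-up HTML fragment (the example.com URL and the reduced JSON are assumptions, not the real page content). It shows that the data-listing attribute is just a JSON string, and that json.loads() also turns the escaped `\/` sequences back into plain slashes.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the real page markup (raw string keeps the \/ escapes).
html = r'''<div id="listing" class="d-none" data-listing='{"strippedPhotos":[{"caption":"","urls":{"80x60":"https:\/\/example.com\/a_s.jpg"}}]}'></div>'''

soup = BeautifulSoup(html, "html.parser")

# attrs={"data-listing": True} matches any div that carries the attribute.
tag = soup.find("div", attrs={"data-listing": True})

# The attribute value is a plain string; json.loads() turns it into a dict.
listing = json.loads(tag["data-listing"])
first_url = listing["strippedPhotos"][0]["urls"]["80x60"]
print(first_url)
```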
Then simply create your DataFrame via pandas.json_normalize():
pd.json_normalize(data['strippedPhotos'])
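As a rough illustration of what json_normalize() does with the nested "urls" dictionaries, the sketch below uses two made-up records shaped like the strippedPhotos entries in the question (the example.com URLs are placeholders). Nested keys end up as dot-separated column names.

```python
import pandas as pd

# Made-up records shaped like the "strippedPhotos" entries from the question.
photos = [
    {"caption": "", "urls": {"800x600": "https://example.com/1_l.jpg",
                             "80x60": "https://example.com/1_s.jpg"}},
    {"caption": "", "urls": {"800x600": "https://example.com/2_l.jpg",
                             "80x60": "https://example.com/2_s.jpg"}},
]

# Each record becomes one row; nested dict keys are flattened with "." as separator.
df = pd.json_normalize(photos)
print(df.columns.tolist())
```

With the real data, a single size could then be selected as an ordinary column, for example df["urls.800x600"], assuming that key is present in the listing.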
Example
Because the expected result is not clear, this should just point you in the right direction:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import pandas as pd
import json
hdr = {'User-Agent': 'Mozilla/5.0'}
site = "https://ingatlan.com/xiii-ker/elado+lakas/tegla-epitesu-lakas/32844921"
req = Request(site, headers=hdr)
page = urlopen(req)
# name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(page, 'html.parser')
# attrs={'data-listing': True} matches a div that carries a data-listing attribute
data = json.loads(soup.find('div', id="listing", class_="d-none", attrs={'data-listing': True}).get('data-listing'))
# all data
pd.json_normalize(data)
# only strippedPhotos
pd.json_normalize(data['strippedPhotos'])
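If the page layout changes or the request is blocked, soup.find() returns None and the chained .get() call raises an AttributeError. One possible guard is sketched below; extract_listing is a hypothetical helper name, not something provided by BeautifulSoup or pandas.

```python
import json
from bs4 import BeautifulSoup

def extract_listing(html):
    """Return the decoded data-listing JSON, or None if no such tag exists.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", attrs={"data-listing": True})
    if tag is None:
        return None
    return json.loads(tag["data-listing"])

# A page without the attribute yields None instead of raising an exception.
print(extract_listing("<div class='d-none'></div>"))
```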
Answered By - HedgeHog