Issue
I am a beginner when it comes to Python. I have a page from which I want to extract one thing: the town. This is the link: https://www.olx.pl/d/oferta/echosonda-raymarine-dragonfly-5pro-najtaniej-w-polsce-1-CID767-IDHur6N.html
In the browser, I can see this HTML source containing the town:
<p class="css-7xdcwc-Text eu5v0x0">Bydgoszcz, <span class="css-1c0ed4l"></span></p>
I wrote this code but unfortunately it does not get this information for me.
import requests
from bs4 import BeautifulSoup
link='https://www.olx.pl/d/oferta/echosonda-raymarine-dragonfly-5pro-najtaniej-w-polsce-1-CID767-IDHur6N.html'
page1 = requests.get(link).content
advertisement = BeautifulSoup(page1, "html.parser")
town = advertisement.find('p', {'class' : 'css-7xdcwc-Text eu5v0x0'}).text.strip()
print(town)
It looks like the city is being loaded later, so BeautifulSoup does not pick it up. Could you please help me get the city name? Thank you in advance for your help.
Solution
The data is stored inside a <script> tag, so BeautifulSoup doesn't see it. You can use the re/json modules to parse it:
import re
import json
import requests
url = "https://www.olx.pl/d/oferta/echosonda-raymarine-dragonfly-5pro-najtaniej-w-polsce-1-CID767-IDHur6N.html"
html_doc = requests.get(url).text
data = re.search(r"window\.__PRERENDERED_STATE__= ({.*})", html_doc).group(1)
data = json.loads(data)
# uncomment to print all data:
# print(json.dumps(data, indent=4))
print("City:", data["ad"]["ad"]["location"]["cityName"])
print("Region:", data["ad"]["ad"]["location"]["regionName"])
Prints:
City: Bydgoszcz
Region: Kujawsko-pomorskie
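If the page layout changes, the regex may stop matching or a key may disappear, and the snippet above would then raise an exception. Below is a minimal sketch of a more defensive variant. The helper name extract_location and the sample HTML string are hypothetical and only for illustration; the variable name window.__PRERENDERED_STATE__ and the key path are assumed to be as shown in the answer above.

```python
import json
import re

# Pattern assumed from the answer above; whitespace around "=" is tolerated.
STATE_RE = re.compile(r"window\.__PRERENDERED_STATE__\s*=\s*(\{.*\})")


def extract_location(html_doc):
    """Return (city, region), or None if the state is missing or not valid JSON.

    Hypothetical helper: the key path ad -> ad -> location is assumed
    to match the structure used in the answer above.
    """
    match = STATE_RE.search(html_doc)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # .get() with defaults avoids a KeyError if a level is absent.
    location = data.get("ad", {}).get("ad", {}).get("location", {})
    return location.get("cityName"), location.get("regionName")


# Made-up sample standing in for the downloaded page source.
sample_html = (
    '<script>window.__PRERENDERED_STATE__= '
    '{"ad": {"ad": {"location": {"cityName": "Bydgoszcz", '
    '"regionName": "Kujawsko-pomorskie"}}}}</script>'
)
print(extract_location(sample_html))
```

With a real page, you would pass requests.get(url).text instead of sample_html and check for None before using the result.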
Answered By - Andrej Kesely