Issue
On the JustWatch page https://www.justwatch.com/in/movie/oppenheimer, can we get the IMDB link from the IMDB image?
When I inspected the IMDB image elements, I found no IMDB links.
However, clicking the image opens the IMDB link https://www.imdb.com/title/tt15398776/?ref_=justwatch.
Is there a way to scrape a link that doesn't show up in the inspect view, using Python?
Thank you in advance.
This is my code, which can only get the rating:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
url = "https://www.justwatch.com/in/movie/oppenheimer"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')
soup.select('div.jw-scoring-listing__rating span span')[1]
Solution
urllib and requests do not work like a browser: they do not execute JavaScript or render content dynamically. However, if the information is included in the static response, it can still be extracted.
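One way to check whether a value is part of the static response is to collect the inline script contents and search them for a keyword. The following is a minimal sketch using only the standard library; the names `ScriptCollector` and `scripts_containing` and the sample HTML are illustrative assumptions, not the real JustWatch markup.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Text inside <script> is delivered here as raw data
        if self._in_script:
            self.scripts[-1] += data


def scripts_containing(html, keyword):
    """Return the inline scripts whose text contains `keyword`."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [s for s in collector.scripts if keyword in s]


# Hypothetical sample; the real page is much larger and structured differently.
sample_html = (
    '<html><body><img alt="IMDB">'
    '<script>window.__APOLLO_STATE__={"imdbId":"tt15398776"}</script>'
    '<script>console.log("other")</script></body></html>'
)
hits = scripts_containing(sample_html, "imdbId")
print(len(hits))
```

If the list comes back empty for the real page, the value is probably loaded later by JavaScript and is not available to urllib or requests.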
You could try searching the content of the script elements with a regex for the external imdbId:
import re

# `webpage` holds the raw bytes fetched in the question's code; decode them to text first
html = webpage.decode('utf-8', errors='replace')
match = re.search(r'"imdbId":\s*"([^"]+)"', html)
if match:
    imdb_id_value = match.group(1)
    print(f'https://www.imdb.com/title/{imdb_id_value}/?ref_=justwatch')
else:
    print('no imdbId found')
If imdbId was found, this produces the following link:
https://www.imdb.com/title/tt15398776/?ref_=justwatch
Alternatively, convert the content into JSON and treat it like a dict:
...
json.loads(soup.select_one('script:-soup-contains("APOLLO_STATE")').text.split('window.__APOLLO_STATE__=', 1)[1])
...
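The JSON approach can be sketched end to end without BeautifulSoup. This is only an illustration under assumptions: the helper names `extract_apollo_state` and `find_key` are hypothetical, and the nested layout of the sample state is invented for the example, so the real `__APOLLO_STATE__` object will likely look different. `json.JSONDecoder().raw_decode` is used so that anything following the JSON object, such as a trailing semicolon, does not break parsing.

```python
import json

MARKER = "window.__APOLLO_STATE__="


def extract_apollo_state(html):
    """Parse the JSON object assigned to window.__APOLLO_STATE__, or return None."""
    start = html.find(MARKER)
    if start == -1:
        return None
    # raw_decode parses one JSON value and ignores whatever follows it
    payload = html[start + len(MARKER):].lstrip()
    state, _ = json.JSONDecoder().raw_decode(payload)
    return state


def find_key(node, key):
    """Depth-first search for the first non-None value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical sample; the key names around "imdbId" are assumptions.
sample_html = (
    '<script>window.__APOLLO_STATE__='
    '{"Movie:1":{"content":{"externalIds":{"imdbId":"tt15398776"}}}};</script>'
)
state = extract_apollo_state(sample_html)
imdb_id = find_key(state, "imdbId")
print(f"https://www.imdb.com/title/{imdb_id}/?ref_=justwatch")
```

Searching recursively for the key avoids hard-coding a path into the state object, which may change whenever the site is updated.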
Answered By - HedgeHog