Issue
I am trying to download a video from a website, but no video file is saved. The code below runs without any error, yet I don't see a downloaded video. I am not sure what is wrong with my code. Any help is much appreciated. Thank you in advance.
Code Below:
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
dwn_link = 'https://www1.wdr.de/fernsehen/lokalzeit/ostwestfalen/videos/video-lokalzeit-owl---1304.html'
file_name = '_video.mp4'
rsp = urllib2.urlopen(dwn_link)
with open(file_name,'wb') as f:
    f.write(rsp.read())
Solution
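The URL in the question points to an HTML page, not to the video file itself, so the code saves the page's HTML instead of the video. The page contains a JavaScript object that holds the mp4 and m3u8 links. It is formatted like this: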
<script type="text/javascript">
globalObject.gseaInlineMediaData["something"] =
{
    "mediaVersion": "1.4.0",
    "mediaType": "vod",
    "mediaResource": {
        "dflt": {
            "videoURL": "//some_file.m3u8",
            "mediaFormat": "hls"
        },
        "alt": {
            "videoURL": "//some_file.mp4", <======== HERE
            "mediaFormat": "mp4"
        },
        "previewImage": "//some_file.jpg"
    },
    ....
};
</script>
You can then directly grab the mp4 file like this:
import requests
import re
import json

# Fetch the HTML page that embeds the player metadata
r = requests.get("https://www1.wdr.de/fernsehen/lokalzeit/ostwestfalen/videos/video-pausenbrot-prozess-mit-gutachten-zum-angeklagten-100.html")
# Capture the object assigned to globalObject.gseaInlineMediaData[...]
res = re.search(r"globalObject\.gseaInlineMediaData.*\s*=\s*(.*)\s*;\s*<\/script>", r.text, re.DOTALL)
data = json.loads(res.group(1))
# The URL is protocol-relative ("//..."), so prepend the scheme
video_url = f'https:{data["mediaResource"]["alt"]["videoURL"]}'
print(video_url)
# Stream the mp4 to disk in 1 MiB chunks instead of loading it into memory
r = requests.get(video_url, stream = True)
with open("video.mp4", 'wb') as f:
    for chunk in r.iter_content(chunk_size = 1024*1024):
        if chunk:
            f.write(chunk)
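As a rough sketch, the extraction step can be pulled into a helper that works on an HTML string and reports both links, so a missing mp4 entry is easy to detect. The function name extract_video_urls, the sample HTML, and the example.invalid host are made up for illustration. The pattern is the one from the answer, and the real page may differ from this sample.

```python
import json
import re

# Pattern from the answer: capture the object assigned to
# globalObject.gseaInlineMediaData[...] up to the closing </script> tag.
MEDIA_RE = re.compile(
    r"globalObject\.gseaInlineMediaData.*\s*=\s*(.*)\s*;\s*</script>",
    re.DOTALL,
)


def extract_video_urls(html):
    """Return (mp4_url, m3u8_url); either may be None if absent."""
    match = MEDIA_RE.search(html)
    if match is None:
        raise ValueError("no gseaInlineMediaData object found in the page")
    resource = json.loads(match.group(1)).get("mediaResource", {})

    def absolute(entry):
        url = resource.get(entry, {}).get("videoURL")
        # The page uses protocol-relative URLs ("//host/path").
        return f"https:{url}" if url and url.startswith("//") else url

    return absolute("alt"), absolute("dflt")


# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<script type="text/javascript">
globalObject.gseaInlineMediaData["something"] =
{
  "mediaVersion": "1.4.0",
  "mediaResource": {
    "dflt": {"videoURL": "//example.invalid/some_file.m3u8", "mediaFormat": "hls"},
    "alt": {"videoURL": "//example.invalid/some_file.mp4", "mediaFormat": "mp4"}
  }
};
</script>
"""

mp4_url, m3u8_url = extract_video_urls(SAMPLE_HTML)
print(mp4_url)
print(m3u8_url)
```

When mp4_url comes back as None, m3u8_url is the link to hand to ffmpeg.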
If the mp4 is not available in the alt object for a given video link, you can use ffmpeg to build the file from the m3u8 playlist:
ffmpeg -protocol_whitelist "file,http,https,tcp,tls" -i something.m3u8 file.mp4
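If you would rather stay in Python, the same ffmpeg call could be wrapped with subprocess. This is only a sketch: the helper names build_ffmpeg_command and convert_playlist are made up here, and it assumes the ffmpeg binary is installed and on PATH.

```python
import shutil
import subprocess


def build_ffmpeg_command(m3u8_source, output_path):
    """Build the argument list for the ffmpeg call shown above."""
    return [
        "ffmpeg",
        "-protocol_whitelist", "file,http,https,tcp,tls",
        "-i", m3u8_source,
        output_path,
    ]


def convert_playlist(m3u8_source, output_path):
    """Run ffmpeg if it is installed; raise if it is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(m3u8_source, output_path), check=True)


# Only builds and prints the command; call convert_playlist(...) to run it.
command = build_ffmpeg_command("something.m3u8", "file.mp4")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems when the playlist URL contains characters such as & or ?.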
Answered By - Bertrand Martel