Saturday, March 5, 2022

[FIXED] Downloading video from website using python

Labels: beautifulsoup, python, python-requests, web-scraping

Issue

I am trying to download a video from a website, but the video itself is never downloaded. I don't see any error, but I also don't see a downloaded video. I am not sure what is wrong with my code. Any help is much appreciated. Thank you in advance.

Code Below:

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

dwn_link = 'https://www1.wdr.de/fernsehen/lokalzeit/ostwestfalen/videos/video-lokalzeit-owl---1304.html'

file_name = '_video.mp4' 
rsp = urllib2.urlopen(dwn_link)
with open(file_name,'wb') as f:
    f.write(rsp.read())

Solution

The response contains a JavaScript object that holds the mp4 and m3u8 links. It is formatted like this:

<script type="text/javascript">
globalObject.gseaInlineMediaData["something"] =
{
    "mediaVersion": "1.4.0",
    "mediaType": "vod",
    "mediaResource": {
        "dflt": {
            "videoURL": "//some_file.m3u8",
            "mediaFormat": "hls"
        },
        "alt": {
            "videoURL": "//some_file.mp4", <======== HERE
            "mediaFormat": "mp4"
        },
        "previewImage": "//some_file.jpg"
    },
    ....
};
</script>

You can then directly grab the mp4 file like this:

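import json
import re

import requests

page_url = "https://www1.wdr.de/fernsehen/lokalzeit/ostwestfalen/videos/video-pausenbrot-prozess-mit-gutachten-zum-angeklagten-100.html"

# Fetch the HTML page; the video itself is only referenced inside it.
r = requests.get(page_url)
r.raise_for_status()

# Capture the JSON object assigned to globalObject.gseaInlineMediaData[...].
res = re.search(r"globalObject\.gseaInlineMediaData.*\s*=\s*(.*)\s*;\s*<\/script>", r.text, re.DOTALL)
if res is None:
    raise SystemExit("media data object not found in page")
data = json.loads(res.group(1))

# The URL is protocol-relative ("//..."), so prepend the scheme.
video_url = f'https:{data["mediaResource"]["alt"]["videoURL"]}'
print(video_url)

# Stream the file to disk in 1 MiB chunks instead of loading it into memory.
r = requests.get(video_url, stream=True)
r.raise_for_status()
with open("video.mp4", "wb") as f:
    for chunk in r.iter_content(chunk_size=1024 * 1024):
        if chunk:
            f.write(chunk)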
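
The regular expression above depends on the object being followed by `;` directly before `</script>`. As a possible alternative, the sketch below only locates the assignment and then lets `json.JSONDecoder.raw_decode` read exactly one JSON value starting at the opening `{`. The helper name `extract_media_data` and the sample HTML are hypothetical, modeled on the snippet shown earlier; the real page may be laid out differently.

```python
import json
import re

# Hypothetical sample modeled on the structure shown in the answer.
SAMPLE_HTML = """
<script type="text/javascript">
globalObject.gseaInlineMediaData["something"] =
{
    "mediaVersion": "1.4.0",
    "mediaType": "vod",
    "mediaResource": {
        "dflt": {"videoURL": "//some_file.m3u8", "mediaFormat": "hls"},
        "alt": {"videoURL": "//some_file.mp4", "mediaFormat": "mp4"},
        "previewImage": "//some_file.jpg"
    }
};
</script>
"""


def extract_media_data(html):
    """Return the first gseaInlineMediaData object found in html as a dict."""
    # Match up to and including the whitespace after "=", so that the
    # match ends exactly at the opening "{" (raw_decode does not skip
    # leading whitespace).
    match = re.search(r"globalObject\.gseaInlineMediaData\[[^\]]*\]\s*=\s*", html)
    if match is None:
        raise ValueError("no gseaInlineMediaData assignment found")
    # raw_decode parses one JSON value and ignores whatever follows it.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


media = extract_media_data(SAMPLE_HTML)
print(media["mediaResource"]["alt"]["videoURL"])
```

With a real page, `r.text` would be passed in place of `SAMPLE_HTML`.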

If the mp4 is not available in the alt object for a given video link, you can use ffmpeg to get the file from the m3u8 instead:

ffmpeg -protocol_whitelist "file,http,https,tcp,tls" -i something.m3u8 file.mp4
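
The fallback described above could be scripted roughly as follows. This is a sketch under assumptions: `pick_video_url` and `build_ffmpeg_command` are hypothetical helper names, the dictionary layout follows the snippet shown earlier, and `ffmpeg` is assumed to be installed and on `PATH`. The command mirrors the one above; it is only constructed here, and the actual call is left commented out.

```python
import subprocess


def pick_video_url(data):
    """Prefer the URL in "alt" (mp4); otherwise fall back to "dflt" (m3u8)."""
    resources = data.get("mediaResource", {})
    for key in ("alt", "dflt"):
        entry = resources.get(key)
        if isinstance(entry, dict) and entry.get("videoURL"):
            url = entry["videoURL"]
            # URLs in the page are protocol-relative ("//host/path").
            if url.startswith("//"):
                url = "https:" + url
            return url, entry.get("mediaFormat")
    raise KeyError("no videoURL found in mediaResource")


def build_ffmpeg_command(m3u8_url, output_path):
    """Build the same ffmpeg invocation as shown above, as an argument list."""
    return [
        "ffmpeg",
        "-protocol_whitelist", "file,http,https,tcp,tls",
        "-i", m3u8_url,
        output_path,
    ]


# Hypothetical data for a video that has no "alt" (mp4) entry.
sample = {
    "mediaResource": {
        "dflt": {"videoURL": "//some_file.m3u8", "mediaFormat": "hls"}
    }
}

url, media_format = pick_video_url(sample)
command = build_ffmpeg_command(url, "file.mp4")
print(media_format, command)

# To actually run the conversion (requires ffmpeg to be installed):
# subprocess.run(command, check=True)
```

If `media_format` comes back as "mp4", the URL can be downloaded directly with requests as shown earlier; the ffmpeg command is only needed for the "hls" case.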

Answered By - Bertrand Martel

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
