Issue
Here is the text/javascript code I extracted.
I also want to extract the values of 'video_id', 'video_url', and 'video_alt_url' from this script.
"""{
video_id: '000101',
video_categories: 'Categorie01, Categorie02',
video_tags: 'Categorie01, Categorie02', license_code: '$603825119921245', rnd: '1647426812',
video_url:'https://www.example.com/get_file/5/bb6a5e180f5037a3f348fbdee96a8c6f681c4c0bab/107000/107389/107389.mp4/?br=709',
postfix: '.mp4',
video_url_text: '480p',
video_alt_url:'https://www.example.com/get_file/5/47601c7136bcbe38e6eb0b2cfa04dd9d917aa6263b/107000/107389/107389_720p.mp4/?br=1243',
video_alt_url_text: '720p',
video_alt_url_hd: '1',
preview_url: 'https://www.example.com/contents/videos_screenshots/107000/107389/preview.jpg',
preview_url1:'https://www.example.com/contents/videos_screenshots/107000/107389/preview.mp4.jpg',
preview_height1: '480',
preview_url2:'https://www.example.com/contents/videos_screenshots/107000/107389/preview_720p.mp4.jpg',
preview_height2: '720',
skin: 'youtube.css',
logo_position: '0,0',
logo_anchor: 'topleft',
hide_controlbar: '1',
hide_style: 'fade',
volume: '1',
related_src: 'https://www.example.com/related_videos_html/107389/', adv_pre_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
adv_pre_skip_duration: '5',
adv_pre_skip_text_time: 'Skip ad in %time',
adv_pre_skip_text: 'Skip ad',
adv_post_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
adv_post_skip_duration: '5',
adv_post_skip_text_time: 'Skip ad in %time',
adv_post_skip_text: 'Skip ad',
lrcv: '1651572296480833989009946',
vast_timeout1: '10',
player_width: '882',
player_height: '496.9014084507',
embed: '1'
}"""
Solution
Not sure why you haven't included the URL or the part of the code that shows how you extracted this (there might be an easier way to get this data).
But what you can do is take the JavaScript object literal and manipulate it with regex to get it into a valid form for ast.literal_eval(). Not the most robust, but it works:
js_obj = '''{
video_id: '000101',
video_categories: 'Categorie01, Categorie02',
video_tags: 'Categorie01, Categorie02', license_code: '$603825119921245', rnd: '1647426812',
video_url: 'https://www.example.com/get_file/5/bb6a5e180f5037a3f348fbdee96a8c6f681c4c0bab/107000/107389/107389.mp4/?br=709',
postfix: '.mp4',
video_url_text: '480p',
video_alt_url: 'https://www.example.com/get_file/5/47601c7136bcbe38e6eb0b2cfa04dd9d917aa6263b/107000/107389/107389_720p.mp4/?br=1243', video_alt_url_text: '720p', video_alt_url_hd: '1', preview_url: 'https://www.example.com/contents/videos_screenshots/107000/107389/preview.jpg',
preview_url1: 'https://www.example.com/contents/videos_screenshots/107000/107389/preview.mp4.jpg',
preview_height1: '480',
preview_url2: 'https://www.example.com/contents/videos_screenshots/107000/107389/preview_720p.mp4.jpg',
preview_height2: '720',
skin: 'youtube.css',
logo_position: '0,0',
logo_anchor: 'topleft',
hide_controlbar: '1', hide_style: 'fade', volume: '1',
related_src: 'https://www.example.com/related_videos_html/107389/', adv_pre_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}', adv_pre_skip_duration: '5', adv_pre_skip_text_time: 'Skip ad in %time', adv_pre_skip_text: 'Skip ad', adv_post_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
adv_post_skip_duration: '5',
adv_post_skip_text_time: 'Skip ad in %time', adv_post_skip_text: 'Skip ad',
lrcv: '1651572296480833989009946', vast_timeout1: '10',
player_width: '882',
player_height: '496.9014084507',
embed: '1'
}'''
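import ast
import re

# Temporarily drop "'https:" so the colon inside each URL isn't mistaken for a key separator
js_obj = js_obj.replace("'https:", 'https')
# Wrap every bare key (word characters directly before a colon) in single quotes
js_obj = re.sub(r'(\w+):', r"'\1':", js_obj)
# Restore the opening quote and the colon at the start of each URL
js_obj = js_obj.replace('https', "'https:")
# The text is now a valid Python dict literal
py_obj = ast.literal_eval(js_obj)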
print(py_obj['video_id'])
print(py_obj['video_url'])
print(py_obj['video_alt_url'])
Output:
000101
https://www.example.com/get_file/5/bb6a5e180f5037a3f348fbdee96a8c6f681c4c0bab/107000/107389/107389.mp4/?br=709
https://www.example.com/get_file/5/47601c7136bcbe38e6eb0b2cfa04dd9d917aa6263b/107000/107389/107389_720p.mp4/?br=1243
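The replace/regex trick above only works as long as no value contains a colon other than the one in `https:` (an `http://` URL, for example, would break it). A slightly sturdier sketch, assuming every key is a bare identifier that directly follows `{` or `,`, every value is a single-quoted string (no `true`/`false`, bare numbers, or nested objects), and no value contains a `{` or `,` followed by a word and a colon, quotes only those key positions, so the URL workaround isn't needed. `js_object_to_dict` and `extract_fields` are hypothetical helper names, not part of any library; `extract_fields` skips the conversion entirely and pulls just the requested values with one regex each.

```python
import ast
import re

# Hypothetical helper regex: a key is assumed to be a run of word characters
# that directly follows "{" or "," (plus optional whitespace) and precedes ":".
# The colon in "https://" is preceded by a quote, so it is never matched.
KEY_RE = re.compile(r"([{,]\s*)(\w+)\s*:")


def js_object_to_dict(js_text: str) -> dict:
    """Convert a flat JS object literal with single-quoted values to a dict."""
    # Quote each key in place, keeping the "{" or "," and whitespace before it
    quoted = KEY_RE.sub(r"\1'\2':", js_text)
    return ast.literal_eval(quoted)


def extract_fields(js_text: str, keys: list) -> dict:
    """Pull only the requested single-quoted values, without parsing the rest."""
    found = {}
    for key in keys:
        # \b plus the trailing \s*: keeps "video_url" from matching "video_url_text"
        match = re.search(rf"\b{re.escape(key)}\s*:\s*'([^']*)'", js_text)
        if match:
            found[key] = match.group(1)
    return found


if __name__ == "__main__":
    # A trimmed copy of the object from the question
    sample = """{
    video_id: '000101',
    video_categories: 'Categorie01, Categorie02',
    video_url: 'https://www.example.com/get_file/5/bb6a5e180f5037a3f348fbdee96a8c6f681c4c0bab/107000/107389/107389.mp4/?br=709',
    video_url_text: '480p',
    video_alt_url: 'https://www.example.com/get_file/5/47601c7136bcbe38e6eb0b2cfa04dd9d917aa6263b/107000/107389/107389_720p.mp4/?br=1243', video_alt_url_text: '720p',
    embed: '1'
    }"""
    data = js_object_to_dict(sample)
    print(data["video_id"])  # 000101
    print(extract_fields(sample, ["video_id", "video_url", "video_alt_url"]))
```

If you only ever need a handful of fields, `extract_fields` is the simpler route; `js_object_to_dict` is worth it when you want the whole object as a dict.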
Answered By - chitown88