Issue
I am trying to find random videos with creating random id's. According to my logic i create random video id for link and check its html. if a video is exist, there are multiple video id's like 15 or 20. if it is not a valid url there only 5 or 6 video id's in html.
import urllib.request
import re
import random
import string
def random_url():
#There is random 11 digit after v=
url_base = "https://www.youtube.com/watch?v="
for _ in range(11):
url_base += (random.choice(string.ascii_letters + string.digits))
#valid videos has at least 10 video id on the html.
#Thats how code i check it is valid video or not
html = urllib.request.urlopen(url_base)
video_ids = re.findall(r"watch\?v=(\S{11})", html.read().decode())
if len(video_ids) < 10:
print(url_base)
return random_url()
else:
print(video_ids)
print(url_base)
print("done")
random_url()
when i tested, the detection works fine but i couldn't find any video before recursion limit come out. what should i do?
Solution
Recursion is not the right approach here. You want to simply retry until you find something valid which might not happen for a LONG time.
Note that there is propaganda suggesting there are "a billion videos on YouTube". However, randomly guessing a URL even without the valid punctuation that some videos have gives you 52 billion combinations that are invalid for every single valid URL. Even just adding in the additional option of the -
and you jump to 62 billion misses for every hit.
For additional context, 62 billion seconds is closing in on 2000 years. Good luck :-)
import urllib.request
import re
import random
import string
def random_url():
all_letters_and_numbers = string.ascii_letters + string.digits + "-"
url_base = "https://www.youtube.com/watch?v="
attempts = 0
while True:
attempts += 1
url_actual = url_base + "".join(random.choices(all_letters_and_numbers, k=11))
#url_actual = url_base + "dQw4w9WgXcQ" ## uncomment for valid "Rick Roll"
print(f"Attempt: {attempts} Trying: {url_actual}")
html = urllib.request.urlopen(url_actual)
video_ids = re.findall(r"watch\?v=(\S{11})", html.read().decode())
if len(video_ids) >= 10:
return url_actual
print(f"Found: {random_url()}")
Answered By - JonSG
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.