Issue
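import re
import requests
from bs4 import BeautifulSoup

url = "https://www.xxxx.com"  # placeholder; requests needs the scheme
rlink = requests.get(url, cookies=cookies).content  # cookies is defined elsewhere
html = BeautifulSoup(rlink, 'html.parser')
scripttags = html.find_all("script")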
The HTML DOM contains about seven script tags, and I need to search them for one (unique) variable.
The variable is:
var playbackUrl = 'https://www.yyyy.com'
for i in range(len(scripttags)):
    if "playbackUrl" in str(scripttags[i]):
        for j in str(scripttags[i]).split("\n"):
            if "playbackUrl" in j:
                url_ = re.search("'(.*)'", j).group(1)
                print(url_)
My script does the job, but I wonder if there is a smarter way to do this.
Solution
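The code is more readable if you iterate over the tags directly with a for-loop instead of using range(len()).
You also don't have to split each script into lines: the regex can search the whole script text at once, because "." does not match a newline and the match stays on one line.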
html = '''<script>
var other = 'test';
var playbackUrl = 'https://www.example1.com';
var next = 'test';
</script>
<script>
var other = 'test';
var playbackUrl = 'https://www.example2.com';
var next = 'test';
</script>
'''
from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(html, 'html.parser')
scripttags = soup.find_all("script")
for script in scripttags:
    results = re.search("var playbackUrl = '(.*)'", script.text)
    if results:
        print('search:', results[1])
    # OR
    results = re.findall("var playbackUrl = '(.*)'", script.text)
    if results:
        print('findall:', results[0])
Result:
search: https://www.example1.com
findall: https://www.example1.com
search: https://www.example2.com
findall: https://www.example2.com
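One possible refinement, shown only as a sketch: the greedy pattern '(.*)' captures up to the last quote on the line, so it can overshoot if another quoted string follows on the same line. The version below assumes the value may be wrapped in either single or double quotes and uses a non-greedy capture. The helper name extract_playback_urls and the sample strings are made up for illustration; they are not from the original question.

```python
import re

# Non-greedy capture stops at the first closing quote; the backreference \1
# requires the closing quote to match the opening one (' or ").
PLAYBACK_URL_RE = re.compile(r"""var\s+playbackUrl\s*=\s*(['"])(.*?)\1""")


def extract_playback_urls(script_texts):
    """Return every playbackUrl value found in an iterable of script bodies."""
    urls = []
    for text in script_texts:
        # finditer yields all matches, so several assignments in one script are kept.
        for match in PLAYBACK_URL_RE.finditer(text):
            urls.append(match.group(2))
    return urls


# Hypothetical script bodies standing in for the contents of <script> tags.
scripts = [
    "var other = 'test';\nvar playbackUrl = 'https://www.example1.com';",
    'var playbackUrl = "https://www.example2.com"; var next = "test";',
    "var unrelated = 1;",
]
print(extract_playback_urls(scripts))
# ['https://www.example1.com', 'https://www.example2.com']
```

With BeautifulSoup, the same helper could be fed the script contents, for example (script.string or "") for each tag returned by soup.find_all("script"); the "or" guards against tags such as <script src=...> that have no inline text.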
Answered By - furas