Issue
The snippet partially works but also produces redundant output, and I need help making it work fully. I am searching a page for strings, and if a partial or full match is found, the whole line should be returned.
from bs4 import BeautifulSoup as bs
import requests
addrlist = ['0xe56842ed550ff2794f010738554db45e60730371',
            '0xe1fd7b4c9debac3c490d8a553c455da4979482e4',
            '0x88c20beda907dbc60c56b71b102a133c1b29b053']
queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]
baseurl = "https://bscscan.com/address/"
for i in addrlist:
    url = str(baseurl) + str(i)
    r = requests.get(url)
    soup = bs(r.text,'lxml')
    pre = soup.select_one('pre.js-sourcecopyarea.editor')
    ss = (list(pre.stripped_strings)[0]).split('*')
    for s in ss:
        for query in queries:
            if query in s:
                print(s)
Current Output:
Website: https://binemon.io #output repeated 4x in actual run
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
// SPDX-License-Identifier: UNLICENSED #output repeated 4x in actual run
// IERC20.sol
Website: www.shibuttinu.com #output repeated 1x only
Telegram: https://t.me/Shibuttinu
Wanted Output:
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
// Telegram : https://t.me/stackdogebsc
// Website : https://www.stack-doge.com
*Website: www.shibuttinu.com
*Telegram: https://t.me/Shibuttinu
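The repetition in the current output comes from the nested loops: a segment is printed once for every query it contains, and because the text is split on '*' rather than on newlines, one segment can span several lines. A minimal sketch of a line-based fix, assuming the contract source is already available as a string (the sample text and the helper name matching_lines are hypothetical, not taken from bscscan): split on newlines and use any() so each line is kept at most once.

```python
queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]

# Hypothetical contract header standing in for the text of the <pre> element.
source = """/**
 *Website: www.example.com
 *Telegram: https://t.me/example
 */
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
"""


def matching_lines(text, queries):
    """Return every line that contains at least one query, each line once."""
    found = []
    for line in text.splitlines():
        # any() stops at the first query found, so a line that contains
        # several queries (e.g. "Telegram" and "https://t.me") is kept once.
        if any(query in line for query in queries):
            found.append(line.strip())
    return found


for line in matching_lines(source, queries):
    print(line)
```

Because whole lines are kept, prefixes such as "*" or "//" survive, which is the shape of the wanted output above.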
Solution
You can use a regular expression that matches each query string plus everything after it up to the end of its line:
import re
import requests
from bs4 import BeautifulSoup as bs

addrlist = [
    "0xe56842ed550ff2794f010738554db45e60730371",
    "0xe1fd7b4c9debac3c490d8a553c455da4979482e4",
    "0x88c20beda907dbc60c56b71b102a133c1b29b053",
]
queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]
baseurl = "https://bscscan.com/address/"

# One alternation: each query (escaped, so "." is literal) followed by ".*".
# "." does not match "\n", so every match stops at the end of its line, and
# the first query found on a line swallows the rest of it (no duplicates).
r_pat = re.compile("|".join("{}.*".format(re.escape(q)) for q in queries))

for i in addrlist:
    url = baseurl + i
    r = requests.get(url)
    soup = bs(r.text, "lxml")
    pre = soup.select_one("pre.js-sourcecopyarea.editor")
    print(url)
    print()
    for m in r_pat.findall(pre.string):
        print(m.strip())
    print("-" * 80)
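To check what the pattern does without fetching the page, here is a small offline sketch; the sample string is hypothetical and only stands in for pre.string. It also shows one possible variant (line_pat, an illustrative name) that anchors at line start with re.MULTILINE, for when the leading "//" or "*" from the wanted output should be kept.

```python
import re

queries = ["Website", "Telegram", "https://www.", "Twitter", "https://t.me"]

# Pattern from the answer: match starts at the query, runs to end of line.
r_pat = re.compile("|".join("{}.*".format(re.escape(q)) for q in queries))

# Variant: match the whole line that contains any query.
line_pat = re.compile(
    r"^.*(?:{}).*$".format("|".join(re.escape(q) for q in queries)),
    re.MULTILINE,
)

# Hypothetical stand-in for pre.string; not real bscscan data.
sample = """// Telegram : https://t.me/example
// Website : https://www.example.com
pragma solidity ^0.8.0;
"""

matches = [m.strip() for m in r_pat.findall(sample)]
full_lines = line_pat.findall(sample)

print(matches)
print(full_lines)
```

In the first line, "Telegram" occurs before "https://t.me", so the match begins there and the URL is consumed by the same match; that is why each line appears only once.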
Prints:
https://bscscan.com/address/0xe56842ed550ff2794f010738554db45e60730371
Website: https://binemon.io
Telegram: https://t.me/binemonchat
Twitter: https://twitter.com/binemonnft
--------------------------------------------------------------------------------
https://bscscan.com/address/0xe1fd7b4c9debac3c490d8a553c455da4979482e4
Telegram : https://t.me/stackdogebsc
Website : https://www.stack-doge.com
--------------------------------------------------------------------------------
https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053
Website: www.shibuttinu.com
Telegram: https://t.me/Shibuttinu
--------------------------------------------------------------------------------
Answered By - Andrej Kesely