Issue
My code does not produce a very readable result from the data I grab. I have an approach in mind that is within my understanding; however, I can't get it to work properly.
import requests
from bs4 import BeautifulSoup
import time
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
addrlist = ['https://poocoin.app/rugcheck/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3/dev-activity',
            'https://poocoin.app/rugcheck/0xd7ac542add4994a9d72369ab8d4788a38df6a217/dev-activity',
            'https://poocoin.app/rugcheck/0xf017e2773e4ee0590c81d79ccbcf1b2de1d22877/dev-activity']

for url in addrlist:
    driver.get(url)
    time.sleep(8)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    pdata = soup.find_all('div', attrs={"class": "mt-2"})
    for x in pdata:
        print(x.find('p'))
        print()

driver.quit()
Current Output: #-- Some parts only.
<p><a href="/tokens/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3">Go to chart</a></p>
<p>This is a log of activity related to the token from all wallets that have had ownership of the contract.</p>
<p>Wallet activity for <a href="https://bscscan.com/address/0xCd198Be08A33cbe2172f3BE45cdB431E060076BC" rel="noreferrer" target="_blank">0xCd198Be08A33cbe2172f3BE45cdB431E060076BC</a></p>
<p>Wallet activity for <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/undefined" rel="noreferrer" target="_blank"></a> on 9/3/2021, 1:55:09 AM)</span></p>
<p>Wallet activity for <a href="https://bscscan.com/address/0xc95063d946242f26074a76c8a2e94c9d735dfc78" rel="noreferrer" target="_blank">0xc95063d946242f26074a76c8a2e94c9d735dfc78</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a> on 4/1/2021, 8:46:31 AM)</span></p>
Wanted Output: #-- Only grab if the anchor text is not empty
0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3
Wallet activity for 0xc95063d946242f26074a76c8a2e94c9d735dfc78
(Ownership transferred to 0x79c4af7c43f500b9ccba9396d079cc03dfcafda1 on 01/04/2021, 8:46:31 am)
0xd7ac542add4994a9d72369ab8d4788a38df6a217
Wallet activity for 0x9ecedaafc0d45ad80b2515e24c61d6a7c5b917bd
(Ownership transferred to 0x0000000000000000000000000000000000000000 on 06/09/2021, 7:37:22 pm)
0xf017e2773e4ee0590c81d79ccbcf1b2de1d22877
Wallet activity for 0x61b1e31107953f8af76d19ba503ed1798b760c13
(Ownership transferred to 0x0000000000000000000000000000000000000000 on 23/04/2021, 5:27:11 am)
Solution
You can do it like this.
The data you need is inside the last <div class="mt-2"> on the page. Select that last <div>, find its <p>, and print its text.
Here is code that prints the data you need.
import requests
from bs4 import BeautifulSoup
import time
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
addrlist = ['https://poocoin.app/rugcheck/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3/dev-activity',
            'https://poocoin.app/rugcheck/0xd7ac542add4994a9d72369ab8d4788a38df6a217/dev-activity',
            'https://poocoin.app/rugcheck/0xf017e2773e4ee0590c81d79ccbcf1b2de1d22877/dev-activity']

for url in addrlist:
    driver.get(url)
    time.sleep(5)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    pdata = soup.find_all('div', attrs={"class": "mt-2"})[-1]
    print(pdata.find('p').text.strip())

driver.quit()
Output:
Wallet activity for 0xc95063d946242f26074a76c8a2e94c9d735dfc78(Ownership transferred to 0x79c4af7c43f500b9ccba9396d079cc03dfcafda1 on 4/1/2021, 6:16:31 AM)
Wallet activity for 0x9ecedaafc0d45ad80b2515e24c61d6a7c5b917bd(Ownership transferred to 0x0000000000000000000000000000000000000000 on 9/6/2021, 5:07:22 PM)
Wallet activity for 0x61b1e31107953f8af76d19ba503ed1798b760c13(Ownership transferred to 0x0000000000000000000000000000000000000000 on 4/23/2021, 2:57:11 AM)
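The output above still runs the wallet line and the ownership line together, and it does not include the token address or the date format shown in the wanted output. A minimal sketch of one way to get closer follows. It is not part of the original answer. It parses a static sample copied from the question's "Current Output" rather than a live page, and it assumes that each <p> sits directly inside a <div class="mt-2"> and that the page prints dates as month/day/year. The names SAMPLE_HTML, token_from_url, reformat_date and extract_activity are made up for this example.

```python
import re
from datetime import datetime

from bs4 import BeautifulSoup

# Sample markup taken from the question's "Current Output".
# Assumption: each <p> is wrapped in its own <div class="mt-2">.
SAMPLE_HTML = """
<div class="mt-2"><p>Wallet activity for <a href="https://bscscan.com/address/0xCd198Be08A33cbe2172f3BE45cdB431E060076BC" rel="noreferrer" target="_blank">0xCd198Be08A33cbe2172f3BE45cdB431E060076BC</a></p></div>
<div class="mt-2"><p>Wallet activity for <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/undefined" rel="noreferrer" target="_blank"></a> on 9/3/2021, 1:55:09 AM)</span></p></div>
<div class="mt-2"><p>Wallet activity for <a href="https://bscscan.com/address/0xc95063d946242f26074a76c8a2e94c9d735dfc78" rel="noreferrer" target="_blank">0xc95063d946242f26074a76c8a2e94c9d735dfc78</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a> on 4/1/2021, 8:46:31 AM)</span></p></div>
"""

# Matches dates such as "4/1/2021, 8:46:31 AM".
DATE_RE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2}:\d{2} [AP]M")


def token_from_url(url):
    """Return the token address, the second-to-last segment of the URL path."""
    return url.rstrip("/").split("/")[-2]


def reformat_date(match):
    """Rewrite a month/day/year date as day/month/year with lower-case am/pm."""
    dt = datetime.strptime(match.group(0), "%m/%d/%Y, %I:%M:%S %p")
    hour12 = dt.hour % 12 or 12
    suffix = "am" if dt.hour < 12 else "pm"
    return f"{dt:%d/%m/%Y}, {hour12}:{dt:%M:%S} {suffix}"


def extract_activity(html):
    """Return (wallet_line, ownership_line) pairs, skipping empty anchors."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.find_all("div", attrs={"class": "mt-2"}):
        p = div.find("p")
        span = p.find("span") if p else None
        if span is None:
            continue  # no ownership note in this block
        anchor = span.find("a")
        if anchor is None or not anchor.get_text(strip=True):
            continue  # "only grab if the anchor text is not empty"
        span.extract()  # detach the span so the <p> text holds only the wallet line
        wallet_line = p.get_text().strip()
        ownership_line = DATE_RE.sub(reformat_date, span.get_text().strip())
        rows.append((wallet_line, ownership_line))
    return rows


url = "https://poocoin.app/rugcheck/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3/dev-activity"
print(token_from_url(url))
for wallet_line, ownership_line in extract_activity(SAMPLE_HTML):
    print(wallet_line)
    print(ownership_line)
```

In the Selenium loop, the same function could be called as extract_activity(driver.page_source) after each driver.get(url). Whether the live page matches the sample structure is something to verify against the real markup.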
Answered By - Ram