Thursday, October 14, 2021

[FIXED] How to extract the p tag text that contains an anchor text and matches a criteria from a page

Labels: beautifulsoup, python, python-3.x, selenium

Issue

My code does not produce a very readable result from the data I scrape. I have an approach in mind that is within my understanding, but I can't get it to work properly.

import requests
from bs4 import BeautifulSoup
import time
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
addrlist = ['https://poocoin.app/rugcheck/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3/dev-activity',
            'https://poocoin.app/rugcheck/0xd7ac542add4994a9d72369ab8d4788a38df6a217/dev-activity',
            'https://poocoin.app/rugcheck/0xf017e2773e4ee0590c81d79ccbcf1b2de1d22877/dev-activity']

for url in addrlist: 
    driver.get(url)

    time.sleep(8)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    pdata = soup.find_all('div',attrs={"class":"mt-2"})
    for x in pdata:
        print (x.find('p'))
    print ()
driver.quit()

Current output (partial):

<p><a href="/tokens/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3">Go to chart</a></p>
<p>This is a log of activity related to the token from all wallets that have had ownership of the contract.</p>
<p>Wallet activity for <a href="https://bscscan.com/address/0xCd198Be08A33cbe2172f3BE45cdB431E060076BC" rel="noreferrer" target="_blank">0xCd198Be08A33cbe2172f3BE45cdB431E060076BC</a></p>
<p>Wallet activity for <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/undefined" rel="noreferrer" target="_blank"></a> on 9/3/2021, 1:55:09 AM)</span></p>
<p>Wallet activity for <a href="https://bscscan.com/address/0xc95063d946242f26074a76c8a2e94c9d735dfc78" rel="noreferrer" target="_blank">0xc95063d946242f26074a76c8a2e94c9d735dfc78</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a> on 4/1/2021, 8:46:31 AM)</span></p>

Wanted output (only grab an entry if its anchor text is not empty):

0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3
  Wallet activity for 0xc95063d946242f26074a76c8a2e94c9d735dfc78
  (Ownership transferred to 0x79c4af7c43f500b9ccba9396d079cc03dfcafda1 on 01/04/2021, 8:46:31 am)

0xd7ac542add4994a9d72369ab8d4788a38df6a217
  Wallet activity for 0x9ecedaafc0d45ad80b2515e24c61d6a7c5b917bd
  (Ownership transferred to 0x0000000000000000000000000000000000000000 on 06/09/2021, 7:37:22 pm)

0xf017e2773e4ee0590c81d79ccbcf1b2de1d22877
  Wallet activity for 0x61b1e31107953f8af76d19ba503ed1798b760c13
  (Ownership transferred to 0x0000000000000000000000000000000000000000 on 23/04/2021, 5:27:11 am)

Solution

You can do it like this.

The data you need is inside the last <div class="mt-2"> on the page. Select that last <div>, find its <p>, and print its text.

Here is the code that prints the data you need.
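To see the selection step in isolation, here is a minimal sketch that runs on a static HTML string instead of a live page. The `html` string is a hypothetical stand-in for `driver.page_source`, trimmed to two blocks built from the question's output, and it uses the built-in `html.parser` rather than `lxml` so it has no extra dependency. It also shows why the printed text has no separator before "(Ownership": `.text` drops the `<br/>` without inserting anything.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source (two blocks from the question).
html = (
    '<div class="mt-2"><p>This is a log of activity related to the token '
    'from all wallets that have had ownership of the contract.</p></div>'
    '<div class="mt-2"><p>Wallet activity for '
    '<a href="https://bscscan.com/address/0xc95063d946242f26074a76c8a2e94c9d735dfc78">'
    '0xc95063d946242f26074a76c8a2e94c9d735dfc78</a><br/>'
    '<span class="text-muted text-small">(Ownership transferred to '
    '<a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1">'
    '0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a> on 4/1/2021, 8:46:31 AM)</span></p></div>'
)

soup = BeautifulSoup(html, "html.parser")

# find_all returns the matching <div> tags in document order; [-1] is the last one.
last_div = soup.find_all("div", attrs={"class": "mt-2"})[-1]

# .text concatenates every text node inside the <p>; the <br/> contributes nothing.
text = last_div.find("p").text.strip()
print(text)
```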

import time

from bs4 import BeautifulSoup
from selenium import webdriver

# Selenium 3 style call; Selenium 4 expects the driver path via a Service object instead.
driver = webdriver.Chrome('chromedriver.exe')
addrlist = ['https://poocoin.app/rugcheck/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3/dev-activity',
            'https://poocoin.app/rugcheck/0xd7ac542add4994a9d72369ab8d4788a38df6a217/dev-activity',
            'https://poocoin.app/rugcheck/0xf017e2773e4ee0590c81d79ccbcf1b2de1d22877/dev-activity']

for url in addrlist:
    driver.get(url)

    # Give the page time to finish rendering before reading its source.
    time.sleep(5)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    # The wanted entry is the <p> inside the last <div class="mt-2"> on the page.
    pdata = soup.find_all('div', attrs={"class": "mt-2"})[-1]
    print(pdata.find('p').text.strip())

driver.quit()

Output:

Wallet activity for 0xc95063d946242f26074a76c8a2e94c9d735dfc78(Ownership transferred to 0x79c4af7c43f500b9ccba9396d079cc03dfcafda1 on 4/1/2021, 6:16:31 AM)

Wallet activity for 0x9ecedaafc0d45ad80b2515e24c61d6a7c5b917bd(Ownership transferred to 0x0000000000000000000000000000000000000000 on 9/6/2021, 5:07:22 PM)

Wallet activity for 0x61b1e31107953f8af76d19ba503ed1798b760c13(Ownership transferred to 0x0000000000000000000000000000000000000000 on 4/23/2021, 2:57:11 AM)
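The answer relies on the wanted entry always sitting in the last <div class="mt-2">. The question's stated criterion was different: only keep an entry whose anchor text is not empty. The following is a rough sketch of that filter, not the answerer's method. It assumes, as the question's loop suggests, that each entry is the first <p> of a <div class="mt-2"> and that the ownership note sits in a <span>. The names `SAMPLE_HTML`, `TOKEN_RE`, `ownership_entries`, and `format_report` are made up for illustration, and the sample markup is copied from the question's output rather than fetched from the site. It also replaces each <br/> with a newline so the two parts land on separate lines, as in the wanted output.

```python
import re

from bs4 import BeautifulSoup

# Static sample based on the question's "current output"; in the real script
# this would be driver.page_source. The wrapping <div>s are an assumption.
SAMPLE_HTML = """
<div class="mt-2"><p><a href="/tokens/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3">Go to chart</a></p></div>
<div class="mt-2"><p>Wallet activity for <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/undefined" rel="noreferrer" target="_blank"></a> on 9/3/2021, 1:55:09 AM)</span></p></div>
<div class="mt-2"><p>Wallet activity for <a href="https://bscscan.com/address/0xc95063d946242f26074a76c8a2e94c9d735dfc78" rel="noreferrer" target="_blank">0xc95063d946242f26074a76c8a2e94c9d735dfc78</a><br/><span class="text-muted text-small">(Ownership transferred to <a href="https://bscscan.com/address/0x79c4af7c43f500b9ccba9396d079cc03dfcafda1" rel="noreferrer" target="_blank">0x79c4af7c43f500b9ccba9396d079cc03dfcafda1</a> on 4/1/2021, 8:46:31 AM)</span></p></div>
"""

# Pulls the token address out of a .../rugcheck/<address>/dev-activity URL.
TOKEN_RE = re.compile(r"/rugcheck/(0x[0-9a-fA-F]+)")


def ownership_entries(html):
    """Return one list of text lines per entry whose anchors all have text."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for div in soup.find_all("div", attrs={"class": "mt-2"}):
        p = div.find("p")
        # Skip blocks without a <p> or without the ownership <span>.
        if p is None or p.find("span") is None:
            continue
        # The criterion from the question: every anchor must have visible text.
        if not all(a.get_text(strip=True) for a in p.find_all("a")):
            continue
        # Turn each <br/> into a newline so the two parts stay on separate lines.
        for br in p.find_all("br"):
            br.replace_with("\n")
        lines = [line.strip() for line in p.get_text().splitlines()]
        entries.append([line for line in lines if line])
    return entries


def format_report(url, html):
    """Token address on the first line, then each kept entry indented below it."""
    token = TOKEN_RE.search(url).group(1)
    report = [token]
    for entry in ownership_entries(html):
        report.extend("  " + line for line in entry)
    return "\n".join(report)


url = "https://poocoin.app/rugcheck/0x8076c74c5e3f5852037f31ff0093eeb8c8add8d3/dev-activity"
print(format_report(url, SAMPLE_HTML))
```

In the Selenium loop, `format_report(url, driver.page_source)` would take the place of the `print(pdata.find('p').text.strip())` line. The timestamp is passed through exactly as the page renders it, so it follows the browser's locale and time zone.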

Answered By - Ram

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
