Issue
I am trying to scrape some transaction data, but the page refreshes every few seconds. I want to limit each grab to the latest block only, then rescan and pick up the next block as it arrives. Any idea would be very helpful.
Goal #1 - Continuity with grabbed blocks
Goal #2 - Eliminate Duplicates
from bs4 import BeautifulSoup
from time import sleep
import re, requests

trim = re.compile(r'[^\d,.]+')   # strip everything except digits, commas and dots
url = "https://bscscan.com/txs?a=0x10ed43c718714eb63d5aa57b78b54704e256024e&ps=100&p=1"
baseurl = 'https://bscscan.com/tx/'
header = {"User-Agent": "Mozilla/5.0"}
scans = 0

while True:
    scans += 1
    reqtxsInternal = requests.get(url, headers=header, timeout=2)
    souptxsInternal = BeautifulSoup(reqtxsInternal.content, 'html.parser')
    blocktxsInternal = souptxsInternal.find_all('table')[0].find_all('tr')
    for row in blocktxsInternal[1:]:          # skip the header row
        txnhash = row.find_all('td')[1].text
        txnhashdetails = txnhash.strip()
        block = row.find_all('td')[3].text
        value = row.find_all('td')[9].text
        amount = trim.sub('', value).replace(",", "")
        transval = float(amount)
        if transval >= 1:
            print("Doing something with the data -> " + str(block) + " " + str(transval))
        else:
            pass
    print(" -> Whole Page Scanned: ", scans)
    sleep(1)
Current Output: #-- will be different when you run the script
Doing something with the data -> 10186993 1.233071907624764
Doing something with the data -> 10186993 4.689434542638692
Doing something with the data -> 10186993 27.97137792744322 #-- grab only until here and reload the scan
Doing something with the data -> 10186992 9.0
Doing something with the data -> 10186991 2.98
Doing something with the data -> 10186991 1.0
-> Whole Page Scanned: 1
Doing something with the data -> 10186994 1.026868093169767
Doing something with the data -> 10186994 4.0
Doing something with the data -> 10186994 4.55582682
Doing something with the data -> 10186994 8.184713205161088
Doing something with the data -> 10186993 1.233071907624764
Doing something with the data -> 10186993 4.689434542638692
Doing something with the data -> 10186993 27.97137792744322
Doing something with the data -> 10186992 9.0
-> Whole Page Scanned: 2
Wanted Output:
Doing something with the data -> 10186993 1.233071907624764
Doing something with the data -> 10186993 4.689434542638692
Doing something with the data -> 10186993 27.97137792744322
-> Whole Page Scanned: 1
Doing something with the data -> 10186994 1.026868093169767
Doing something with the data -> 10186994 4.0
Doing something with the data -> 10186994 4.55582682
Doing something with the data -> 10186994 8.184713205161088
-> Whole Page Scanned: 2
Solution
I used Pandas here: it uses BeautifulSoup under the hood anyway, and since the data is a table, I let pandas parse it, which makes the table easy to manipulate.
It looks like you only want the latest/max "Block" and then any values greater than or equal to 1. Does this give you what you want?
import pandas as pd
from time import sleep
import requests

url = "https://bscscan.com/txs?a=0x10ed43c718714eb63d5aa57b78b54704e256024e&ps=100&p=1"
baseurl = 'https://bscscan.com/tx/'
header = {"User-Agent": "Mozilla/5.0"}
scans = 0

while True:
    scans += 1
    reqtxsInternal = requests.get(url, headers=header, timeout=2)
    df = pd.read_html(reqtxsInternal.text)[0]               # first table on the page
    df = df[df['Block'] == max(df['Block'])]                # keep only the latest block
    df['Value'] = df['Value'].str.extract(r'(^\d*.*\d+)')   # pull the numeric part of the value
    df = df[df['Value'].astype(float) >= 1]                 # keep values >= 1
    print(df[['Block', 'Value']])
    print(" -> Whole Page Scanned: ", scans)
    sleep(1)
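If the page is rescanned before a new block shows up, the same rows of the current max block come back and get handled again. To also cover Goal #2, one option is to remember which transactions have already been processed and drop them on later scans. A minimal sketch of that idea, assuming the parsed table exposes a 'Txn Hash' column (that column name, and the filter_new_rows helper, are assumptions for illustration):

import pandas as pd

def filter_new_rows(df, seen_hashes):
    """Keep only rows whose 'Txn Hash' has not been processed yet and
    record the new hashes in seen_hashes ('Txn Hash' is an assumed column name)."""
    new_rows = df[~df['Txn Hash'].isin(seen_hashes)]
    seen_hashes.update(new_rows['Txn Hash'].tolist())
    return new_rows

# usage inside the scan loop, after df has been reduced to the latest block:
# seen_hashes = set()   # created once, before the while loop
# df = filter_new_rows(df, seen_hashes)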
Your other option is to have it check whether the current 'block' is greater than the previous one, and add that logic so it only prints when it is:
from bs4 import BeautifulSoup
from time import sleep
import re, requests

trim = re.compile(r'[^\d,.]+')   # strip everything except digits, commas and dots
url = "https://bscscan.com/txs?a=0x10ed43c718714eb63d5aa57b78b54704e256024e&ps=100&p=1"
baseurl = 'https://bscscan.com/tx/'
header = {"User-Agent": "Mozilla/5.0"}
scans = 0
previous_block = 0

while True:
    scans += 1
    reqtxsInternal = requests.get(url, headers=header, timeout=2)
    souptxsInternal = BeautifulSoup(reqtxsInternal.content, 'html.parser')
    blocktxsInternal = souptxsInternal.find_all('table')[0].find_all('tr')
    for row in blocktxsInternal[1:]:                 # skip the header row
        txnhash = row.find_all('td')[1].text
        txnhashdetails = txnhash.strip()
        block = row.find_all('td')[3].text
        if float(block) > float(previous_block):     # a newer block has appeared
            previous_block = block
        value = row.find_all('td')[9].text
        amount = trim.sub('', value).replace(",", "")
        transval = float(amount)
        if transval >= 1 and block == previous_block:   # only rows from the newest block
            print("Doing something with the data -> " + str(block) + " " + str(transval))
        else:
            pass
    print(" -> Whole Page Scanned: ", scans)
    sleep(1)
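One caveat with this version: if no new block has arrived between two scans, the rows of the newest block are printed a second time. A small variant is to keep a high-water mark of the highest block already processed and only print rows from a strictly newer block; a minimal sketch of that idea (the handle_rows helper and its (block, value) input format are hypothetical):

def handle_rows(rows, last_processed_block):
    """rows: (block, value) pairs parsed from one page scan.
    Prints the newest block's rows only if that block has not been
    processed before, then returns the updated high-water mark."""
    newest = max(block for block, _ in rows)
    if newest > last_processed_block:
        for block, value in rows:
            if block == newest and value >= 1:
                print("Doing something with the data ->", block, value)
    return max(last_processed_block, newest)

# example: a repeated scan of an unchanged page prints nothing the second time
last = 0
last = handle_rows([(10186993, 27.97), (10186993, 1.23), (10186992, 9.0)], last)
last = handle_rows([(10186993, 27.97), (10186993, 1.23), (10186992, 9.0)], last)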
Answered By - chitown88