Sunday, January 9, 2022

[FIXED] Python and BeautifulSoup: extract multiple li items and their anchor text with the link

beautifulsoup, parsing, python, python-3.x, webrequest

Issue

I am trying to extract some data from a page (https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7). I have managed to grab some of the values I want, but I am still having trouble extracting the rest. Any ideas would be very helpful.

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

val = soup.find('span', class_='u-label u-label--value u-label--secondary text-dark rounded   mr-1').text
transfee = soup.find('span', id='ContentPlaceHolder1_spanTxFee').text
fromaddr = soup.find('span', id='spanFromAdd').text
token = soup.find('span', class_='hash-tag text-truncate hash-tag-custom-from tooltip-address').text

print ("From: \t\t ", fromaddr)
print ("Value: \t\t ", val)
print ("Transaction Fee: ", transfee)
print ("Tokens: \t ", token)

Current Output:

From:             0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value:            0.679753633258727619 BNB
Transaction Fee:  0.00059691 BNB ($0.18)
Tokens:           PancakeSwap: Router v2

Wanted Output:

From:             0x6bdfe0696aa4f81245325c7931c117f15459e07a
Value:            0.679753633258727619 BNB
Transaction Fee:  0.00059691 BNB ($0.18)
#-- the part I can't get to work
Tokens:       Wrapped BNB (WBNB) -> https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c 
              FaraCrystal (FARA) -> https://bscscan.com/token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1

Solution

I used a CSS selector that first finds the div tags and then the ul tag nested inside them. The selector returns a list of matching tags, and the one at index 1 contains the token data.

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request('https://bscscan.com/tx/0x1b6f00c8cd99e0daac5718c743ef9a51af40f95feae23bf29960ae1f66a1cff7', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

main_data = soup.select("div.row > div.col-md-9 > ul.list-unstyled.mb-0")[1]
# Each li holds one token; its last anchor carries the token name and a relative link
for li in main_data.find_all("li"):
    link = li.find_all("a")[-1]
    print(link.get_text())
    # href already starts with /token/, so only the host is prepended
    print("https://bscscan.com" + link['href'])

Output:

Wrapped BNB (WBNB)
https://bscscan.com/token/0xbb4cdb9cbd36b01bd1cbaebf2de08d9173bc095c
FaraCrystal (FARA) 
https://bscscan.com/token/0xf4ed363144981d3a65f42e7d0dc54ff9eef559a1
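
The selector approach can also be tried without a network request. The sketch below is a minimal example under stated assumptions: SAMPLE_HTML is made-up markup that only mimics the structure the selector relies on, and extract_tokens, BASE_URL and list_index are illustrative names, not part of the original answer. It uses urljoin so that relative hrefs such as /token/... are resolved against the host rather than concatenated by hand.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup, reduced to the parts the selector relies on.
# The real page is larger and its structure may change over time.
SAMPLE_HTML = """
<div class="row">
  <div class="col-md-9">
    <ul class="list-unstyled mb-0"><li><a href="/address/0x1">first list</a></li></ul>
  </div>
</div>
<div class="row">
  <div class="col-md-9">
    <ul class="list-unstyled mb-0">
      <li>For <a href="/token/0xaaa">Token A (TKA)</a></li>
      <li>For <a href="/token/0xbbb">Token B (TKB)</a></li>
    </ul>
  </div>
</div>
"""

BASE_URL = "https://bscscan.com"  # host used to resolve relative links


def extract_tokens(html, list_index=1):
    """Return (name, absolute_url) pairs from the ul at list_index."""
    soup = BeautifulSoup(html, "html.parser")
    lists = soup.select("div.row > div.col-md-9 > ul.list-unstyled.mb-0")
    if len(lists) <= list_index:
        return []  # the page layout differs from what this sketch assumes
    tokens = []
    for li in lists[list_index].find_all("li"):
        links = li.find_all("a", href=True)
        if not links:
            continue  # skip list items without a link
        last = links[-1]
        tokens.append((last.get_text(strip=True), urljoin(BASE_URL, last["href"])))
    return tokens


for name, url in extract_tokens(SAMPLE_HTML):
    print(f"{name} -> {url}")
```

On the live page you would pass the downloaded webpage bytes to extract_tokens instead of SAMPLE_HTML. Whether index 1 is still the right list depends on the page layout at the time of the request.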

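Alternatively, use the find_all method. It also returns a list of matching ul tags, so pick the list you need by index as before: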

main_data=soup.find_all("ul", class_="list-unstyled mb-0")
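
One difference between the two approaches is worth checking before swapping them. When class_ is given a string containing several class names, BeautifulSoup compares it with the whole class attribute value, whereas a CSS selector only requires that each class is present. The sketch below illustrates this on a made-up fragment; the ids a, b and c are hypothetical and exist only to tell the lists apart.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: three lists whose class attributes differ slightly.
SAMPLE_HTML = """
<ul id="a" class="list-unstyled mb-0"></ul>
<ul id="b" class="mb-0 list-unstyled"></ul>
<ul id="c" class="list-unstyled mb-0 extra"></ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string passed to class_ is matched against the full attribute value.
exact = [ul["id"] for ul in soup.find_all("ul", class_="list-unstyled mb-0")]

# A CSS selector only requires both classes to be present, in any order.
by_css = [ul["id"] for ul in soup.select("ul.list-unstyled.mb-0")]

print("find_all:", exact)
print("select:  ", by_css)
```

Here only the first list matches the find_all call, while the selector matches all three. The two approaches can therefore return lists of different lengths on a real page, and the index that holds the token data may differ between them.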

Answered By - Bhavya Parikh

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
