Issue
I have built a web scraper using bs4 whose purpose is to notify me when a new announcement is posted. At the moment I am testing it with the word 'list' instead of the full set of announcement keywords. For some reason, when I compare the time at which the script decides a new announcement has been posted with the time the announcement actually appears on the website, the two are off by roughly 5 minutes.
from bs4 import BeautifulSoup
from requests import get
import time
import sys

x = True
while x == True:
    time.sleep(30)
    # Data for the scraping
    url = "https://www.binance.com/en/support/announcement"
    response = get(url)
    html_page = response.content
    soup = BeautifulSoup(html_page, 'html.parser')
    news_list = soup.find_all(class_='css-qinc3w')

    # Create a bag of key words for getting matches
    key_words = ['list', 'token sale', 'open trading', 'opens trading', 'perpetual', 'defi', 'uniswap', 'airdrop', 'adds', 'updates', 'enabled', 'trade', 'support']

    # Empty list
    updated_list = []
    for news in news_list:
        article_text = news.text
        if ("list" in article_text.lower()):
            updated_list.append([article_text])
            if len(updated_list) > 4:
                print(time.asctime(time.localtime(time.time())))
                print(article_text)
                sys.exit()
When the length of the list increased by 1 (from 4 to 5), the script printed the following time and new announcement: Fri May 28 04:17:39 2021, Binance Will List Livepeer (LPT)
I am unsure why this is. At first I thought I was being throttled, but looking again at robots.txt, I didn't see any reason why I should be. Moreover, I included a sleep time of 30 seconds, which should be more than enough to scrape without any issues. Any help, or an alternative solution, would be much appreciated.
My question is:
Why is it 5 minutes behind? Why does it not notify me as soon as the website posts the announcement? The program takes about 5 minutes longer to recognize that there is a new post compared to the time it appears on the website.
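For what it's worth, a lag of a few minutes is also consistent with the HTML page being served from a cache or CDN rather than with throttling. A quick check (a minimal sketch, not part of the original script; which caching headers Binance actually returns is an assumption) is to print the caching-related response headers:

from requests import get

response = get("https://www.binance.com/en/support/announcement")
# Print the headers that commonly reveal caching; any of these may be absent
for header in ("Date", "Age", "Cache-Control", "Last-Modified"):
    print(header, ":", response.headers.get(header))

If an Age header or a Cache-Control max-age of several minutes shows up, the page being scraped is simply an older snapshot of the announcements list.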
Solution
from xrzz import http  ## give it a try using my simple scratch module
import json

url = "https://www.binance.com/bapi/composite/v1/public/cms/article/list/query?type=1&pageNo=1&pageSize=30"
req = http("GET", url=url, tls=True).body().decode()

key_words = ['list', 'token sale', 'open trading', 'opens trading', 'perpetual', 'defi', 'uniswap', 'airdrop', 'adds', 'updates', 'enabled', 'trade', 'support']

for i in json.loads(req)['data']['catalogs']:
    for o in i['articles']:
        # Lower-case the title so 'list' also matches titles like "Binance Will List ..."
        if key_words[0] in o['title'].lower():
            print(o['title'])
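If you don't have the xrzz module installed, here is a minimal sketch of the same idea using the standard requests library; the endpoint and the 'data' -> 'catalogs' -> 'articles' JSON layout are taken from the answer above, while the case-insensitive match over all keywords is an addition:

import requests

url = "https://www.binance.com/bapi/composite/v1/public/cms/article/list/query?type=1&pageNo=1&pageSize=30"
data = requests.get(url).json()

key_words = ['list', 'token sale', 'open trading', 'opens trading', 'perpetual', 'defi', 'uniswap', 'airdrop', 'adds', 'updates', 'enabled', 'trade', 'support']

for catalog in data['data']['catalogs']:
    for article in catalog['articles']:
        title = article['title']
        # Print every announcement whose title contains any keyword (case-insensitive)
        if any(word in title.lower() for word in key_words):
            print(title)

Because this queries the JSON endpoint that the announcements page itself is built from, rather than the rendered HTML, it should see new posts as soon as the API returns them, which may side-step whatever caching made the scraped page lag by a few minutes.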
Output:
Answered By - dedshit