Issue
I am trying to scrape historical snapshot data from CoinMarketCap using Python:
https://coinmarketcap.com/historical/20201227/
I've tried using BeautifulSoup. It works fine up to row 20, but after that the returned rows look very different.
import pandas as pd
import requests
from bs4 import BeautifulSoup
date = '20211219/'
URL = 'https://coinmarketcap.com/historical/' + date
webpage = requests.get(URL)
soup = BeautifulSoup(webpage.text, 'lxml') # 'html.parser'
tr = soup.find_all('tr', attrs={'class': 'cmc-table-row'})
The first twenty elements of tr contain all the columns from the webpage.
Starting with the 21st element, the rows look very different and don't include what is actually shown in the table on the webpage.
So I cannot scrape the data after the 20th row. How can I access this part of the table?
Solution
In case you haven't found a solution by now: that page pulls its data from an API, and the following code will get you the data you're after:
import pandas as pd
import requests
my_date = '2020-12-27'
r = requests.get(f'https://web-api.coinmarketcap.com/v1/cryptocurrency/listings/historical?convert=USD,USD,BTC&date={my_date}&limit=5000&start=1')
df = pd.json_normalize(r.json()['data'])  # flatten the nested 'quote' dicts into columns such as quote.USD.price
print(df)
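One detail that is easy to miss: the page URL uses dates like 20201227/, while the API call above uses 2020-12-27. Below is a minimal sketch of a wrapper that converts between the two and adds a timeout and an HTTP status check. The names snapshot_to_api_date and fetch_snapshot are hypothetical helpers, not part of any library, and the endpoint and query parameters are simply copied from the answer; it is an undocumented API, so it may change or be rate-limited.

```python
from datetime import datetime

import pandas as pd
import requests

# Endpoint copied from the answer above; undocumented, so treat it as an assumption.
API_URL = "https://web-api.coinmarketcap.com/v1/cryptocurrency/listings/historical"


def snapshot_to_api_date(snapshot: str) -> str:
    """Convert a page-style date such as '20201227/' to the API style '2020-12-27'."""
    return datetime.strptime(snapshot.strip("/"), "%Y%m%d").strftime("%Y-%m-%d")


def fetch_snapshot(snapshot: str, limit: int = 5000) -> pd.DataFrame:
    """Hypothetical helper: request one historical snapshot and flatten it."""
    params = {
        "convert": "USD,USD,BTC",  # copied verbatim from the answer
        "date": snapshot_to_api_date(snapshot),
        "limit": limit,
        "start": 1,
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    # json_normalize expands nested dicts into dotted column names
    return pd.json_normalize(response.json()["data"])
```

Calling df = fetch_snapshot('20201227/') should then be equivalent to the snippet above, assuming the endpoint still responds the same way.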
This returns a rather large DataFrame [4048 rows x 33 columns]:
| | id | name | symbol | slug | num_market_pairs | date_added | tags | max_supply | circulating_supply | total_supply | platform | cmc_rank | self_reported_circulating_supply | self_reported_market_cap | tvl_ratio | last_updated | quote.BTC.price | quote.BTC.volume_24h | quote.BTC.percent_change_1h | quote.BTC.percent_change_24h | quote.BTC.percent_change_7d | quote.BTC.market_cap | quote.BTC.fully_diluted_market_cap | quote.BTC.tvl | quote.BTC.last_updated | quote.USD.price | quote.USD.volume_24h | quote.USD.percent_change_1h | quote.USD.percent_change_24h | quote.USD.percent_change_7d | quote.USD.market_cap | quote.USD.tvl | quote.USD.last_updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bitcoin | BTC | bitcoin | 9712 | 2013-04-28T00:00:00.000Z | ['mineable', 'pow', 'sha-256', 'store-of-value', 'state-channel', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio', 'binance-labs-portfolio', 'blockchain-capital-portfolio', 'boostvc-portfolio', 'cms-holdings-portfolio', 'dcg-portfolio', 'dragonfly-capital-portfolio', 'electric-capital-portfolio', 'fabric-ventures-portfolio', 'framework-ventures-portfolio', 'galaxy-digital-portfolio', 'huobi-capital-portfolio', 'alameda-research-portfolio', 'a16z-portfolio', '1confirmation-portfolio', 'winklevoss-capital-portfolio', 'usv-portfolio', 'placeholder-ventures-portfolio', 'pantera-capital-portfolio', 'multicoin-capital-portfolio', 'paradigm-portfolio'] | 2.1e+07 | 1.85828e+07 | 1.85828e+07 | | 1 | | | | 2020-12-27T23:00:00.000Z | 1 | 2.53042e+06 | 0 | 0 | 0 | 1.85828e+07 | | | 2020-12-27T23:59:41.000Z | 26272.3 | 6.64799e+10 | -0.910864 | -0.623152 | 11.9051 | 4.88213e+11 | | 2020-12-27T23:00:00.000Z |
| 1 | 1027 | Ethereum | ETH | ethereum | 5916 | 2015-08-07T00:00:00.000Z | ['mineable', 'pow', 'smart-contracts', 'ethereum-ecosystem', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio', 'binance-labs-portfolio', 'blockchain-capital-portfolio', 'boostvc-portfolio', 'cms-holdings-portfolio', 'dcg-portfolio', 'dragonfly-capital-portfolio', 'electric-capital-portfolio', 'fabric-ventures-portfolio', 'framework-ventures-portfolio', 'hashkey-capital-portfolio', 'kenetic-capital-portfolio', 'huobi-capital-portfolio', 'alameda-research-portfolio', 'a16z-portfolio', '1confirmation-portfolio', 'winklevoss-capital-portfolio', 'usv-portfolio', 'placeholder-ventures-portfolio', 'pantera-capital-portfolio', 'multicoin-capital-portfolio', 'paradigm-portfolio', 'injective-ecosystem', 'bnb-chain'] | nan | 1.1401e+08 | 1.1401e+08 | | 2 | | | | 2020-12-27T23:00:00.000Z | 0.0259834 | 993197 | -0.514148 | 7.36142 | 6.94848 | 2.96236e+06 | | | 2020-12-27T23:59:41.000Z | 682.642 | 2.60936e+10 | -0.514148 | 7.36142 | 6.94848 | 7.78281e+10 | | 2020-12-27T23:00:00.000Z |
| 2 | 825 | Tether | USDT | tether | 9666 | 2015-02-25T00:00:00.000Z | ['payments', 'stablecoin', 'asset-backed-stablecoin', 'avalanche-ecosystem', 'solana-ecosystem', 'arbitrum-ecosytem', 'moonriver-ecosystem', 'injective-ecosystem', 'bnb-chain', 'usd-stablecoin'] | nan | 2.07532e+10 | 2.12833e+10 | | 3 | | | | 2020-12-27T23:00:00.000Z | 3.80193e-05 | 3.62606e+06 | -0.00446154 | 0.0374141 | -0.0789107 | 789021 | | | 2020-12-27T23:59:41.000Z | 0.998854 | 9.52649e+10 | -0.00446154 | 0.0374141 | -0.0789107 | 2.07294e+10 | | 2020-12-27T23:00:00.000Z |
| 3 | 52 | XRP | XRP | xrp | 683 | 2013-08-04T00:00:00.000Z | ['medium-of-exchange', 'enterprise-solutions', 'binance-chain', 'arrington-xrp-capital-portfolio', 'galaxy-digital-portfolio', 'a16z-portfolio', 'pantera-capital-portfolio'] | 1e+11 | 4.5404e+10 | 9.99908e+10 | | 4 | | | | 2020-12-27T23:00:00.000Z | 1.07733e-05 | 352094 | -1.1233 | -3.96119 | -49.0989 | 489151 | | | 2020-12-27T23:59:41.000Z | 0.283039 | 9.25033e+09 | -1.1233 | -3.96119 | -49.0989 | 1.28511e+10 | | 2020-12-27T23:00:00.000Z |
| 4 | 2 | Litecoin | LTC | litecoin | 747 | 2013-04-28T00:00:00.000Z | ['mineable', 'pow', 'scrypt', 'medium-of-exchange', 'binance-chain', 'bnb-chain'] | 8.4e+07 | 6.61837e+07 | 6.61837e+07 | | 5 | | | | 2020-12-27T23:00:00.000Z | 0.00485367 | 536813 | -0.325724 | -1.50027 | 11.2073 | 321234 | | | 2020-12-27T23:59:41.000Z | 127.517 | 1.41033e+10 | -0.325724 | -1.50027 | 11.2073 | 8.43955e+09 | | 2020-12-27T23:00:00.000Z |
[...]
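With 33 columns, the full frame is hard to read, so it may help to keep only the fields you need. The sketch below uses a tiny hand-written stand-in for the API payload (values copied from the first two rows of the table above, everything else omitted) to show how the nested quote data ends up as dotted column names and how a few of them can be selected. The variable names sample, wanted and top are made up for this example.

```python
import pandas as pd

# Tiny stand-in for r.json()['data']; only a few fields are included here.
sample = [
    {"id": 1027, "name": "Ethereum", "symbol": "ETH", "cmc_rank": 2,
     "quote": {"USD": {"price": 682.642, "market_cap": 7.78281e10}}},
    {"id": 1, "name": "Bitcoin", "symbol": "BTC", "cmc_rank": 1,
     "quote": {"USD": {"price": 26272.3, "market_cap": 4.88213e11}}},
]

# Nested keys are flattened into names such as 'quote.USD.price'.
df = pd.json_normalize(sample)

# Keep a readable subset and order it by rank.
wanted = ["cmc_rank", "name", "symbol", "quote.USD.price", "quote.USD.market_cap"]
top = df[wanted].sort_values("cmc_rank").reset_index(drop=True)
print(top)
```

The same column selection should work on the real frame, assuming the API still returns those fields.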
Answered By - platipus_on_fire