Issue
I am trying to write a Python script using Beautiful Soup that scrapes the name and symbol of each cryptocurrency. Although the page lists hundreds of symbols, None is returned after the 10th iteration. Could anyone help me out? The website I am trying to scrape is https://coinmarketcap.com
The Code:
from bs4 import BeautifulSoup
import requests
import csv
source=requests.get('https://coinmarketcap.com').text
soup = BeautifulSoup(source, 'html.parser')
def scrape_data():
    container = soup.find('tbody')
    theData = container.find_all("tr")
    for i in theData:
        individual_symbol= i.find('p', attrs= {"class":"sc-1eb5slv-0 gGIpIK coin-item-symbol"})
        individual_name = i.find('p', attrs= {"class":"sc-1eb5slv-0 iworPT"})
        print('Name: {}, Symbol: {}'.format(individual_name.text, individual_symbol.text))
scrape_data()
This gets returned
Name: Bitcoin, Symbol: BTC
Name: Ethereum, Symbol: ETH
Name: Tether, Symbol: USDT
Name: BNB, Symbol: BNB
Name: USD Coin, Symbol: USDC
Name: XRP, Symbol: XRP
Name: Terra, Symbol: LUNA
Name: Cardano, Symbol: ADA
Name: Solana, Symbol: SOL
Name: Avalanche, Symbol: AVAX
Traceback (most recent call last):
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 18, in <module>
    scrape_data()
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 15, in scrape_data
    print(individual_symbol.text)
AttributeError: 'NoneType' object has no attribute 'text'
ryan@Ryans-MBP PythonProjects %
Solution
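The data is present within the <script> tags in JSON format. In the HTML that requests receives, the table rows after the 10th do not contain the <p> elements with the classes your code searches for, so find() returns None and .text raises the AttributeError. My approach is to get the full data first, then filter out what you need. This will get the full data available: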
Code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import json
import re
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
dfs = []
for page in range(1,21):
    print(f'Page: {page}')
    url = f'https://coinmarketcap.com/?page={page}'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    script = soup.find_all('script')[-1]
    jsonStr = re.search('({.*})', str(script)).group(1)
    jsonData = json.loads(jsonStr)
    colsData = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data'][0]
    cols = colsData['keysArr'] + colsData['excludeProps']
    data = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data'][1:]
    df = pd.DataFrame(data, columns=cols)
    dfs.append(df)
df = pd.concat(dfs, axis=0)
name_symbol = df[['name','symbol']]
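The extraction step can be tried offline before hitting the site. The sketch below is only an illustration: SAMPLE_SCRIPT is a hand-written, hypothetical stand-in for the page's last <script> tag (the real payload is far larger and its layout may change), and extract_rows is a helper name made up for this example. It uses only the standard library and mirrors the regex, json.loads, and keysArr steps from the code above.

```python
import json
import re

# Hypothetical, hand-written stand-in for the page's last <script> tag.
# Only the key path used in the answer's code is reproduced here.
SAMPLE_SCRIPT = (
    '<script type="application/json">'
    '{"props": {"initialState": {"cryptocurrency": {"listingLatest": {"data": ['
    '{"keysArr": ["name", "symbol"], "excludeProps": []},'
    '["Bitcoin", "BTC"], ["Ethereum", "ETH"]]}}}}}'
    '</script>'
)


def extract_rows(script_html):
    """Return one dict per coin from the JSON embedded in a script tag."""
    # Greedy match from the first "{" to the last "}" in the tag.
    json_str = re.search(r'(\{.*\})', script_html).group(1)
    payload = json.loads(json_str)
    listing = payload['props']['initialState']['cryptocurrency']['listingLatest']['data']
    # The first element holds the column names; the rest are plain value lists.
    header = listing[0]
    cols = header['keysArr'] + header['excludeProps']
    return [dict(zip(cols, row)) for row in listing[1:]]


rows = extract_rows(SAMPLE_SCRIPT)
for row in rows:
    print('Name: {}, Symbol: {}'.format(row['name'], row['symbol']))
```

If the site changes the key path or moves the data out of the last script tag, the lookup raises a KeyError or the regex returns None, so it is worth checking this step first when the scraper stops working.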
Full data:
print(df)
             ath        atl  ...  quotes.1.tvl  quotes.2.tvl
0   68789.625939  65.526001  ...           NaN           NaN
1    4891.704698   0.420897  ...           NaN           NaN
2       1.215490   0.568314  ...           NaN           NaN
3     690.931965   0.096109  ...           NaN           NaN
4       2.349556   0.929222  ...           NaN           NaN
..           ...        ...  ...           ...           ...
95      0.054136   0.000109  ...           NaN           NaN
96   1516.640112   0.000000  ...           NaN           NaN
97      0.066469   0.000600  ...           NaN           NaN
98      0.750742   0.000201  ...           NaN           NaN
99      0.015614   0.000111  ...           NaN           NaN
[2000 rows x 153 columns]
Name/Symbol:
print(name_symbol)
                 name symbol
0             Bitcoin    BTC
1            Ethereum    ETH
2              Tether   USDT
3                 BNB    BNB
4            USD Coin   USDC
..                ...    ...
95              HYCON    HYC
96  Pepemon Pepeballs  PPBLZ
97           IONChain   IONC
98          DecentBet   DBET
99          BlitzPick    XBP
[2000 rows x 2 columns]
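The question imports csv without using it, which suggests the pairs were meant to end up in a file. A minimal sketch of that step, assuming the rows are available as dicts with name and symbol keys; the sample rows and the write_pairs helper are hypothetical and not part of the original code. It writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io

# Hypothetical sample rows; in practice these would come from the scraped data.
rows = [
    {'name': 'Bitcoin', 'symbol': 'BTC'},
    {'name': 'Ethereum', 'symbol': 'ETH'},
]


def write_pairs(rows, handle):
    """Write name/symbol pairs as CSV to any file-like object."""
    # extrasaction='ignore' drops any other keys a row might carry.
    writer = csv.DictWriter(handle, fieldnames=['name', 'symbol'], extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_pairs(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, pass open('coins.csv', 'w', newline='') instead of the buffer (the file name is just an example). With the DataFrame from the code above, name_symbol.to_csv('coins.csv', index=False) should give a comparable result.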
Answered By - chitown88