Sunday, April 10, 2022

[FIXED] Beautiful Soup Nested Tag Search returns None after 10th search


Issue

I am trying to write a Python script using Beautiful Soup that scrapes the name and symbol of each cryptocurrency. Although the page lists hundreds of symbols, None is returned after the 10th iteration. Could anyone help me out? The website I am trying to scrape is https://coinmarketcap.com

The Code:

from bs4 import BeautifulSoup
import requests
import csv

source=requests.get('https://coinmarketcap.com').text

soup = BeautifulSoup(source, 'html.parser')

def scrape_data():
    container = soup.find('tbody')
    theData = container.find_all("tr")
    for i in theData:
        individual_symbol= i.find('p', attrs= {"class":"sc-1eb5slv-0 gGIpIK coin-item-symbol"})
        individual_name = i.find('p', attrs= {"class":"sc-1eb5slv-0 iworPT"})
        print('Name: {}, Symbol: {}'.format(individual_name.text, individual_symbol.text))

scrape_data()

This gets returned

Name: Bitcoin, Symbol: BTC
Name: Ethereum, Symbol: ETH
Name: Tether, Symbol: USDT
Name: BNB, Symbol: BNB
Name: USD Coin, Symbol: USDC
Name: XRP, Symbol: XRP
Name: Terra, Symbol: LUNA
Name: Cardano, Symbol: ADA
Name: Solana, Symbol: SOL
Name: Avalanche, Symbol: AVAX
Traceback (most recent call last):
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 18, in <module>
    scrape_data()
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 15, in scrape_data
    print(individual_symbol.text)
AttributeError: 'NoneType' object has no attribute 'text'
ryan@Ryans-MBP PythonProjects %

Solution

The data is embedded as JSON inside the page's <script> tags. I prefer to pull the full data set first and then filter it down to what is needed. This will get all of the data available:

Code:

import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with each request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

dfs = []
for page in range(1, 21):  # pages 1 through 20
    print(f'Page: {page}')
    url = f'https://coinmarketcap.com/?page={page}'
    response = requests.get(url, headers=headers)

    # The listing data sits in the last <script> tag as a JSON blob
    soup = BeautifulSoup(response.text, 'html.parser')
    script = soup.find_all('script')[-1]

    # Pull out everything from the first "{" to the last "}"
    jsonStr = re.search(r'(\{.*\})', str(script)).group(1)
    jsonData = json.loads(jsonStr)

    # The first entry holds the column names; the remaining entries are the rows
    listing = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data']
    colsData = listing[0]
    cols = colsData['keysArr'] + colsData['excludeProps']
    data = listing[1:]

    df = pd.DataFrame(data, columns=cols)
    dfs.append(df)

# Stack the per-page frames; each page keeps its own 0-based index
df = pd.concat(dfs, axis=0)

name_symbol = df[['name', 'symbol']]
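
The reshaping step in the loop above relies on the first entry of `data` holding the column names and the remaining entries holding one row each. Here is a minimal sketch of the same idea without pandas or a network request. The HTML fragment and its values are made up for illustration and only mimic the structure the code above indexes into; `extract_rows` is a hypothetical helper name.

```python
import json
import re

# Hypothetical page fragment (an assumption, not the live site):
# data[0] describes the columns, data[1:] are the rows.
SAMPLE_HTML = """
<script type="application/json">
{"props": {"initialState": {"cryptocurrency": {"listingLatest": {"data": [
  {"keysArr": ["name", "symbol"], "excludeProps": ["rank"]},
  ["Bitcoin", "BTC", 1],
  ["Ethereum", "ETH", 2]
]}}}}}
</script>
"""


def extract_rows(html):
    """Return one dict per row, keyed by the column names in data[0]."""
    # Greedy match from the first "{" to the last "}"; DOTALL lets the
    # match span several lines, which this multi-line sample needs.
    match = re.search(r"(\{.*\})", html, re.DOTALL)
    payload = json.loads(match.group(1))
    listing = payload["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]
    header, rows = listing[0], listing[1:]
    columns = header["keysArr"] + header["excludeProps"]
    # Pair each row's values with the column names
    return [dict(zip(columns, row)) for row in rows]


records = extract_rows(SAMPLE_HTML)
name_symbol = [(r["name"], r["symbol"]) for r in records]
print(name_symbol)
```

If the site changes the key path or the position of the script tag, the lookups will raise a KeyError or the regular expression will fail to match, so it is worth inspecting the raw JSON before relying on it.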

Full data:

print(df)
             ath        atl  ...  quotes.1.tvl  quotes.2.tvl
0   68789.625939  65.526001  ...           NaN           NaN
1    4891.704698   0.420897  ...           NaN           NaN
2       1.215490   0.568314  ...           NaN           NaN
3     690.931965   0.096109  ...           NaN           NaN
4       2.349556   0.929222  ...           NaN           NaN
..           ...        ...  ...           ...           ...
95      0.054136   0.000109  ...           NaN           NaN
96   1516.640112   0.000000  ...           NaN           NaN
97      0.066469   0.000600  ...           NaN           NaN
98      0.750742   0.000201  ...           NaN           NaN
99      0.015614   0.000111  ...           NaN           NaN

[2000 rows x 153 columns]

Name/Symbol:

print(name_symbol)
                 name symbol
0             Bitcoin    BTC
1            Ethereum    ETH
2              Tether   USDT
3                 BNB    BNB
4            USD Coin   USDC
..                ...    ...
95              HYCON    HYC
96  Pepemon Pepeballs  PPBLZ
97           IONChain   IONC
98          DecentBet   DBET
99          BlitzPick    XBP

[2000 rows x 2 columns]
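
As for the original traceback: `find()` returns None when no tag matches, and calling `.text` on None raises the AttributeError. That appears to be what happens for the rows after the tenth, presumably because the static HTML does not carry the same class names for them. Below is a minimal sketch of guarding against that case. The markup and class names are invented for illustration and stand in for the live page; `scrape_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption about the page): the first row is
# fully rendered, the second lacks the expected <p> tags and class names.
SAMPLE_HTML = """
<table><tbody>
  <tr><td><p class="name">Bitcoin</p><p class="coin-item-symbol">BTC</p></td></tr>
  <tr><td><span>Polkadot</span><span>DOT</span></td></tr>
</tbody></table>
"""


def scrape_rows(html):
    """Return (name, symbol) pairs, skipping rows where a lookup fails."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find("tbody").find_all("tr"):
        name = row.find("p", class_="name")
        symbol = row.find("p", class_="coin-item-symbol")
        # find() returns None when nothing matches, so check before using .text
        if name is None or symbol is None:
            continue
        results.append((name.text, symbol.text))
    return results


rows = scrape_rows(SAMPLE_HTML)
print(rows)
```

The guard only prevents the crash; it does not recover the skipped rows, which is why reading the embedded JSON is the more complete approach.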

Answered By - chitown88

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
