Saturday, August 20, 2022

[FIXED] Scraping a web link from 247sports

Labels: beautifulsoup, python-3.x

Issue

I am trying to grab a rankings history link from a 247sports player page using the following scraping code:

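import requests
from bs4 import BeautifulSoup

url = 'https://247sports.com/Player/Trevor-Lawrence-61350/college-212444/'
# A browser-like User-Agent; `headers` must be defined before the request below
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0'}

pageTree = requests.get(url, headers=headers)
Soup = BeautifulSoup(pageTree.content, 'html.parser')

# find_all returns a ResultSet (a list of matching tags), not a single tag
past_link = Soup.find_all('ul', {'class':'ranks-list'})

past_link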

I was able to generate this output:

[<ul class="ranks-list">
 <li>
 <b>Natl.</b>
 <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool">
 <strong>1</strong>
 </a>
 <a class="rank-history-link" href="https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/">
                     History
                 </a>
 </li>
 <li>
 <b>PRO</b>
 <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&amp;Position=PRO">
 <strong>1</strong>
 </a>
 </li>
 <li>
 <b>GA</b>
 <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool&amp;State=GA">
 <strong>1</strong>
 </a>
 </li>
 <li>
 <b>All-Time</b>
 <a href="https://247sports.com/Sport/Football/AllTimeRecruitRankings/">
 <strong>6</strong>
 </a>
 </li>
 </ul>]

But going any further, with something like past_link.find_all('a'), led to errors. What should the next step be from here? Any assistance is truly appreciated. Thanks in advance.

Solution

To get the rankings history link from that page, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://247sports.com/Player/Trevor-Lawrence-61350/college-212444/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

history_link = soup.select_one(".rank-history-link")["href"]
print(history_link)

Prints:

https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/
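As for why past_link.find_all('a') fails: find_all returns a ResultSet, which is a list of tags and has no find_all method of its own, so each matched ul has to be visited in turn. Below is a minimal offline sketch of that idea. It assumes a trimmed copy of the markup from the question (SAMPLE_HTML), and the helper names history_links and first_history_link are hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question, so the sketch runs offline.
SAMPLE_HTML = """
<ul class="ranks-list">
  <li>
    <b>Natl.</b>
    <a href="https://247sports.com/Season/2018-Football/CompositeRecruitRankings/?InstitutionGroup=HighSchool"><strong>1</strong></a>
    <a class="rank-history-link" href="https://247sports.com/PlayerSport/Trevor-Lawrence-at-Cartersville-116605/RecruitRankHistory/">History</a>
  </li>
  <li>
    <b>All-Time</b>
    <a href="https://247sports.com/Sport/Football/AllTimeRecruitRankings/"><strong>6</strong></a>
  </li>
</ul>
"""


def history_links(html):
    """Hypothetical helper: collect every rank-history href inside ranks-list blocks."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all returns a ResultSet (a list subclass), so loop over each <ul>
    # and call find_all on the individual tag rather than on the list.
    for ul in soup.find_all("ul", {"class": "ranks-list"}):
        for a in ul.find_all("a", class_="rank-history-link"):
            links.append(a["href"])
    return links


def first_history_link(html):
    """Hypothetical helper: return the first history href, or None if absent."""
    tag = BeautifulSoup(html, "html.parser").select_one(".rank-history-link")
    # select_one returns None when nothing matches, so guard before indexing.
    return tag["href"] if tag is not None else None


if __name__ == "__main__":
    print(history_links(SAMPLE_HTML))
    print(first_history_link("<ul class='ranks-list'></ul>"))
```

The None guard in first_history_link may be worth keeping if you loop over many players, since a page without a history link would otherwise raise a TypeError on the ["href"] lookup.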

Answered By - Andrej Kesely

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
