Issue
This is an example of the kind of HTML block I'm targeting with BeautifulSoup:
<div class="fighter_list left">
<meta itemprop="image" content="/image_crop/44/44/_images/fighter/1406924569376_20140801011731_Picture17.JPG">
<img class="lazy" src="/image_crop/44/44/_images/fighter/1406924569376_20140801011731_Picture17.JPG" data-original="/image_crop/44/44/_images/fighter/1406924569376_20140801011731_Picture17.JPG" alt="Jason DeLucia" title="Jason DeLucia" />
<div class="fighter_result_data">
<a itemprop="url" href="/fighter/Jason-DeLucia-22"><span itemprop="name">Jason<br />DeLucia</span></a><br>
This is one of several such blocks on the page, one for each "fighter_list left" element.
I want to get all of the itemprop="url" href links that are inside the "fighter_list left" class (e.g. /fighter/Jason-DeLucia-22).
When I try the code below, I get nothing:
for link in html.find_all('a', class_="fighter_List left", itemprop="url"):
    print(link.get('href'))
The closest I can get is every itemprop="url" link on the page, which is what I get when I omit the class_= part. But I only want the ones under the fighter_list left class.
This is the website: https://www.sherdog.com/events/UFC-1-The-Beginning-7
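To see why the class_ filter returns nothing, here is a small offline sketch using a trimmed copy of the block above. The closing tags and the extra link at the end (/hypothetical/other-link) are assumptions added so the fragment is complete and shows the "every link" behaviour. class_ is compared against the <a> tag's own class attribute, and these <a> tags have no class at all; the class sits on the enclosing <div>. The sketch uses the lowercase spelling, since class matching is case-sensitive and "fighter_List" would not match either.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the block from the question. Closing tags and the final
# link are hypothetical additions so the fragment is self-contained.
html_doc = """
<div class="fighter_list left">
  <div class="fighter_result_data">
    <a itemprop="url" href="/fighter/Jason-DeLucia-22"><span itemprop="name">Jason<br />DeLucia</span></a><br>
  </div>
</div>
<a itemprop="url" href="/hypothetical/other-link">outside the fighter list</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ is checked against the <a> tag's own class attribute; these <a> tags
# have none, so nothing matches.
print(soup.find_all("a", class_="fighter_list left", itemprop="url"))

# Without class_, every itemprop="url" link matches, including the one
# that is not inside a "fighter_list left" div.
print([a["href"] for a in soup.find_all("a", itemprop="url")])
```

The first call prints an empty list; the second prints both hrefs, which mirrors the two behaviours described in the question.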
Solution
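You can use a CSS selector for this task. In find_all, class_ is matched against the <a> tag itself, but here the class belongs to the enclosing <div>. The selector .fighter_list.left [itemprop="url"] instead matches any element with itemprop="url" that is nested inside an element carrying both the fighter_list and left classes: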
import requests
from bs4 import BeautifulSoup

url = "https://www.sherdog.com/events/UFC-1-The-Beginning-7"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Descendant selector: itemprop="url" elements anywhere inside .fighter_list.left
for link in soup.select('.fighter_list.left [itemprop="url"]'):
    print(link["href"])
Prints:
/fighter/Jason-DeLucia-22
/fighter/Royce-Gracie-19
/fighter/Gerard-Gordeau-15
/fighter/Ken-Shamrock-4
/fighter/Royce-Gracie-19
/fighter/Kevin-Rosier-17
/fighter/Gerard-Gordeau-15
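If you would rather stay with find_all, the same filtering can be sketched in two steps: find the container divs first, then search for itemprop="url" links only inside them. This runs on an inline sample rather than the live page; the second container (class "fighter_list right") and the trailing link are hypothetical, added only to show what is excluded. It also assumes the class attribute is written exactly as "fighter_list left", because a class_ string containing a space is compared against the whole attribute value, in that order.

```python
from bs4 import BeautifulSoup

# Inline sample modeled on the question's markup. The "fighter_list right"
# container and the trailing link are hypothetical.
html_doc = """
<div class="fighter_list left">
  <div class="fighter_result_data">
    <a itemprop="url" href="/fighter/Jason-DeLucia-22"><span itemprop="name">Jason<br />DeLucia</span></a><br>
  </div>
</div>
<div class="fighter_list right">
  <div class="fighter_result_data">
    <a itemprop="url" href="/fighter/Royce-Gracie-19"><span itemprop="name">Royce<br />Gracie</span></a><br>
  </div>
</div>
<a itemprop="url" href="/hypothetical/other-link">not inside a fighter list</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Step 1: the container divs (exact match on the full class string).
containers = soup.find_all("div", class_="fighter_list left")

# Step 2: itemprop="url" links inside those containers only.
hrefs = [a["href"] for div in containers for a in div.find_all("a", itemprop="url")]
print(hrefs)

# The CSS selector from the answer selects the same links on this sample.
print([a["href"] for a in soup.select('.fighter_list.left [itemprop="url"]')])
```

Both lines print ['/fighter/Jason-DeLucia-22']. The selector is the more robust choice, since .fighter_list.left also matches when the classes appear in a different order or alongside extra classes.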
Answered By - Andrej Kesely