Issue
As an amateur, I have been working on a small coding project for fun. I want to scrape quite a lot of data, and with the help of StackOverflow I have a script that works pretty well. However, I am still missing one big step: I want to find the titles of certain images on the webpage. I can already gather all the other data I need (marked in red). All I am missing is the titles of the 3x2 images. See the screenshot below:
The image titles are not defined by a 'class', which makes it hard for me to find them. I tried using
for KTA in soup('img'):
    KTAclass = KTA.get('title')
This does work, but it also returns None for every image that has no title attribute, in addition to the titles I'm looking for.
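One way to avoid the None values is to match only images that carry a title attribute in the first place. The following is a minimal sketch on made-up markup (SAMPLE_HTML and titled_images are hypothetical names for illustration, and the real page will look different); it narrows the match but does not yet limit it to the images of one team:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page structure may differ.
SAMPLE_HTML = """
<div>
  <img src="logo.png">
  <img src="a.png" title="feca">
  <img src="b.png" title="iop">
  <img src="spacer.gif">
</div>
"""


def titled_images(html):
    """Return the title of every <img> that actually has a title attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS attribute selector img[title] skips images without a title,
    # so no None values end up in the result.
    return [img["title"] for img in soup.select("img[title]")]


print(titled_images(SAMPLE_HTML))
```

soup.find_all("img", title=True) should give the same result, since passing True for an attribute matches any tag that has that attribute set.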
My current script looks like this:
import requests
from bs4 import BeautifulSoup

def analyze(i):
    url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    names = [a.text for a in soup.select(".name a")]
    points = [p.text for p in soup.select(".result .points")]
    arena = soup.find("span", attrs=('name')).text
    print(*zip(names, points), arena)

for i in range(46270, 46273):
    analyze(i)
Can anyone help me out here? Ideally, I would like to add the 3 image titles per team to the zipped output that currently contains the team name and points.
Cheers!
Solution
This should do it. I've adjusted the selectors so that they pick up exactly the three image titles for each team:
import requests
from bs4 import BeautifulSoup

def analyze(i):
    url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    arena = soup.find("span", attrs=('name')).text

    # First team: name, points, and the titles of its class images
    title = soup.select_one("[class='team'] .name a").text
    point = soup.select(".result .points")[0].text
    image_titles = ', '.join([img['title'] for img in soup.select("[class='team']:nth-of-type(1) [class^='class'] > img")])

    # Second team: same selectors, second match
    title_ano = soup.select("[class='team'] .name a")[1].text
    point_ano = soup.select(".result .points")[1].text
    image_titles_ano = ', '.join([img['title'] for img in soup.select("[class='team']:nth-of-type(2) [class^='class'] > img")])

    print((title, point, image_titles), (title_ano, point_ano, image_titles_ano), arena)

for i in range(46270, 46274):
    analyze(i)
Prints:
('Thunder', '0 pts', 'roublard, huppermage, ecaflip') ('Tweaps', '60 pts', 'steamer, feca, sacrieur') A10
('Shadow Zoo', '0 pts', 'feca, osamodas, ouginak') ('UndisClosed', '60 pts', 'eniripsa, sram, pandawa') A10
('Laugh Tale', '0 pts', 'osamodas, ecaflip, iop') ('FromTheAbyss', '60 pts', 'roublard, steamer, huppermage') A10
('Motamawa', '0 pts', 'osamodas, iop, pandawa') ('Espoo', '60 pts', 'roublard, ecaflip, sacrieur') A10
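The two near-identical blocks per team could also be folded into a loop over the team elements. The following is a sketch under assumptions rather than a drop-in replacement: SAMPLE_HTML is hypothetical markup built only to fit the selectors used above (the real page may nest things differently), team_rows is a made-up helper name, and the values are taken from the first line of the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to fit the selectors from the answer;
# the structure of the real page is an assumption.
SAMPLE_HTML = """
<div class="match">
  <div class="team">
    <div class="name"><a href="#">Thunder</a></div>
    <div class="class1"><img src="1.png" title="roublard"></div>
    <div class="class2"><img src="2.png" title="huppermage"></div>
    <div class="class3"><img src="3.png" title="ecaflip"></div>
  </div>
  <div class="team">
    <div class="name"><a href="#">Tweaps</a></div>
    <div class="class1"><img src="4.png" title="steamer"></div>
    <div class="class2"><img src="5.png" title="feca"></div>
    <div class="class3"><img src="6.png" title="sacrieur"></div>
  </div>
  <div class="result">
    <span class="points">0 pts</span>
    <span class="points">60 pts</span>
  </div>
</div>
"""


def team_rows(html):
    """Return one (name, points, image titles) tuple per team block."""
    soup = BeautifulSoup(html, "html.parser")
    teams = soup.select("[class='team']")
    points = [p.get_text(strip=True) for p in soup.select(".result .points")]
    rows = []
    # Pair each team block with the points entry at the same position.
    for team, pts in zip(teams, points):
        name = team.select_one(".name a").get_text(strip=True)
        # Searching from the team element keeps the match inside that team.
        titles = ", ".join(
            img["title"] for img in team.select("[class^='class'] > img[title]")
        )
        rows.append((name, pts, titles))
    return rows


print(team_rows(SAMPLE_HTML))
```

Calling select on each team element scopes the search to that team, so :nth-of-type is not needed, and the code stays the same if a page lists more than two teams. It does assume that the points appear in the same order as the teams.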
Answered By - robots.txt