Issue
I'm working on a project that scrapes the daily population of some servers in a game, so I can see how it evolves. The data is in a table where each server is a 'tr' containing several 'td' cells. Some hold useful information, such as the number of players, and some hold information I don't need. I managed to pick out all the 'tr' rows I'm interested in and discard the rest, but now I'm stuck trying to select, inside each 'tr', only the 'td' that holds the number of players.
This is the HTML of that table:
This is the code I've written so far:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import csv
from pprint import pprint
from datetime import date

url = ('https://www.tibia.com/community/?subtopic=worlds')
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
file = open('players_online', 'a')
writer = csv.writer(file)
list_of_players = list()

# `worlds` is a list of world names defined elsewhere in the script (not shown)
finding_td = soup.find_all('a', string=worlds)
for looking_for_players in finding_td:
    parent_tr = looking_for_players.find_parent('tr')
    names1 = [clean_data.findAll('td') for clean_data in parent_tr]
    list_of_players.append(parent_tr)
If I run print(finding_td), I get the following:
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Astera">Astera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Belobra">Belobra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Calmera">Calmera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Celebra">Celebra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Gentebra">Gentebra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Kalibra">Kalibra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Luminera">Luminera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Menera">Menera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Nefera">Nefera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Pacera">Pacera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Yonabra">Yonabra</a>]
This is what I want. Next I use find_parent('tr'), and when I print(parent_tr) I get:
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Belobra">Belobra</a></td><td style="text-align: right;">731</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since June 22, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Calmera">Calmera</a></td><td style="text-align: right;">318</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Celebra">Celebra</a></td><td style="text-align: right;">559</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since October 29, 2018.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Gentebra">Gentebra</a></td><td style="text-align: right;">757</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since December 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Kalibra">Kalibra</a></td><td style="text-align: right;">716</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since December 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Luminera">Luminera</a></td><td style="text-align: right;">295</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 5, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Menera">Menera</a></td><td style="text-align: right;">364</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 5, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Nefera">Nefera</a></td><td style="text-align: right;">465</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since April 19, 2018.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Pacera">Pacera</a></td><td style="text-align: right;">336</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Yonabra">Yonabra</a></td><td style="text-align: right;">446</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since May 27, 2020.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
So far so good. Now that I have all the 'td' cells, I want a line that selects only the 'td' containing the number of players. I did it as follows:
names1 = [clean_data.findAll('td') for clean_data in parent_tr]
But when I append or print the result, I get this:
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
And if I use names1 = [clean_data.find('td')[3] for clean_data in parent_tr] to get the specific 'td' that contains the data I want, the console says:
"IndexError: list index out of range".
That makes sense, since it's an empty list after all. Any idea what's going wrong?
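One likely explanation can be checked offline with a row trimmed from the output above. Iterating over a Tag yields its direct children, so clean_data is each 'td' in turn. Calling find_all('td') on a 'td' (find_all is the current name for findAll) searches only that cell's descendants, and none of them is another 'td'. The HTML string below is an assumption: a simplified copy of the Belobra row with its attributes and BattlEye cell removed, standing in for the real page.

```python
from bs4 import BeautifulSoup

# Simplified copy of one row from the question's output (attributes trimmed).
ROW_HTML = (
    "<table><tr>"
    "<td><a>Belobra</a></td><td>731</td><td>South America</td>"
    "<td>Optional PvP</td><td></td><td></td>"
    "</tr></table>"
)

soup = BeautifulSoup(ROW_HTML, "html.parser")
parent_tr = soup.find("tr")

# Iterating over a Tag yields its direct children: the six <td> tags.
children = list(parent_tr)
print([child.name for child in children])  # ['td', 'td', 'td', 'td', 'td', 'td']

# find_all('td') on a <td> searches that cell's descendants, which contain
# no <td>, so each of the six lists is empty (matching the question's output).
names1 = [clean_data.find_all("td") for clean_data in parent_tr]
print(names1)  # [[], [], [], [], [], []]

# Searching from the <tr> itself finds all six cells.
print(len(parent_tr.find_all("td")))  # 6
```

The six empty lists per row match the shape of the question's output. That output is consistent with the search starting one level too deep, not with the 'td' cells being missing.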
Solution
To get the name and population of every regular world, you can try:
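import requests
from bs4 import BeautifulSoup

url = "https://www.tibia.com/community/?subtopic=worlds"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# [2] picks the third "TableContent" table on the page (the regular worlds table at the time of writing)
for a in soup.select(".TableContent")[2].select("td > a"):
    name = a.get_text(strip=True)
    # the cell right after the world-name cell holds the number of players online
    pop = a.find_next("td").get_text(strip=True)
    print("{:<30} {}".format(name, pop))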
Prints:
Adra 54
Antica 271
Assombra 166
Astera 419
Belluma 18
Belobra 743
Bona 150
Calmera 326
Carnera 73
Celebra 560
Celesta 82
Concorda 51
Cosera 103
Damora 89
Descubra 524
Dibra 379
Duna 6
Emera 70
Epoca 23
Estela 129
Faluna 24
Ferobra 599
Firmera 180
Funera 83
Furia 14
Garnera 299
Gentebra 769
Gladera 464
Harmonia 112
Helera 55
Honbra 629
Impera 340
Inabra 636
Javibra 229
Jonera 131
Kalibra 700
Karna 175
Kenora 90
Libertabra 364
Lobera 473
Luminera 293
Lutabra 469
Macabra 277
Menera 365
Mitigera 100
Monza 112
Mudabra 427
Nefera 475
Noctera 195
Nossobra 252
Olera 87
Ombra 601
Optera 186
Pacembra 402
Pacera 352
Peloria 226
Premia 74
Pyra 4
Quelibra 578
Quintera 280
Ragna 15
Refugia 103
Reinobra 555
Relania 54
Relembra 330
Secura 175
Serdebra 606
Serenebra 394
Solidera 395
Talera 538
Torpera 102
Tortura 25
Unica 13
Utobra 355
Venebra 485
Vita 31
Vunira 154
Wintera 415
Wizera 209
Xandebra 528
Xylona 16
Yonabra 419
Ysolera 87
Zenobra 281
Zuna 3
Zunera 39
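The question's own loop can probably be kept as well, as long as find_all('td') is called on the 'tr' rather than on each of its children. The following is a minimal offline sketch. It assumes that the three rows below, trimmed from the question's printed output, stand in for the downloaded page. It also assumes that worlds is a hypothetical subset of names. In those rows, column index 1 holds the player count.

```python
from bs4 import BeautifulSoup

# Three rows trimmed from the question's printed output (attributes and the
# BattlEye cell simplified); in the real script this comes from requests.
TABLE_HTML = (
    "<table>"
    "<tr><td><a>Belobra</a></td><td>731</td><td>South America</td><td>Optional PvP</td><td></td><td></td></tr>"
    "<tr><td><a>Calmera</a></td><td>318</td><td>North America</td><td>Optional PvP</td><td></td><td></td></tr>"
    "<tr><td><a>Celebra</a></td><td>559</td><td>South America</td><td>Optional PvP</td><td></td><td></td></tr>"
    "</table>"
)

# Hypothetical subset of world names to track.
worlds = ["Belobra", "Celebra"]

soup = BeautifulSoup(TABLE_HTML, "html.parser")
players_by_world = {}
for link in soup.find_all("a", string=worlds):
    parent_tr = link.find_parent("tr")
    # Search from the <tr> itself, not from each child <td>.
    cells = parent_tr.find_all("td")
    # Column 0 is the world name, column 1 the number of players online.
    players_by_world[link.get_text(strip=True)] = int(cells[1].get_text(strip=True))

print(players_by_world)  # {'Belobra': 731, 'Celebra': 559}
```

Calmera is skipped because it is not in worlds, which mirrors the filtering the question already does with string=worlds.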
Alternatively, select all links and keep only those whose preceding header text is "Regular Worlds":
import requests
from bs4 import BeautifulSoup

url = "https://www.tibia.com/community/?subtopic=worlds"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select(".TableContent td > a"):
    # check if we are in "Regular Worlds" table (skip links with no header before them):
    header = a.find_previous("td", {"style": "text-align: center;"})
    if header is None or header.get_text(strip=True) != "Regular Worlds":
        continue
    name = a.get_text(strip=True)
    pop = a.find_next("td").get_text(strip=True)
    print("{:<30} {}".format(name, pop))
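The question's script imports csv and date and opens players_online in append mode. That suggests the end goal is one row per world per day, so the evolution can be tracked. The sketch below shows that logging step. It assumes the scraped values have been collected into a dict of world name to player count. The function name append_snapshot, the .csv file name, the column names, and the dates in the usage lines are all hypothetical.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def append_snapshot(path, populations, day=None):
    """Append one (date, world, players) row per world to a CSV file.

    `populations` maps a world name to its number of players online.
    A header row is written only when the file is new or empty.
    """
    if day is None:
        day = date.today()
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    # The csv module expects files opened with newline="".
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["date", "world", "players"])
        for world, players in populations.items():
            writer.writerow([day.isoformat(), world, players])


# Usage: two runs on (arbitrary) different days append to the same file.
# The counts are taken from the printed outputs above, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "players_online.csv"
    append_snapshot(csv_path, {"Belobra": 743, "Calmera": 326}, day=date(2024, 1, 1))
    append_snapshot(csv_path, {"Belobra": 731}, day=date(2024, 1, 2))
    print(csv_path.read_text(encoding="utf-8"))
```

In the real script, the dict could be filled inside the answer's loop with populations[name] = int(pop) and passed to the function once per run. Opening the file in a with block also ensures it is closed, which the question's open('players_online', 'a') never does.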
Answered By - Andrej Kesely