Issue
I'm trying to extract a number from a td, but the td's class is repeated across many cells, and neither the table nor its tr elements have a class I can select by. How can I get this number (1,00)?
This is the HTML:
My code:
import requests
from bs4 import BeautifulSoup as BS

sample_website = 'https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic'
page = requests.get(sample_website)
soup = BS(page.content, "html.parser")
for row in soup.select('table')[1:]:
    taxa = soup.select('tr')[5:]
    valor_especifico = row.find_all('td')[5:]
print(valor_especifico)
This is the output:
C:\Users\Francisco\PycharmProjects\INSS\Scripts\python.exe C:/Users/Francisco/PycharmProjects/INSS/MODULOS/web.py
[<td class="xl74" style="text-align: center; "><strong>1999</strong></td>, <td class="xl75" height="19"> <strong>janeiro</strong></td>, <td class="xl80" style="text-align: center; ">391,17</td>, <td class="xl80" style="text-align: center; ">349,88</td>, <td class="xl80" style="text-align: center; ">326,26</td>, <td class="xl80" style="text-align: center; ">302,97</td>, <td class="xl80" style="text-align: center; ">277,88</td>, <td class="xl83" height="19"> <strong>fevereiro</strong></td>, <td class="xl80" style="text-align: center; ">387,54</td>, <td class="xl80" style="text-align: center; ">347,53</td>, <td class="xl80" style="text-align: center; ">324,59</td>, <td class="xl80" style="text-align: center; ">300,84</td>, <td class="xl80" style="text-align: center; ">275,50</td>, <td class="xl83" height="19"> <strong>março</strong></td>, <td class="xl80" style="text-align: center; ">384,94</td>, <td class="xl80" style="text-align: center; ">345,31</td>, <td class="xl80" style="text-align: center; ">322,95</td>, <td class="xl80" style="text-align: center; ">298,64</td>, <td class="xl80" style="text-align: center; ">272,17</td>, <td class="xl83" height="19"> <strong>abril</strong></td>, <td class="xl80" style="text-align: center; ">380,68</td>, <td class="xl80" style="text-align: center; ">343,24</td>, <td class="xl80" style="text-align: center; ">321,29</td>, <td class="xl80" style="text-align: center; ">296,93</td>, <td class="xl80" style="text-align: center; ">269,82</td>, <td class="xl83" height="19"> <strong>maio</strong></td>, <td class="xl80" style="text-align: center; ">376,43</td>, <td class="xl80" style="text-align: center; ">341,23</td>, <td class="xl80" style="text-align: center; ">319,71</td>, <td class="xl80" style="text-align: center; ">295,30</td>, <td class="xl80" style="text-align: center; ">267,80</td>, <td class="xl83" height="19"> <strong>junho</strong></td>, <td class="xl80" style="text-align: center; ">372,39</td>, <td class="xl80" 
style="text-align: center; ">339,25</td>, <td class="xl80" style="text-align: center; ">318,10</td>, <td class="xl80" style="text-align: center; ">293,70</td>, <td class="xl80" style="text-align: center; ">266,13</td>, <td class="xl83" height="19"> <strong>julho</strong></td>, <td class="xl80" style="text-align: center; ">368,37</td>, <td class="xl80" style="text-align: center; ">337,32</td>, <td class="xl80" style="text-align: center; ">316,50</td>, <td class="xl80" style="text-align: center; ">292,00</td>, <td class="xl80" style="text-align: center; ">264,47</td>, <td class="xl83" height="19"> <strong>agosto</strong></td>, <td class="xl80" style="text-align: center; ">364,53</td>, <td class="xl80" style="text-align: center; ">335,35</td>, <td class="xl80" style="text-align: center; ">314,91</td>, <td class="xl80" style="text-align: center; ">290,52</td>, <td class="xl80" style="text-align: center; ">262,90</td>, <td class="xl83" height="19"> <strong>setembro</strong></td>, <td class="xl80" style="text-align: center; ">361,21</td>, <td class="xl80" style="text-align: center; ">333,45</td>, <td class="xl80" style="text-align: center; ">313,32</td>, <td class="xl80" style="text-align: center; ">288,03</td>, <td class="xl80" style="text-align: center; ">261,41</td>, <td class="xl83" height="19"> <strong>outubro</strong></td>, <td class="xl80" style="text-align: center; ">358,12</td>, <td class="xl80" style="text-align: center; ">331,59</td>, <td class="xl80" style="text-align: center; ">311,65</td>, <td class="xl80" style="text-align: center; ">285,09</td>, <td class="xl80" style="text-align: center; ">260,03</td>, <td class="xl83" height="19"> <strong>novembro</strong></td>, <td class="xl80" style="text-align: center; ">355,24</td>, <td class="xl80" style="text-align: center; ">329,79</td>, <td class="xl80" style="text-align: center; ">308,61</td>, <td class="xl80" style="text-align: center; ">282,46</td>, <td class="xl80" style="text-align: center; ">258,64</td>, 
<td class="xl83" height="19"> <strong>dezembro</strong></td>, <td class="xl80" style="text-align: center; ">352,46</td>, <td class="xl80" style="text-align: center; ">327,99</td>, <td class="xl80" style="text-align: center; ">305,64</td>, <td class="xl80" style="text-align: center; ">280,06</td>, <td class="xl80" style="text-align: center; ">257,04</td>]
Process finished with exit code 0
Solution
If I understand you correctly, you want to select the value 1,00 from the table Taxa de Juros Selic Acumulada Mensalmente:
import requests
from bs4 import BeautifulSoup

url = "https://www.gov.br/receitafederal/pt-br/assuntos/orientacao-tributaria/pagamentos-e-parcelamentos/taxa-de-juros-selic#Taxa_de_Juros_Selic"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# select the correct table: the first <table> after the element with id "Selicmensalmente"
table = soup.select_one("#Selicmensalmente").find_next("table")

# select the current row (the one that contains "maio"), searching only inside that table
current_row = table.select_one("tr:-soup-contains(maio)")

# get all non-empty cell values (the := operator needs Python 3.8+):
values = [s for td in current_row.find_all("td") if (s := td.get_text(strip=True))]

# print the last one:
print(values[-1])
Prints:
1,00
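The same approach can be tried offline against a small inline table, which makes it easier to see what each step selects. This is only a sketch: the markup below is a simplified, hypothetical stand-in for the real page (only the id Selicmensalmente and the value 1,00 come from the answer above; the other numbers are placeholders), and last_value_for_month and to_decimal are illustrative helper names, not part of BeautifulSoup. It assumes a soupsieve version recent enough to support :-soup-contains.

```python
from decimal import Decimal

from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
# Only the id and the value "1,00" come from the answer; the rest is placeholder data.
SAMPLE_HTML = """
<h3 id="Selicmensalmente">Taxa de Juros Selic Acumulada Mensalmente</h3>
<table>
  <tr><td>Mês/Ano</td><td>2021</td><td>2022</td></tr>
  <tr><td><strong>abril</strong></td><td>8,74</td><td>2,03</td></tr>
  <tr><td><strong>maio</strong></td><td>8,47</td><td>1,00</td></tr>
  <tr><td><strong>junho</strong></td><td>8,16</td><td></td></tr>
</table>
"""


def last_value_for_month(html: str, month: str) -> str:
    """Return the last non-empty cell of the row whose text contains `month`."""
    soup = BeautifulSoup(html, "html.parser")
    # The table has no id or class of its own, so start from the nearby anchor.
    table = soup.select_one("#Selicmensalmente").find_next("table")
    # Quote the month so names with non-ASCII letters (e.g. "março") also work.
    row = table.select_one(f'tr:-soup-contains("{month}")')
    values = [text for td in row.find_all("td") if (text := td.get_text(strip=True))]
    return values[-1]


def to_decimal(value: str) -> Decimal:
    """Convert a Brazilian-formatted number such as '1.234,56' to a Decimal."""
    return Decimal(value.replace(".", "").replace(",", "."))


raw = last_value_for_month(SAMPLE_HTML, "maio")
print(raw, to_decimal(raw))
```

Passing the month as a parameter avoids hardcoding "maio", and converting the decimal comma is only needed if the value is going to be used in arithmetic.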
Answered By - Andrej Kesely