Issue
I'm having trouble reading the value of a dynamically rendered HTML tag with Python. The function is shown below. The tag's value should be a number, not a dash. I've tried several approaches, including Selenium (sample code below), but none of them returns the value that is visible on the website itself.
Code:
from selenium import webdriver
import time
import bs4


def scrape_content_from_dynamic_websites():
    url = "https://statusinvest.com.br/acoes/petr4/"
    driver = webdriver.Chrome()
    driver.get(url)
    time.sleep(5)
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, "html.parser")
    all_strongs = soup.find_all("strong", {"data-item": "avg_F"})
    driver.close()
    return all_strongs


def main09():
    print(scrape_content_from_dynamic_websites())


if __name__ == "__main__":
    main09()
Result: [-]
Expected: [95,81%]
Solution
You could use driver.find_elements(By.XPATH, '//strong') and then read the text of each element found:
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By


def scrape_content_from_dynamic_websites():
    url = "https://statusinvest.com.br/acoes/petr4/"
    driver = webdriver.Chrome()
    driver.get(url)
    sleep(5)
    elms = driver.find_elements(By.XPATH, '//strong')
    all_strongs = [elem.text for elem in elms]
    driver.close()
    return all_strongs


if __name__ == "__main__":
    res = scrape_content_from_dynamic_websites()
    print(res)
Outputs a list:
['', '', '', '36,10', '17,45', '37,16', '20,11', '106,52%', '', 'PETR4', 'IBOV', 'PN', '100 %', '1.670.598.995,66', 'IBOV', '', '7,189', 'OPÇÕES', '1.047', 'FORECAST', '', '', '', '', '', '', '', '', '', '', '', '', 'INDICADORES DE VALUATION', '20,11%', '3,44', '-0,15', '1,22', '2,65', '3,49', '1,72', '2,27', '29,59', '0,46', '10,50', '0,88', '-64,83', '-0,54', 'INDICADORES DE ENDIVIDAMENTO', '0,62', '0,87', '1,15', '0,38', '0,62', '0,95', 'INDICADORES DE EFICIÊNCIA', '51,08%', '50,94%', '38,69%', '25,53%', 'INDICADORES DE RENTABILIDADE', '35,47%', '13,35%', '21,54%', '0,52', 'INDICADORES DE CRESCIMENTO', '17,72%', '-%', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '15/12/2023', '0,03', '0,03', '14.729.310', '517.587.953', '2.704', '', '', '16,7786', '7,2539', '56,77%', '0,0000', '56,77%', '-', '-\n()', '-\n(-)', '()', '386.007.000.000', '1.025.496.000.000', '147.311.000.000', '305.451.000.000', '67.147.000.000', '238.304.000.000', '484.916.557.074', '723.220.557.074', '13.044.496.930', 'Nível 2', '63,39%', ' ', 'Petróleo. Gás e Biocombustíveis', 'Petróleo. Gás e Biocombustíveis', 'Exploração. 
Refino e Distribuição', 'InfoMoney', 'InfoMoney', 'InfoMoney', 'Suno Notícias', '+', '+', '17/05/2023', '867.547', '2.748', '5.807', '858.992', 'quantidade de investidores', 'data base', 'fazendo com que a quantidade fique bem baixa', 'até', 'até', 'até', '', 'Índice Brasil 50', '4.566.445.852', '8,004%', 'Índice MidLarge Cap', '4.566.445.852', '7,388%', 'Ibovespa', '4.566.445.852', '7,189%', 'Índice de Ações com Tag Along Diferenciado', '4.566.445.852', '7,163%', 'Índice Brasil 100', '4.566.445.852', '6,907%', 'Índice de Governança Corporativa Trade', '4.566.445.852', '6,796%', 'Índice Brasil Amplo', '4.566.445.852', '6,492%', 'Índice de Ações com Governança Corporativa Diferenciada', '6.849.668.778', '5,942%', 'Índice Dividendos', '816.746.192', '5,314%', 'Índice Carbono Eficiente', '1.614.506.055', '3,175%', '', '', '', '', 'APP', 'ASSINATURAS', 'BULL', 'FORECAST', 'STATUS INVEST', '', '', '', '']
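Since the question asks for a number rather than a dash, the strings in this list could be converted after scraping. The sketch below is only an illustration: parse_br_number is a hypothetical helper (not part of Selenium or of the original answer), and it assumes the Brazilian convention visible in the output, where '.' groups thousands and ',' marks the decimals.

```python
def parse_br_number(text):
    """Convert a string such as '95,81%' or '1.670.598.995,66' to a float.

    Returns None for empty strings and for placeholders such as '-'
    that cannot be read as a number.
    """
    # Drop surrounding whitespace and a trailing percent sign, if any.
    cleaned = text.strip().rstrip("%").strip()
    # Assumed Brazilian formatting: '.' groups thousands, ',' is the decimal mark.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Applied to the list above, [parse_br_number(s) for s in res] would give floats for the numeric entries and None for labels, dates, and dashes.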
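This matches every strong element on the page, so I would narrow the XPath to select only the data you are interested in, for example '//strong[@data-item="avg_F"]', which reuses the attribute from your find_all call.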
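A narrower XPath can be combined with polling instead of a fixed sleep. The sketch below rests on an assumption the source does not confirm: that the dash is a placeholder the page replaces once its data has loaded. wait_for_value and AVG_F_XPATH are hypothetical names, and the loop is a hand-rolled poll rather than Selenium's own WebDriverWait, so the block itself needs no Selenium import; it only expects an object with a find_elements(by, locator) method whose results expose a text attribute.

```python
import time

# XPath narrowed to the attribute used in the question's find_all call.
AVG_F_XPATH = '//strong[@data-item="avg_F"]'


def wait_for_value(driver, by, locator, placeholder="-", timeout=10.0, poll=0.5):
    """Poll until a matching element shows text other than the placeholder.

    Returns the stripped text of the first such element, or None if the
    timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        for elem in driver.find_elements(by, locator):
            text = elem.text.strip()
            # Skip empty elements and the assumed '-' placeholder.
            if text and text != placeholder:
                return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With the driver and By import from the code above, a call might look like wait_for_value(driver, By.XPATH, AVG_F_XPATH) in place of sleep(5) followed by find_elements. If it returns None, the placeholder assumption probably does not hold for this page.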
Answered By - Dan-Dev