Issue
I started programming not long ago and ran into this problem: I want to collect stock data from https://statusinvest.com.br/acoes/petr4, but apparently the data is rendered with JavaScript and BeautifulSoup doesn't pick it up. Any help is appreciated.
(The question included two screenshots: the soup code and an example of the information loaded with JavaScript.)
Solution
This section not only requires JavaScript to load; it actually will not load until you scroll to it. You could try to figure out which request and/or bit of JS renders that section and then replicate it with Python, but I think it would be easier to use selenium. I even have this function for making it more convenient to automate some of the simpler/common interactions before scraping the HTML:
#### FIRST PASTE [or DOWNLOAD&IMPORT] FUNCTION DEF from https://pastebin.com/kEC9gPC8 ####
soup = linkToSoup_selenium(
    'https://statusinvest.com.br/acoes/petr4',
    clickFirst='//strong[@data-item="avg_F"]',  # it actually just has to scroll, not click [but I haven't added an option for that yet]
    ecx='//strong[@data-item="avg_F"][text()!="-"]'  # waits till this loads
)
if soup is not None:
    print({
        t.find_previous_sibling().get_text(' ').strip(): t.get_text(' ').strip()
        for t in soup.select('div#payout-section span.title + strong.value')
    })
prints
{'MÉDIA': '83,32%', 'ATUAL': '124,13% \n ( 48,97% acima da média )', 'MENOR\xa0VALOR': '26,35% \n ( 2019 )', 'MAIOR\xa0VALOR': '144,51% \n \n( 2020 )'}
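If you'd rather not depend on the pastebin helper, here is a rough sketch of the same scroll-and-wait flow with plain selenium and BeautifulSoup. This is my approximation (not the helper's actual code), assuming selenium 4 with a local Chrome driver; the XPath is the same one used above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

TARGET_XPATH = '//strong[@data-item="avg_F"]'  # element that only fills in after scrolling

driver = webdriver.Chrome()
try:
    driver.get('https://statusinvest.com.br/acoes/petr4')
    # scroll the payout section into view so the page's js fetches its data
    target = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.XPATH, TARGET_XPATH)))
    driver.execute_script('arguments[0].scrollIntoView();', target)
    # wait until the "-" placeholder is replaced with a real value
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.XPATH, TARGET_XPATH + '[text()!="-"]')))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()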
EDIT: I ended up noticing the API used for fetching the data after all (https://statusinvest.com.br/acao/payoutresult?code=petr4&companyid=408&type=0). You can actually form it even with the HTML that is available before the js-loading happens:
soup.select_one('#payout-section[data-company][data-code]').attrs
should return
{'id': 'payout-section', 'data-company': '408', 'data-code': 'petr4', 'data-category': '1'}
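Since those data attributes are in the static HTML, a plain requests fetch should be enough to read them (no selenium needed). A minimal sketch, assuming the site accepts a simple browser-like User-Agent header:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # assumption: a browser-like UA; adjust if the site rejects it
resp = requests.get('https://statusinvest.com.br/acoes/petr4', headers=headers)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, 'html.parser')
print(soup.select_one('#payout-section[data-company][data-code]').attrs)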
so then the url can be formed with
payout = soup.select_one('#payout-section[data-company][data-code]')
if payout:
    compId, dCode = payout.get('data-company'), payout.get('data-code')
    apiUrl = 'https://statusinvest.com.br/acao'
    apiUrl = f'{apiUrl}/payoutresult?code={dCode}&companyid={compId}&type=0'
[I think the type param is for the time window - 0 for 5 yrs, 1 for 10 yrs, and 2 for the max window.] Then
requests.get(apiUrl, headers=headers).json()
should return something like
{
"actual": 124.12623323305537,
"avg": 83.32096287339556,
"avgDifference": 48.97359434223362,
"minValue": 26.353309862919502,
"minValueRank": 2019,
"maxValue": 144.51093035368598,
"maxValueRank": 2020,
"actual_F": "124,13%",
"avg_F": "83,32%",
"avgDifference_F": "48,97% acima da m\u00e9dia",
"minValue_F": "26,35%",
"minValueRank_F": "2019",
"maxValue_F": "144,51%",
"maxValueRank_F": "2020",
"chart": {
"categoryUnique": true,
"category": [
"2018",
"2019",
"2020",
"2021",
"2022"
],
"series": {
"percentual": [
{
"value": 27.189302754606462,
"value_F": "27,19%"
},
{
"value": 26.353309862919502,
"value_F": "26,35%"
},
{
"value": 144.51093035368598,
"value_F": "144,51%"
},
{
"value": 94.42503816271046,
"value_F": "94,43%"
},
{
"value": 124.12623323305537,
"value_F": "124,13%"
}
],
"proventos": [
{
"value": 7009130357.11,
"value_F": "R$ 7.009.130.357,11",
"valueSmall_F": "7,01 B"
},
{
"value": 10577427979.68,
"value_F": "R$ 10.577.427.979,68",
"valueSmall_F": "10,58 B"
},
{
"value": 10271836929.54,
"value_F": "R$ 10.271.836.929,54",
"valueSmall_F": "10,27 B"
},
{
"value": 100721299707.4,
"value_F": "R$ 100.721.299.707,40",
"valueSmall_F": "100,72 B"
},
{
"value": 179966901777.61,
"value_F": "R$ 179.966.901.777,61",
"valueSmall_F": "179,97 B"
}
],
"lucroLiquido": [
{
"value": 25779000000.0,
"value_F": "R$ 25.779.000.000,00",
"valueSmall_F": "25,78 B"
},
{
"value": 40137000000.0,
"value_F": "R$ 40.137.000.000,00",
"valueSmall_F": "40,14 B"
},
{
"value": 7108000000.0,
"value_F": "R$ 7.108.000.000,00",
"valueSmall_F": "7,11 B"
},
{
"value": 106668000000.0,
"value_F": "R$ 106.668.000.000,00",
"valueSmall_F": "106,67 B"
},
{
"value": 144987000000.0,
"value_F": "R$ 144.987.000.000,00",
"valueSmall_F": "144,99 B"
}
]
}
}
}
and then you can get the values you want from there. (I think it includes the chart data as well.)
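For example, a quick sketch of picking a few of those values out of the response (the keys follow the JSON shown above; the User-Agent header is an assumption on my part):

import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # assumption: a browser-like UA
apiUrl = 'https://statusinvest.com.br/acao/payoutresult?code=petr4&companyid=408&type=0'
data = requests.get(apiUrl, headers=headers).json()

print('average payout:', data['avg_F'], '| current:', data['actual_F'])
# pair each year in the chart with its payout percentage
chart = data['chart']
for year, pct in zip(chart['category'], chart['series']['percentual']):
    print(year, pct['value_F'])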
Answered By - Driftr95