Issue
I need to parse a news site where the title and link of each news item are inside <main class='single-module__Main-sc-1qdjg1k-0 iMZnZU'>. I tried to query the child elements but couldn't. soup.prettify() produced this complete markup:
<div class="single-module__Inner-sc-1qdjg1k-1 mIFFS">
<header class="sticky-header">
<div data-fusion-collection="features" data-fusion-message="Could not render component [features:global/main-navigation]" data-fusion-type="global/main-navigation" id="f0fj2XGPHYgA9Rb" style="display:none">
</div>
</header>
<main class="single-module__Main-sc-1qdjg1k-0 iMZnZU">
<div data-fusion-collection="features" data-fusion-message="Could not render component [features:global/search-page]" data-fusion-type="global/search-page" id="f0f1nZqKTkTE1lq" style="display:none">
</div>
</main>
<footer>
<div data-fusion-collection="features" data-fusion-message="Could not render component [features:global/footer]" data-fusion-type="global/footer" id="f0fUuwTNo76j9AD" style="display:none">
</div>
</footer>
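I also tried to drill down into the child elements, but I get the same result:
# Note the space between the two class names; <main> still only holds the hidden placeholder div
q = soup.find('div', class_='layout-container').find('div', class_='single-module__Inner-sc-1qdjg1k-1 mIFFS')
print(q.find('main', class_='single-module__Main-sc-1qdjg1k-0 iMZnZU'))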
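The markup above hints at why the lookup comes back empty: <main> contains nothing but a hidden div whose data-fusion-message reads "Could not render component", i.e. a placeholder that the page's JavaScript fills in later, and requests/BeautifulSoup never run that JavaScript. A minimal sketch for confirming this, using only the standard library's html.parser; the class FusionPlaceholderFinder and the function find_placeholders are hypothetical helpers, not part of any library:

```python
from html.parser import HTMLParser


class FusionPlaceholderFinder(HTMLParser):
    """Hypothetical helper: collect the data-fusion-type of every placeholder div."""

    def __init__(self):
        super().__init__()
        self.placeholders = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        # Unrendered components carry a data-fusion-message attribute
        if tag == "div" and "data-fusion-message" in attrs:
            self.placeholders.append(attrs.get("data-fusion-type"))


def find_placeholders(html: str) -> list:
    """Return the component types that were left as empty placeholders."""
    finder = FusionPlaceholderFinder()
    finder.feed(html)
    finder.close()
    return finder.placeholders


# The <main> element from the prettified markup above
sample = '''<main class="single-module__Main-sc-1qdjg1k-0 iMZnZU">
<div data-fusion-collection="features" data-fusion-message="Could not render component [features:global/search-page]" data-fusion-type="global/search-page" id="f0f1nZqKTkTE1lq" style="display:none">
</div>
</main>'''

print(find_placeholders(sample))  # ['global/search-page']
```

Going by the markup above, feeding it the whole downloaded page should list global/main-navigation, global/search-page and global/footer, which suggests the news list is not in the initial HTML at all.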
Solution
Here is one way of getting that news. The articles are hydrated into the page by a separate XHR call, so you can scrape that API endpoint directly:
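import requests
import pandas as pd

# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36'
}
# The XHR endpoint the page calls to fetch search results (here for query=btc)
url = 'https://api.queryly.com/json.aspx?queryly_key=d0ab87fd70264c0a&query=btc&batchsize=10&showfaceted=true&maxfacetitems=75&extendeddatafields=basic,creator,creator_slug,subheadlines,primary_section,report_url,section_path,sections_paths,subtype,type,imageresizer,section,sponsored_label,sponsored,promo_image,pubDate&timezoneoffset=-60'
r = requests.get(url, headers=headers)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
# Flatten the list under the 'items' key into one row per article
df = pd.json_normalize(r.json(), record_path=['items'])
# display() only exists in Jupyter/IPython; print() works in a plain terminal
print(df[['section', 'subtype', 'title', 'pubdate', 'link', 'section_path', 'image']])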
Result in terminal:
section subtype title pubdate link section_path image
0 Markets story_news First Mover Asia: Bitcoin Whales Are Increasin... Jul 07, 2023 /markets/2023/07/07/first-mover-asia-bitcoin-w... /markets https://cloudfront-us-east-1.images.arcpublish...
1 Markets story_news Bitcoin Drops Below $30K as Altcoins Tumble; B... Jun 28, 2023 /markets/2023/06/28/bitcoin-drops-below-30k-as... /markets https://cloudfront-us-east-1.images.arcpublish...
2 Policy story_news Bankrupt Celsius Can Convert Altcoins to BTC, ... Jun 30, 2023 /policy/2023/06/30/bankrupt-celsius-can-conver... /policy
3 Finance|Markets story_news Celsius to Potentially Sell More Than $170M in... Jun 30, 2023 /business/2023/06/30/celsius-to-potentially-se... /business https://cloudfront-us-east-1.images.arcpublish...
4 Markets story_news Bitcoin Tumbles on Report of SEC Saying Spot B... Jun 30, 2023 /markets/2023/06/30/bitcoin-tumbles-on-report-... /markets
5 Markets story_news First Mover Asia: Bitcoin Crosses $31K After S... Jul 04, 2023 /markets/2023/07/04/first-mover-asia-bitcoin-c... /markets https://cloudfront-us-east-1.images.arcpublish...
6 Mercados|Coindesk-ES story_news El gran repunte de bitcoin impulsado por los E... Jun 22, 2023 /es/markets/2023/06/22/el-gran-repunte-de-bitc... /es/markets
7 Markets story_news Bitcoin’s Ferocious ETF-Fueled Rally Puts Ethe... Jun 22, 2023 /markets/2023/06/22/bitcoins-ferocious-etf-fue... /markets
8 Finance story_news CME Group Announces Plans to Launch ETH to BTC... Jun 29, 2023 /business/2023/06/29/cme-group-announces-plans... /business
9 Markets story_news First Mover Asia: Bitcoin Holds Firm Above $30... Jun 22, 2023 /markets/2023/06/22/first-mover-asia-bitcoin-h... /markets
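To search for something other than btc, or to get clickable links, the request can be parameterised. The sketch below rebuilds the exact URL used above from a dict of query parameters with urllib.parse.urlencode, and turns the relative link values into full URLs with urljoin. The helper names build_queryly_url and absolute_link are hypothetical, and the base domain https://www.coindesk.com is an assumption (the thread never states it; it is inferred from the "Coindesk-ES" section name in the results). Whether the endpoint honours other batchsize values has not been checked.

```python
from urllib.parse import urlencode, urljoin

# Endpoint and parameters copied from the request in the answer above
QUERYLY_ENDPOINT = "https://api.queryly.com/json.aspx"
EXTENDED_FIELDS = (
    "basic,creator,creator_slug,subheadlines,primary_section,report_url,"
    "section_path,sections_paths,subtype,type,imageresizer,section,"
    "sponsored_label,sponsored,promo_image,pubDate"
)

# Assumption: the site that the relative 'link' values belong to
SITE_BASE = "https://www.coindesk.com"


def build_queryly_url(query: str, batchsize: int = 10) -> str:
    """Hypothetical helper: rebuild the search URL for an arbitrary query."""
    params = {
        "queryly_key": "d0ab87fd70264c0a",
        "query": query,
        "batchsize": batchsize,
        "showfaceted": "true",
        "maxfacetitems": 75,
        "extendeddatafields": EXTENDED_FIELDS,
        "timezoneoffset": -60,
    }
    # safe="," keeps the comma-separated field list unescaped, as in the original URL;
    # spaces in the query are encoded as '+'
    return f"{QUERYLY_ENDPOINT}?{urlencode(params, safe=',')}"


def absolute_link(relative_link: str) -> str:
    """Hypothetical helper: turn a relative 'link' value into a full article URL."""
    return urljoin(SITE_BASE, relative_link)
```

With this, url = build_queryly_url('btc') reproduces the hard-coded URL, and after building the DataFrame, df['link'] = df['link'].map(absolute_link) would replace the relative paths with full URLs.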
Answered By - Barry the Platipus