Issue
I am trying to scrape this site. The problem is that the page URL stays the same even after I click the button that expands the content. I need to scrape all of the news items, going back to the first post.
```python
import requests
from bs4 import BeautifulSoup

url = "https://www.internazionale.it/tag/la-settimana"
html = requests.get(url)
html.raise_for_status()
s = BeautifulSoup(html.text, 'html.parser')
# Container that holds the article boxes on the first page
results = s.find('div', class_='hentryfeed__container container_full')
link_articolo = results.find_all('div', class_='box-article-intro')
for articolo in link_articolo:
    link_articoli = articolo.find('a', href=True)
    print('https://www.internazionale.it' + link_articoli['href'])
```
This is the working code for page one, but the button that expands the content doesn't change the URL, so I need a different way to scrape all the news items back to the first post.
Solution
To get all the links, you can emulate the Ajax call with requests, as in this example:
```python
import requests
from bs4 import BeautifulSoup

url = "https://www.internazionale.it/tag/la-settimana"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

stream_id = soup.select_one("[data-stream-id]")["data-stream-id"]

# load first links
links = []
for article in soup.select(".box-article__data"):
    links.append("https://www.internazionale.it" + article.a["href"])
    data_datetime = article.find_previous(attrs={"data-datetime": True})[
        "data-datetime"
    ].split()[0]

# load rest of the links
while True:
    url = f"https://data.internazionale.it/stream_data/items/tag/0/{stream_id}/{data_datetime}.json"
    data = requests.get(url).json()

    if not data.get("items"):
        break

    for i in data["items"]:
        links.append("https://www.internazionale.it" + i["url"])
        print(links[-1])

    data_datetime = data["datetime"].split()[0]

# `links` now contains all the links
```
Prints:
...
https://www.internazionale.it/opinione/giovanni-de-mauro/2001/08/02/la-battaglia-di-genova
https://www.internazionale.it/opinione/giovanni-de-mauro/2001/01/13/astroturf
https://www.internazionale.it/opinione/giovanni-de-mauro/1999/03/11/tutti-al-centro
https://www.internazionale.it/opinione/giovanni-de-mauro/1998/05/07/il-futuro-di-israele
https://www.internazionale.it/opinione/giovanni-de-mauro/1998/04/29/i-nuovi-vicini
https://www.internazionale.it/opinione/giovanni-de-mauro/1995/12/22/interviste
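If you want to reuse the "load rest of the links" loop, one option is to wrap it in a generator. The sketch below is only an illustration: `iter_stream_links`, `fetch_json`, `max_pages` and `delay` are hypothetical names, not part of the answer, and the endpoint pattern is simply copied from the script, so its behaviour is assumed rather than documented. The fetch function is passed in so the loop can be tried without touching the network.

```python
import time
from typing import Callable, Iterator, Optional

BASE_URL = "https://www.internazionale.it"
# Endpoint pattern copied from the answer; assumed to keep working as shown there.
STREAM_URL = (
    "https://data.internazionale.it/stream_data/items/tag/0/{stream_id}/{cursor}.json"
)


def iter_stream_links(
    stream_id: str,
    cursor: str,
    fetch_json: Callable[[str], dict],
    max_pages: Optional[int] = None,
    delay: float = 0.0,
) -> Iterator[str]:
    """Yield article URLs page by page until a page has no items.

    `fetch_json` is a hypothetical hook: any callable that takes a URL
    and returns the decoded JSON as a dict.
    """
    pages = 0
    while max_pages is None or pages < max_pages:
        data = fetch_json(STREAM_URL.format(stream_id=stream_id, cursor=cursor))
        items = data.get("items")
        if not items:
            break  # same stop condition as the original loop
        for item in items:
            yield BASE_URL + item["url"]
        raw = data.get("datetime")
        if not raw:
            break  # no cursor for the next page
        next_cursor = raw.split()[0]
        if next_cursor == cursor:
            break  # cursor did not advance; avoid looping forever
        cursor = next_cursor
        pages += 1
        time.sleep(delay)  # optional pause between requests
```

With requests, `fetch_json` could be something like `lambda u: requests.get(u, timeout=10).json()`, and `stream_id` and the starting cursor would come from the first page exactly as in the script above.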
Answered By - Andrej Kesely