Issue
The following code parses only the first page. It gets all the article links, but it also includes the link to the next page.
Looking at the structure of this website, the link to the next page is something like https://slow-communication.jp/news/?pg=2.
import re
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

main_url = 'https://slow-communication.jp'
req = Request(main_url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, "lxml")
for link in soup.findAll('a'):
    _link = str(link.get('href'))
    if '/news/' in _link:
        artice_id = _link.split("/news/")[-1]
        if len(artice_id) > 0:
            print(_link)
Using this code, I get
https://slow-communication.jp/news/3589/
https://slow-communication.jp/news/3575/
https://slow-communication.jp/news/3546/
https://slow-communication.jp/news/?pg=2
But what I would like to do is to keep every link to the articles and keep going to the next pages. So I would keep
https://slow-communication.jp/news/3589/
https://slow-communication.jp/news/3575/
https://slow-communication.jp/news/3546/
and then go to https://slow-communication.jp/news/?pg=2
and keep doing the same thing until the website has no more next pages.
How do I do that?
Solution
You can paginate with a for loop and the range function, using the format method to put the page number into the URL.
I find this type of pagination about 2 times faster than the alternatives. You can increase or decrease the number of pages by changing the range.
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

main_url = 'https://slow-communication.jp/news/?pg={page}'

# Pages 1 to 10; change the range to cover more or fewer pages.
for page in range(1, 11):
    req = Request(main_url.format(page=page), headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    soup = BeautifulSoup(webpage, "lxml")
    for link in soup.find_all('a'):
        _link = str(link.get('href'))
        # Keep article links only; skip the pagination links (?pg=N).
        if '/news/' in _link and '?pg=' not in _link:
            article_id = _link.split("/news/")[-1]
            if len(article_id) > 0:
                print(_link)
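The loop above stops after a fixed number of pages, while the question asks to continue until there is no next page. Below is a minimal sketch of one way to do that, not a tested scraper for this site. It assumes that article URLs end in /news/<digits>/ (as in the output shown in the question) and that a page past the last one either returns an HTTP error or contains no article links that have not been seen already. To keep the sketch free of third-party dependencies it uses the standard library html.parser instead of BeautifulSoup; the names LinkCollector, article_links, fetch and crawl are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

NEWS_URL = 'https://slow-communication.jp/news/?pg={page}'
# Assumption: article URLs end in /news/<digits>/, pagination URLs do not.
ARTICLE_RE = re.compile(r'/news/\d+/?$')


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def article_links(html, base_url):
    """Return the unique article URLs found in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if ARTICLE_RE.search(url) and url not in links:
            links.append(url)
    return links


def fetch(url):
    """Download url and return its body as text."""
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req) as resp:
        return resp.read().decode('utf-8', errors='replace')


def crawl(fetch_page=fetch, max_pages=100):
    """Collect article links page by page until a page adds nothing new."""
    seen = []
    for page in range(1, max_pages + 1):  # max_pages is a safety cap
        url = NEWS_URL.format(page=page)
        try:
            html = fetch_page(url)
        except HTTPError:
            break  # assumed to mean we ran past the last page
        new = [u for u in article_links(html, url) if u not in seen]
        if not new:
            break  # empty or repeated page: assume there are no more pages
        seen.extend(new)
    return seen

# Usage (performs real HTTP requests):
#     for link in crawl():
#         print(link)
```

Passing the download function in as fetch_page makes it possible to try the loop on saved HTML without touching the network. If the site keeps serving the last page for any larger page number, the "nothing new" check is what ends the loop.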
Answered By - F.Hoque