Issue
I want to fetch all the links from the page at the URL given in the code, particularly links of the form https://api.somthing.com/v1/companies/. All the regexes I found online only fetch simple links like https://api.somthing.com.
import requests
import re
from bs4 import BeautifulSoup

url = 'https://www.linkdin.com/'
x = requests.get(url)  # download the page
html_doc = x.text  # raw HTML of the response as a string
soup = BeautifulSoup(html_doc, "html.parser")  # parsed document tree
print(soup)
Solution
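You can use re.findall to extract the URLs directly from the response content. The lazy .*? combined with the lookahead (?=") ends each match just before the next double quote, so the whole path is captured rather than only the domain: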
p = r'https://api\.something\.com/.*?(?=")'
urls = re.findall(p, html_doc)
Output:
['https://api.something.com/v1/companies/postings/733260034',
'https://api.something.com/v1/companies/postings/371262356',
'https://api.something.com/v1/companies/postings/465637233',
'https://api.something.com/v1/companies/postings/315747724',
...
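The same idea can be tried without a network request. The sketch below is only an illustration: sample_html is a made-up fragment standing in for html_doc, extract_api_urls is a hypothetical helper name, and the character class is a swapped-in alternative to the lazy match with lookahead used above. It assumes the URLs are delimited by quotes, whitespace, or angle brackets.

```python
import re

# Hypothetical HTML fragment standing in for html_doc (not taken from the real page).
sample_html = '''
<a href="https://api.something.com/v1/companies/postings/1">one</a>
<a href="https://api.something.com">root</a>
<script>var u = "https://api.something.com/v1/companies/postings/2";</script>
<a href="https://example.com/other">other</a>
'''

# Match the host, then consume every character that cannot be part of the
# quoted URL: stop at a quote, whitespace, or an angle bracket.
pattern = re.compile(r'https://api\.something\.com[^"\'\s<>]*')


def extract_api_urls(html):
    """Return the matching URLs without duplicates, in order of first appearance."""
    return list(dict.fromkeys(pattern.findall(html)))


urls = extract_api_urls(sample_html)

# Keep only the links under /v1/companies/, the ones the question asks about.
companies = [u for u in urls
             if u.startswith('https://api.something.com/v1/companies/')]

print(urls)
print(companies)
```

Because the character class stops at the delimiter itself, this variant also picks up the bare https://api.something.com link, which the pattern above skips since it requires a slash after the host.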
Answered By - Timeless