Issue
I am having trouble saving URLs extracted from a page.
I have tried something like this:
url = "https://in.indeed.com/jobs?q=software%20engineer%20&l=Kerala"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
Links1 = soup.find_all("div",{"class:","pagination"})
url = [Links1.find(('a')['href'] for tag in Links1)]
WEbsite=f'https://in.indeed.com{url[0]}'
but it's not returning the full URL list. I need the URL to navigate to the next page.
Solution
Are you just after the "next page" link, or do you want all of the pagination links?
That is, do you want just:
/jobs?q=software+engineer+&l=Kerala&start=10
or are you after all of these?
/jobs?q=software+engineer+&l=Kerala&start=10
/jobs?q=software+engineer+&l=Kerala&start=20
/jobs?q=software+engineer+&l=Kerala&start=30
/jobs?q=software+engineer+&l=Kerala&start=40
/jobs?q=software+engineer+&l=Kerala&start=50
A few issues:
- Links1 is a list of elements, and you are then calling .find('a') on that list, which won't work. Call .find() on each element instead.
- Since you want the href attributes, use find('a', href=True).
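To see why the original fails: find_all returns a ResultSet (a list of tags), and .find() only works on an individual tag. A minimal sketch with made-up pagination markup:

```python
from bs4 import BeautifulSoup

# Made-up pagination markup, just to illustrate the ResultSet-vs-Tag issue.
html = '<div class="pagination"><a href="/jobs?start=10">2</a></div>'
soup = BeautifulSoup(html, "html.parser")

divs = soup.find_all("div", {"class": "pagination"})  # ResultSet (list-like)
# divs.find('a') raises an AttributeError; call .find() on each element instead:
hrefs = [div.find("a", href=True)["href"] for div in divs]
print(hrefs)  # ['/jobs?start=10']
```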
So here's how I would go about it:
import requests
from bs4 import BeautifulSoup
url = "https://in.indeed.com/jobs?q=software%20engineer%20&l=Kerala"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
Links1 = soup.find_all("div",{"class":"pagination"})
urls = [tag.find('a', href=True)['href'] for tag in Links1]  # avoid shadowing the url string
website = f'https://in.indeed.com{urls[0]}'
Output:
print(website)
https://in.indeed.com/jobs?q=software+engineer+&l=Kerala&start=10
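As an aside, rather than building the full URL with an f-string, the standard library's urljoin handles relative hrefs safely:

```python
from urllib.parse import urljoin

base = "https://in.indeed.com"
next_href = "/jobs?q=software+engineer+&l=Kerala&start=10"
next_url = urljoin(base, next_href)
print(next_url)  # https://in.indeed.com/jobs?q=software+engineer+&l=Kerala&start=10
```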
To get all those links:
import requests
from bs4 import BeautifulSoup
url = "https://in.indeed.com/jobs?q=software%20engineer%20&l=Kerala"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
Links1 = soup.find("div",{"class":"pagination"})
urls = [tag['href'] for tag in Links1.find_all('a', href=True)]
websites = [f'https://in.indeed.com{u}' for u in urls]
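Alternatively, since Indeed appears to paginate via a start query parameter in steps of 10, you could generate the page URLs directly rather than scraping them from the pagination div (a sketch under that assumption):

```python
# Assumes Indeed's pagination uses start offsets of 10, 20, 30, ...
base = "https://in.indeed.com/jobs?q=software+engineer+&l=Kerala"
pages = [f"{base}&start={offset}" for offset in range(10, 60, 10)]
print(pages[0])  # https://in.indeed.com/jobs?q=software+engineer+&l=Kerala&start=10
```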
Answered By - chitown88