Issue
I'm trying my hand at web scraping and decided to write a script that collects information from the job listings on the Python website. The code that scrapes a single job listing works; the only thing I have trouble with is iterating through the listings. Each job listing is enclosed in an <li> tag, but there are also <li> tags at the top of the page. How can I iterate over only the group of tags I want, instead of the ones at the top?
I tried using a for loop to iterate over the <li> tags returned by the find_all method:
import bs4
import requests

# 1. get website
url = 'https://www.python.org/jobs/'
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
}
website = requests.get(url, headers=headers)
soup = bs4.BeautifulSoup(website.content, 'lxml')
# find_all('li') matches every <li> on the page, not only the job listings
jobs = soup.find_all('li')
for listing in jobs:
    # 2. get info
    job = listing.find('span', class_="listing-company-name")
    job_type = listing.find('span', class_="listing-job-type")
    job_location = listing.find('span', class_="listing-location")
    # 3. display info
    print(f'Job Title: {job.a.text}')
    print(f'Company: {job.br.next_sibling.strip()}')
    print(f'Skills Needed: {job_type.text}')
    print(f'Location: {job_location.text}')
The problem is that find returns None when I search for the specific class, because the loop also visits the <li> tags at the top of the page, which contain the titles and headers of the website itself rather than job listings.
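The failure can be reproduced without touching the network. The snippet below is a minimal sketch that uses a made-up navigation item (the markup is an assumption, not copied from the real page) to show that find returns None for an <li> without the span, and that the next attribute access then raises AttributeError.

```python
import bs4

# Hypothetical navigation item standing in for the <li> tags at the top of the page.
nav_item = bs4.BeautifulSoup(
    '<li><a href="/about/">About</a></li>', "html.parser"
).li

# This <li> has no span with the job-listing class, so find() returns None.
job = nav_item.find("span", class_="listing-company-name")
print(job)  # None

try:
    job.a.text
except AttributeError as exc:
    # Accessing .a on None is what makes the original loop crash.
    error_message = str(exc)
    print(error_message)
```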
Solution
You can do it this way: first find all the span tags with the class you need, then pull the desired text out of each one in a loop.
import bs4
import requests
# 1. get website
url = 'https://www.python.org/jobs/'
headers = {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
}
website = requests.get(url, headers=headers)
print(website.status_code)
soup = bs4.BeautifulSoup(website.content, 'lxml')
jobs = soup.find_all('span', class_ = "listing-company-name")
job_text = [job.a.text for job in jobs]
print(f'Job Title: {job_text}')
The other fields work the same way: search for the listing-job-type and listing-location classes instead.
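If you would rather keep each listing's fields together instead of building one list per field, one possible variation is to keep looping over the <li> tags and skip any that lack the listing-company-name span. The sketch below is not the answerer's code. The function name parse_jobs and the sample markup are hypothetical; only the three class names come from the question. It parses an inline string with html.parser so it runs without a network request.

```python
import bs4

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<ul>
  <li><a href="/about/">About</a></li>
  <li><a href="/downloads/">Downloads</a></li>
</ul>
<ol>
  <li>
    <span class="listing-company-name"><a href="/jobs/1/">Backend Developer</a><br/>
      Example Corp</span>
    <span class="listing-job-type">Django, SQL</span>
    <span class="listing-location">Berlin, Germany</span>
  </li>
  <li>
    <span class="listing-company-name"><a href="/jobs/2/">Data Engineer</a><br/>
      Sample Ltd</span>
    <span class="listing-job-type">Pandas</span>
    <span class="listing-location">Remote</span>
  </li>
</ol>
"""


def parse_jobs(html):
    """Return one dict per <li> that contains a listing-company-name span."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    results = []
    for listing in soup.find_all("li"):
        name = listing.find("span", class_="listing-company-name")
        if name is None:
            # Navigation <li> tags have no such span, so skip them.
            continue
        job_type = listing.find("span", class_="listing-job-type")
        location = listing.find("span", class_="listing-location")
        results.append({
            "title": name.a.get_text(strip=True),
            # The company name is the text node right after the <br> tag.
            "company": name.br.next_sibling.strip(),
            "skills": job_type.get_text(strip=True) if job_type else "",
            "location": location.get_text(strip=True) if location else "",
        })
    return results


for job in parse_jobs(SAMPLE_HTML):
    print(f"Job Title: {job['title']}")
    print(f"Company: {job['company']}")
    print(f"Skills Needed: {job['skills']}")
    print(f"Location: {job['location']}")
```

To try it against the live site, pass website.content from the request above to parse_jobs in place of SAMPLE_HTML. That only works if the real listings follow the same span layout as the sample.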
Answered By - Сергей Кох