Tuesday, January 30, 2024

[FIXED] How can I select a specific list of tags when using Beautiful Soup 4?

Labels: beautifulsoup, css, html, python, web-scraping

Issue

I'm trying my hand at web scraping and decided to write a script that collects information from the job listings on the Python website. The code that scrapes a single job listing works; the only thing I can't solve is how to iterate through the job listings. Each job listing is enclosed in an <li> tag, but there are also <li> tags at the top of the page. How can I iterate through only the group of tags I want, rather than the ones at the top?

I tried using a for loop to iterate through the <li> tags returned by the find_all method:

import bs4
import requests

# 1. get website
url  = 'https://www.python.org/jobs/'
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    }
website = requests.get(url, headers=headers)

soup = bs4.BeautifulSoup(website.content, 'lxml')

jobs = soup.find_all('li')

counter = 0


for listing in jobs:

    # 2. get info
    job = listing.find('span', class_ = "listing-company-name")
    job_type = listing.find('span', class_ = "listing-job-type")
    job_location = listing.find('span', class_ = "listing-location")

    # 3. display info

    print(f'Job Title: {job.a.text}')
    print(f'Company: {job.br.next_sibling.strip()}')
    print(f'Skills Needed: {job_type.text}')
    print(f'Location: {job_location.text}')

The problem is that find returns None when I search for the specific class, because the loop also visits the <li> tags at the top of the page, which hold the titles and headers of the website itself. Those tags contain no matching span, so a lookup such as job.a.text fails on them.

Solution

One way to do this is to search directly for the span tags that carry the class you need, and then pull the desired text out of each one in a loop.

import bs4
import requests


# 1. get website
url  = 'https://www.python.org/jobs/'
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
    }
website = requests.get(url, headers=headers)
print(website.status_code)

soup = bs4.BeautifulSoup(website.content, 'lxml')

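# Only the job listings contain a span with this class,
# so the <li> tags at the top of the page are never matched
jobs = soup.find_all('span', class_ = "listing-company-name")

# The job title is the text of the <a> tag inside each span
job_titles = [job.a.text for job in jobs]

print(f'Job Titles: {job_titles}')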

The other fields can be collected the same way, by searching for the listing-job-type and listing-location classes.
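If you would rather keep the per-listing loop from the question, so that each job's title, company, skills, and location stay together, one option is to skip every <li> that has no listing-company-name span. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is made-up markup that reuses the class names from the question (the real page may be structured differently), parse_listings is a hypothetical helper name, and the built-in 'html.parser' is used in place of 'lxml' so the example runs without extra dependencies.

```python
import bs4

# Hypothetical sample markup: only the class names come from the question.
SAMPLE_HTML = """
<ul>
  <li><a href="/about/">About</a></li>
  <li><a href="/downloads/">Downloads</a></li>
</ul>
<ol>
  <li>
    <span class="listing-company-name"><a href="/jobs/1/">Backend Developer</a><br/>
      Example Corp</span>
    <span class="listing-job-type">Django, PostgreSQL</span>
    <span class="listing-location">Berlin, Germany</span>
  </li>
  <li>
    <span class="listing-company-name"><a href="/jobs/2/">Data Engineer</a><br/>
      Sample Labs</span>
    <span class="listing-job-type">Pandas, SQL</span>
    <span class="listing-location">Remote</span>
  </li>
</ol>
"""


def parse_listings(html):
    """Return one dict per <li> that looks like a job listing."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    listings = []
    for item in soup.find_all('li'):
        name = item.find('span', class_='listing-company-name')
        if name is None:
            # Navigation or header <li>: not a job listing, skip it
            continue
        job_type = item.find('span', class_='listing-job-type')
        location = item.find('span', class_='listing-location')
        listings.append({
            'title': name.a.get_text(strip=True),
            # The company name is the text node that follows the <br> tag
            'company': name.br.next_sibling.strip(),
            'skills': job_type.get_text(strip=True) if job_type else '',
            'location': location.get_text(strip=True) if location else '',
        })
    return listings


for listing in parse_listings(SAMPLE_HTML):
    print(f"Job Title: {listing['title']}")
    print(f"Company: {listing['company']}")
    print(f"Skills Needed: {listing['skills']}")
    print(f"Location: {listing['location']}")
```

To try this against the live page, pass website.content to parse_listings instead of SAMPLE_HTML. The None check is what keeps the <li> tags at the top of the page from raising an error.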

Answered By - Сергей Кох

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
