Saturday, September 24, 2022

[FIXED] BeautifulSoup: href link is undefined

Labels: beautifulsoup, python, selenium, web-scraping

Issue

I want to scrape a website, but the link in every job tag I reach is "job/undefined". I use a POST request to fetch the data from the page.

The POST request and its form data are in this code:

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}

postData = {
 'search': 'search',
 'facets[camp_type]':'day_camp',
 'open[choices-made-content]': 'true'}

url = 'https://www.trustme.work/en'
html_1 = requests.post(url, headers=headers, data=postData)

soup1 = BeautifulSoup(html_1.text, 'lxml')
# chain both classes with dots; a space would start a descendant selector
a = soup1.select('div.MuiGrid-root.MuiGrid-grid-xs-12')
b = soup1.select('span[class="MuiTypography-root MuiTypography-h2"]')
print('soup:',b)

Sample from the output:

<span class="MuiTypography-root MuiTypography-h2" style="cursor:pointer">
    <a href="job/undefined" style="color:#413E52;text-decoration:none">
    Network and Security engineer
    </a>
</span>

Solution

EDIT

Part of the content is served dynamically, so you have to fetch each job's hashid via the API and then either build the link yourself or use the data from the JSON response:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}
url = 'https://api.trustme.work/api/job_offers?include=technologies%2Cjob%2Ccompany%2Ccontract_type%2Clevel'
jobs = requests.get(url, headers=headers).json()['included']['jobs']

# build one link per job from its hashid
links = ['https://www.trustme.work/job/' + job['hashid'] for job in jobs.values()]
print(links)
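
As a rough sketch of the link-building step, the snippet below applies the same idea to a small hand-made payload instead of the live API. The nesting (`included`, `jobs`, `hashid`) follows the code above; the keys `"1"` to `"3"`, the hashid values, the `title` field, and the `build_job_links` helper are made up for illustration, and the real response may look different in detail.

```python
BASE_URL = "https://www.trustme.work/job/"


def build_job_links(payload):
    """Return one job URL per entry in payload['included']['jobs']."""
    jobs = payload.get("included", {}).get("jobs", {})
    # Skip entries that carry no hashid instead of raising KeyError.
    return [BASE_URL + job["hashid"] for job in jobs.values() if job.get("hashid")]


# Hypothetical payload, shaped like the JSON the code above indexes into.
sample_payload = {
    "included": {
        "jobs": {
            "1": {"hashid": "abc123", "title": "Network and Security engineer"},
            "2": {"hashid": "def456", "title": "Backend developer"},
            "3": {"title": "Entry without a hashid"},
        }
    }
}

links = build_job_links(sample_payload)
print(links)
```

With the live API, `requests.get(url, headers=headers).json()` would take the place of `sample_payload`.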

To get the links from each job post, make your CSS selector more specific. Also prefer static identifiers or the HTML structure over classes:

.select('h2 a')
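
To illustrate the difference between selecting by structure and by class, here is a small sketch on an inline HTML fragment. The fragment is hypothetical: it reuses the class names and the `span` from the question and assumes an enclosing `h2` and `div`, so the real page may be nested differently. It uses the built-in `html.parser` so that `lxml` is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment loosely modelled on the sample output in the question.
html = """
<div class="MuiGrid-root MuiGrid-grid-xs-12">
  <h2>
    <span class="MuiTypography-root MuiTypography-h2">
      <a href="job/abc123">Network and Security engineer</a>
    </span>
  </h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Structural selector: any <a> nested inside an <h2>.
by_structure = soup.select("h2 a")

# Class selector: both classes chained with dots on the same <div>.
by_classes = soup.select("div.MuiGrid-root.MuiGrid-grid-xs-12 a")

# A space between the class names is a descendant combinator, so this looks
# for an element *named* MuiGrid-grid-xs-12 inside the div and finds nothing.
no_match = soup.select("div.MuiGrid-root MuiGrid-grid-xs-12")

print(len(by_structure), len(by_classes), len(no_match))
```

The structural selector keeps working if the class names change, which is the main reason to prefer it.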

To get a list of all links use a list comprehension:

['https://www.trustme.work' + a.get('href') for a in soup1.select('h2 a')]
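
One caveat with plain concatenation: if an href is relative and has no leading slash, as in the `job/undefined` sample from the question, the result would be `https://www.trustme.workjob/undefined`. A minimal sketch of a more forgiving variant uses `urllib.parse.urljoin` from the standard library and skips the placeholder links. The `hrefs` list and the `resolve_links` helper are hypothetical stand-ins for the values that `a.get('href')` would return.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.trustme.work/en"


def resolve_links(hrefs, page_url=PAGE_URL):
    """Turn relative hrefs into absolute URLs, skipping unusable ones."""
    links = []
    for href in hrefs:
        # Skip missing hrefs and the 'undefined' placeholder.
        if not href or href.rstrip("/").endswith("undefined"):
            continue
        # urljoin resolves the href against the page URL, with or without
        # a leading slash.
        links.append(urljoin(page_url, href))
    return links


# Hypothetical href values, including the placeholder seen in the question.
hrefs = ["job/undefined", "job/abc123", "/job/def456", None]
resolved = resolve_links(hrefs)
print(resolved)
```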

Example

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}

postData = {
 'search': 'search',
 'facets[camp_type]':'day_camp',
 'open[choices-made-content]': 'true'}

url = 'https://www.trustme.work/en'
html_1 = requests.post(url, headers=headers, data=postData)

soup1 = BeautifulSoup(html_1.text, 'lxml')
# collect the absolute link of every job title
links = ['https://www.trustme.work' + a.get('href') for a in soup1.select('h2 a')]
print(links)

Answered By - HedgeHog

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
