Tuesday, October 4, 2022

[FIXED] Hello! Can anyone explain to me why my print is returning "None"?

Labels: beautifulsoup, python, web-scraping

Issue

I'm practicing scraping with BeautifulSoup on a job page, but my print returns "None" and I can't see why. Any ideas? Thanks in advance!

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://jobgether.com/es/oferta/63083ece6d137a0ac6e701e6-part-time-business-psychologist-intern'
website = requests.get(url)
Soup = BeautifulSoup(website.content, 'html.parser')

Title = Soup.find('h5', class_="mb-0 p-2 w-100 bd-highlight fs-22")
print(Title)

Solution

That page is hydrated with data through a JavaScript API, so the HTML returned to requests does not yet contain that h5 element, which is why find() returns None. You can find the API in the browser's Dev Tools, Network tab, where you can see the information being pulled as JSON from an API endpoint. This is one way to obtain that data, using requests:

import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}


url = 'https://filter-api.jobgether.com/api/offer/63083ece6d137a0ac6e701e6?%24populate%5B0%5D%5Bpath%5D=meta.continents&%24populate%5B0%5D%5Bselect%5D=name&%24populate%5B1%5D=meta.countries&%24populate%5B2%5D=meta.regions&%24populate%5B3%5D=meta.cities&%24populate%5B4%5D=meta.studiesArea&%24populate%5B5%5D=meta.salary&%24populate%5B6%5D=meta.languages&%24populate%5B7%5D=meta.hardSkills&%24populate%5B8%5D=meta.industries&%24populate%5B9%5D=meta.technologies&%24populate%5B10%5D%5Bpath%5D=company&%24populate%5B10%5D%5Bselect%5D=name%20meta.logo%20meta.industries%20meta.companyType%20meta.flexiblePolicy%20meta.employees%20meta.mainOfficeLocation%20meta.subOfficeLocation%20status%20description%20meta.mission%20meta.description%20meta.hardSkills%20meta.technologies%20meta.slug&%24populate%5B10%5D%5Bpopulate%5D%5B0%5D=meta.industries&%24populate%5B10%5D%5Bpopulate%5D%5B1%5D=meta.mainOfficeLocation&%24populate%5B10%5D%5Bpopulate%5D%5B2%5D=meta.subOfficeLocation'

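r = requests.get(url, headers=headers)
r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
obj = r.json()  # the endpoint returns the job offer as JSON
print(obj['title'])
print(obj['meta']['apply_url'])
print(obj['meta']['countries'])
# Flatten the list of hard-skill records into a DataFrame
df = pd.json_normalize(obj['meta']['hardSkills'])
print(df)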

This prints the following in the terminal:

Part-Time Business Psychologist Intern
https://it.linkedin.com/jobs/view/externalApply/3221880417?url=https%3A%2F%2Fteamtailor%2Eassessfirst%2Ecom%2Fjobs%2F1462616-uk-part-time-business-psychologist-student-intern%3Fpromotion%3D464724-trackable-share-link-uk-business-psychologist-li&urlHash=dzk3&trk=public_jobs_apply-link-offsite
[{'_id': '622a65b4671f2c8b98fac83f', 'name': 'United Kingdom', 'alpha_code': 'GBR', 'continent': '622a659af0bac38678ed1398', 'geo': [-0.127758, 51.507351], 'name_es': 'Reino Unido', 'name_fr': 'Royaume-Uni', 'deleted_at': None, 'amount_of_use': 11407, 'alpha_2_code': 'GB'}]
   _id                       id    name              name_es           name_fr           category_id  status  createdAt            updatedAt            deletedAt  hard_skill_categories  hard_skill_category
0  623ca7112198fdff24e1a1b0  5     Design            Design            Design            1            1       0000-00-00 00:00:00  0000-00-00 00:00:00  None       Marketing              621d2a97058dc9445a92c4be
1  623ca7112198fdff24e1a249  173   Research          Investigación     Recherche         8            1       0000-00-00 00:00:00  0000-00-00 00:00:00  None       Business               621d2a97058dc9445a92c4c5
2  623ca7112198fdff24e1a24a  174   Science           Ciencia           Science           8            1       0000-00-00 00:00:00  0000-00-00 00:00:00  None       Business               621d2a97058dc9445a92c4c5
3  623ca7112198fdff24e1a292  1165  Customer Success  Customer Success  Customer Success  4            1       2021-07-07 10:53:19  2021-07-07 10:53:19  None       Sales                  621d2a97058dc9445a92c4c1
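
The long endpoint URL used in the request is hard to read because its query string is percent-encoded (%24 is $, %5B is [, %5D is ]). The following is a minimal sketch, using only the standard library, of how you might decode it to see what the API is being asked for. The URL here is a shortened copy containing only the first three parameters, and the names base_url and params are just illustrative.

```python
from urllib.parse import urlsplit, parse_qsl

# Shortened copy of the endpoint (first three query parameters only).
url = ('https://filter-api.jobgether.com/api/offer/63083ece6d137a0ac6e701e6'
       '?%24populate%5B0%5D%5Bpath%5D=meta.continents'
       '&%24populate%5B0%5D%5Bselect%5D=name'
       '&%24populate%5B1%5D=meta.countries')

parts = urlsplit(url)
# Everything before the "?" is the plain endpoint address.
base_url = f'{parts.scheme}://{parts.netloc}{parts.path}'
# parse_qsl percent-decodes keys and values into (key, value) tuples.
params = parse_qsl(parts.query)

print(base_url)
for key, value in params:
    print(key, '=', value)
```

requests.get() accepts such a list of tuples through its params argument, so requests.get(base_url, params=params, headers=headers) should build an equivalent request. That has not been checked against this particular API, so treat it as an assumption.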

You can print out the full JSON response, inspect it, and extract the relevant information from it (it's quite comprehensive). Relevant documentation for requests:

https://requests.readthedocs.io/en/latest/

And also, pandas documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
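
As a rough sketch of what inspecting the response could look like, the snippet below pretty-prints a JSON object and lists its nested keys with their types. The obj dictionary is a hypothetical, heavily trimmed stand-in for r.json() (only a few fields seen in the output above), and outline() is an illustrative helper, not part of requests or pandas.

```python
import json

# Hypothetical, trimmed stand-in for r.json(); the real response has many more fields.
obj = {
    'title': 'Part-Time Business Psychologist Intern',
    'meta': {
        'countries': [{'name': 'United Kingdom', 'alpha_2_code': 'GB'}],
        'hardSkills': [{'name': 'Design'}, {'name': 'Research'}],
    },
}

def outline(value, prefix='', max_depth=3):
    """Return 'path: type' lines for nested dicts, up to max_depth dict levels."""
    lines = []
    if isinstance(value, dict) and max_depth > 0:
        for key, child in value.items():
            path = f'{prefix}.{key}' if prefix else key
            lines.append(f'{path}: {type(child).__name__}')
            lines.extend(outline(child, path, max_depth - 1))
    elif isinstance(value, list) and value:
        # Look only at the first element; list items usually share one shape.
        lines.extend(outline(value[0], f'{prefix}[0]', max_depth))
    return lines

# Full dump, indented for reading.
print(json.dumps(obj, indent=2, ensure_ascii=False))
# Compact overview of which keys exist and where.
lines = outline(obj)
print('\n'.join(lines))
```

With the real response you would pass r.json() instead of the sample dictionary, then use the printed paths to decide which keys to index, as in obj['meta']['hardSkills'].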

Answered By - Barry the Platipus

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
