Issue
I'm scraping this page for jobs. However, I'm having trouble locating and scraping the element that describes the job position (Analyst, Vice President, Associate, etc.).
To get to the job cards, I'm using the full XPath (which already looks messy, but I don't know how else to do it). The code is:
checked = wait.until(
    EC.presence_of_all_elements_located(
        (By.XPATH, '//*[@id="__next"]/main/div/div[2]/div/div/div[2]/div/div[2]/div/div/div[2]/div'))
)
Then I need the text inside these elements, and this is where I'm stuck. I tried using XPath again, but it only gets the first element's info and repeats it for every row. For example:
jobs = []
for row in checked:
    jobs.append({
        'info': row.find_element(By.XPATH, '//*[@id="__next"]/main/div/div/div/div/div/div/div[2]/div/div/div[2]/div[5]/div/div[1]/a/div/div[2]/span[2]').text
    })
print(jobs)
Just gives me this result:
[{'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}, {'info': 'Analyst'}]
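The repetition comes from the XPath inside the loop: in Selenium, an XPath that starts with `//` is evaluated from the document root even when `find_element` is called on a `WebElement`, so every `row` returns the page's first match. Starting the expression with `.` (for example `.//...`) makes it relative to `row`. The sketch below only illustrates that difference offline with Python's `xml.etree.ElementTree`, on made-up markup (the `row` and `position` classes are hypothetical); `root.find` stands in for the document-wide search, because ElementTree itself refuses absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the page: three rows, each with its own position.
SAMPLE = """
<div id="results">
  <div class="row"><span class="position">Analyst</span></div>
  <div class="row"><span class="position">Associate</span></div>
  <div class="row"><span class="position">Vice President</span></div>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = root.findall("./div[@class='row']")

# What a '//' path does inside the loop: the row is ignored and the whole
# document is searched, so every iteration gets the first match on the page.
document_wide = [root.find(".//span[@class='position']").text for _ in rows]

# Scoped search: the leading '.' starts the lookup at each row.
per_row = [row.find(".//span[@class='position']").text for row in rows]

print(document_wide)  # ['Analyst', 'Analyst', 'Analyst']
print(per_row)        # ['Analyst', 'Associate', 'Vice President']
```

In the question's loop, the corresponding change would be to write the per-row expression relative to the card, starting with `.//` instead of `//`; the exact relative path depends on the page's markup.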
How do I properly get the XPath for all these elements? What's a cleaner approach? I tried By.CSS_SELECTOR too, but the elements are so deeply nested, with generic classes that repeat, that I have no clue how to approach this.
Thank you!
Solution
The locator for the position has several levels.
The 1st level is the job card, which can be located by the attribute //*[@data-gs-uitk-component='card-body'].
The 2nd level is an element matching //*[contains(@class,'align-items-center')] that contains a descendant .//*[@data-type='custom'].
The 3rd level is the target element, whose data-gs-uitk-component attribute equals text.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://higher.gs.com/results?&page=1&sort=RELEVANCE")
wait = WebDriverWait(driver, 20)

# card body -> row that contains a data-type='custom' element -> text element (the position)
positions = wait.until(EC.visibility_of_all_elements_located((
    By.XPATH,
    "//*[@data-gs-uitk-component='card-body']"
    "//*[contains(@class,'align-items-center') and .//*[@data-type='custom']]"
    "//*[@data-gs-uitk-component='text']",
)))
for position in positions:
    print(position.text)
driver.quit()
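To see how the three levels narrow the match without opening a browser, here is a rough offline mirror of the same XPath using the standard library's `xml.etree.ElementTree`. The markup is a hypothetical, heavily simplified stand-in for the real page (only the attribute names come from the answer), and `find_positions` is an illustrative helper. ElementTree's XPath subset has no `contains()` or `and`, so those two conditions are checked in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure is only assumed here.
SAMPLE = """
<main>
  <div data-gs-uitk-component="card-body">
    <span data-gs-uitk-component="text">Job title</span>
    <div class="d-flex align-items-center">
      <span data-gs-uitk-component="text">Location</span>
    </div>
    <div class="d-flex align-items-center">
      <span data-type="custom" />
      <span data-gs-uitk-component="text">Analyst</span>
    </div>
  </div>
  <div data-gs-uitk-component="card-body">
    <div class="d-flex align-items-center">
      <span data-type="custom" />
      <span data-gs-uitk-component="text">Vice President</span>
    </div>
  </div>
</main>
"""


def find_positions(root):
    """Mirror the answer's three-level XPath with ElementTree (hypothetical helper)."""
    results = []
    # Level 1: every element whose data-gs-uitk-component is 'card-body'.
    for card in root.iter():
        if card.get("data-gs-uitk-component") != "card-body":
            continue
        # Level 2: descendants whose class contains 'align-items-center'
        # (substring test, like XPath contains()) and that have a
        # data-type='custom' descendant.
        for row in card.findall(".//*"):
            if "align-items-center" not in row.get("class", ""):
                continue
            if row.find(".//*[@data-type='custom']") is None:
                continue
            # Level 3: text elements inside that row.
            for text_el in row.findall(".//*[@data-gs-uitk-component='text']"):
                if text_el not in results:  # XPath returns each node only once
                    results.append(text_el)
    return [el.text for el in results]


print(find_positions(ET.fromstring(SAMPLE)))  # ['Analyst', 'Vice President']
```

In this sample, the "Job title" and "Location" text elements are skipped: the first is not inside an `align-items-center` element, and the second's row has no `data-type='custom'` descendant, which is the job the 2nd-level condition does in the answer's locator.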
Answered By - Yaroslavm