Issue
I am scraping data with CSS selectors for the first time,
and I am having trouble scraping the content of an anchor (its href).
Here is my code:
import requests
from bs4 import BeautifulSoup
url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})
link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")
title = post.find("span", {"class": "title"}).get_text()
company = post.find("span", {"class": "company"}).get_text()
location = post.find("span", {"class": "region company"}).get_text()
link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")
print {"title": title, "company": company, "location": location, "link":f"https://weworkremotely.com/{link}"}
I want to scrape the content of the anchor to build a link for each post, so I put a[href] in the selector.
But it doesn't work: instead, it scrapes the contents of every subcategory.
What do I have to change to scrape just the content of the anchor?
Solution
Assuming you have correctly selected the jobs of interest from all the jobs listed, you need a loop. Inside the loop, extract the first href attribute that contains the substring -jobs, i.e. post.select_one('a[href*=-jobs]')['href']:
import requests
from bs4 import BeautifulSoup
url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})
for post in posts:
    # select_one returns the first matching tag; ['href'] reads its attribute
    print('https://weworkremotely.com' + post.select_one('a[href*=-jobs]')['href'])
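To see why the original select(...) call did not give a usable link, here is a small offline sketch. The HTML snippet is a made-up stand-in for one listing (the real page markup may differ), and using urljoin to build the absolute URL is my own suggestion rather than part of the answer above. It shows that select returns a list-like collection of tags, while select_one returns a single tag whose href can be read directly.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup imitating one job listing; the live page may differ.
SAMPLE_HTML = """
<ul>
  <li class="feature">
    <a href="/company/acme">Acme</a>
    <a href="/remote-jobs/acme-ruby-developer">
      <span class="title">Ruby Developer</span>
    </a>
  </li>
</ul>
"""

BASE_URL = "https://weworkremotely.com"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
post = soup.find("li", {"class": "feature"})

# select() returns a list-like collection of tags, even for a single match,
# so formatting it into a string never yields a bare URL.
matches = post.select("a[href]")

# select_one() returns the first matching tag, or None if nothing matches.
anchor = post.select_one('a[href*="-jobs"]')

# Index the tag with the attribute name to read its value.
href = anchor["href"] if anchor is not None else None

# urljoin handles the slash between the base URL and the path.
link = urljoin(BASE_URL, href) if href else None
print(link)
```

Checking for None before indexing is optional here, but it avoids a TypeError on any list item that has no matching anchor.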
To get all the listings on the page, not only the featured ones, switch the posts selection to:
posts = wwr_soup.select('li:has(.tooltip)')
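As a rough illustration of the difference between the two selections, the sketch below runs both against a made-up list. The markup is an assumption for demonstration only: it supposes that every real job entry contains an element with class tooltip while navigation items do not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two job entries (one featured) and a "view all" item.
SAMPLE_HTML = """
<ul>
  <li class="feature"><span class="tooltip">Featured</span><a href="/remote-jobs/a">A</a></li>
  <li><span class="tooltip">Regular</span><a href="/remote-jobs/b">B</a></li>
  <li class="view-all"><a href="/categories/x">View all</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Class-based lookup: only the items marked as featured.
featured = soup.find_all("li", {"class": "feature"})

# :has(.tooltip) keeps every <li> that contains a .tooltip descendant,
# which here covers both job entries and skips the "view all" item.
all_posts = soup.select("li:has(.tooltip)")

hrefs = [li.select_one("a[href]")["href"] for li in all_posts]
print(len(featured), hrefs)
```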
Answered By - QHarr