Issue
I am learning to use Scrapy and playing with XPath selectors, and decided to practice by scraping job titles from Craigslist.
Here is the HTML of a single job link from the Craigslist page I am trying to scrape the job titles from:
<a href="https://orangecounty.craigslist.org/sof/d/trabuco-canyon-full-stack-net-developer/7134827958.html" data-id="7134827958" class="result-title hdrlnk">Full Stack .NET C# Developer (Mid-Level, Senior) ***LOCAL ONLY***</a>
I wanted to retrieve all of the similar a tags with the class result-title, so I used this XPath selector:
titles = response.xpath('//a[@class="result-title"/text()]').getall()
but the output I receive is an empty list: []
I then copied the XPath directly from Chrome's inspector, and it worked perfectly, giving me the full list of job titles. This selector was:
titles = response.xpath('*//div[@id="sortable-results"]/ul/li/p/a/text()').getall()
I can see why this second XPath selector works, but I don't understand why my first attempt did not. Can someone explain why my first XPath selector failed? I have also linked the full HTML of the Craigslist page below in case it is helpful. I am new to Scrapy and want to learn from my mistakes. Thank you!
view-source:https://orangecounty.craigslist.org/search/sof
Solution
Like this:
'//a[contains(@class,"result-title ")]/text()'
Or:
'//a[starts-with(@class,"result-title ")]/text()'
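I use contains() or starts-with() because the class attribute of the a node is result-title hdrlnk, not just result-title. The test @class="result-title" compares the entire attribute string, so it cannot match that node.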
In your XPath:
'//a[@class="result-title"/text()]'
even if the class were just result-title, the syntax is wrong: /text() sits inside the predicate brackets instead of after them. You should use:
'//a[@class="result-title"]/text()'
Answered By - Gilles Quenot