Issue
I'm using Scrapy and I'm trying to scrape something like this:
<html>
<div class='hello'>
some elements
.
.
.
</div>
<div class='hi there'>
<div>
<h3> title </h3>
<h4> another title </h4>
<p> some text ..... </p>
"some text without any tag"
<div class='article'>
some elements
.
.
</div>
<div class='article'>
some elements
.
.
</div>
<div class='article'>
some elements
.
.
</div>
</div>
</div>
</html>
If I want to extract the text from all elements under the div with class name 'hi there' that come before the divs with class name 'article', is there any way to do it with either XPath or CSS selectors?
Solution
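I've never used Scrapy, so I don't know what functions it provides, but plain XPath can do this.
//div[@class='hi there']/div/div[@class='article'][1]/preceding-sibling::*
selects the elements that come before the first div with class "article" (the h3, h4 and p), and
//div[@class='hi there']/div/div[@class='article'][1]/preceding-sibling::text()
selects the untagged text nodes before that div, such as "some text without any tag". Scrapy's selectors (built on lxml) evaluate XPath 1.0, so the first article div is written as div[@class='article'][1] rather than with a parenthesized step like (div[@class='article'])[1], which is XPath 2.0 syntax. To get the text inside the h3, h4 and p as well, append //text() to the first expression.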
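A minimal sketch for trying these selections outside a spider, assuming lxml is installed (it is the XPath 1.0 engine underneath Scrapy's selectors, via parsel). The first-article step is written as div[@class='article'][1] because XPath 1.0 does not accept a parenthesized step. The sample markup is the question's with the "." placeholder lines dropped, and the helper name text_before_first_article is hypothetical.

```python
from lxml import html

# The markup from the question, with the "." placeholder lines dropped.
SAMPLE = """
<html>
<div class='hello'>
some elements
</div>
<div class='hi there'>
<div>
<h3> title </h3>
<h4> another title </h4>
<p> some text ..... </p>
"some text without any tag"
<div class='article'>some elements</div>
<div class='article'>some elements</div>
<div class='article'>some elements</div>
</div>
</div>
</html>
"""

# XPath 1.0 form of "the first 'article' div inside the 'hi there' div".
FIRST_ARTICLE = "//div[@class='hi there']/div/div[@class='article'][1]"


def text_before_first_article(page: str) -> list[str]:
    """Hypothetical helper: non-blank text that precedes the first 'article' div."""
    tree = html.fromstring(page)
    # preceding-sibling::node() matches both the elements (h3, h4, p) and the
    # bare text nodes; descendant-or-self::text() then collects every text node
    # inside them, in document order.
    texts = tree.xpath(
        FIRST_ARTICLE + "/preceding-sibling::node()/descendant-or-self::text()"
    )
    # Drop the whitespace-only nodes left by the line breaks in the markup.
    return [t.strip() for t in texts if t.strip()]


tree = html.fromstring(SAMPLE)

# Elements before the first article div.
print([el.tag for el in tree.xpath(FIRST_ARTICLE + "/preceding-sibling::*")])
# -> ['h3', 'h4', 'p']

# Bare text nodes only, with whitespace-only nodes removed.
print([t.strip() for t in tree.xpath(FIRST_ARTICLE + "/preceding-sibling::text()") if t.strip()])
# -> ['"some text without any tag"']

# Everything: tagged and untagged text before the first article div.
print(text_before_first_article(SAMPLE))
# -> ['title', 'another title', 'some text .....', '"some text without any tag"']
```

Using node() instead of * in the preceding-sibling step is what lets a single expression pick up both the tagged text and the untagged line. Inside a spider, the same expression strings should work with response.xpath(...).getall().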
Answered By - Joonyoung Park