Issue
I would like to know how to select data from one specific tag structure when the page contains several similar structures. For example, consider the markup below. I would like to select data only from the first "vcol" so that I can read ITEM1, ITEM2, ITEM3 and ITEM4.
<div class="main">
  <div class="vcol">
    <div class="cls">
      <ul class="ul_style">
        <li class="li_style"> <h3 class="item"> ITEM1 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM2 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM3 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM4 </h3></li>
      </ul>
    </div>
  </div>
  <div class="vcol">
    <div class="cls">
      <ul class="ul_style">
        <li class="li_style"> <h3 class="item"> ITEM5 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM6 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM7 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM8 </h3></li>
      </ul>
    </div>
  </div>
  <div class="vcol">
    <div class="cls">
      <ul class="ul_style">
        <li class="li_style"> <h3 class="item"> ITEM9 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM10 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM11 </h3></li>
        <li class="li_style"> <h3 class="item"> ITEM12 </h3></li>
      </ul>
    </div>
  </div>
</div>
If I write the Scrapy code below, I get all of ITEM1 through ITEM12.
for item in response.css('.vcol .cls .ul_style li'):
    item.css('h3 ::text').extract_first()
Any suggestions on how to get only ITEM1 to ITEM4?
I tried looping through the different classes, but I always get all of the items.
Solution
As mentioned in my comments, you can use the :nth-child() CSS selector to match only the first <div class="vcol">.
Then, I don't think it's worth doing two CSS queries, as you effectively have a single ITEM* per <li>. So you could use .vcol:nth-child(1) h3 or .vcol:nth-child(1) .item to select the HTML tags you want to extract.
Answered By - Patrick Janser