Issue
I need to retrieve the price text nested inside the custom-control div, within the label and font elements. The only way to identify which item a price belongs to is the data-number attribute, e.g. data-number="025.00286R". The letter suffix at the end is the only thing that differentiates the control divs.
<div class="custom-control custom-checkbox">
<input type="checkbox" class="custom-control-input" data-number="025.00286R" name="itemSelected[]" value="7684cd019b98489eb330010000039848" id="checkbox-7684cd019b98489eb330010000039848">
<label class="custom-control-label" for="checkbox-7684cd019b98489eb330010000039848">
<meta itemprop="price" content="676.0512">
<font style="vertical-align: inherit;"><font style="vertical-align: inherit;">
€676.05
</font></font>
</label>
</div>
I use this code to collect all the data-number values within the page:
box_contents = response.css('div[class*="mad-article-list-box"]').re(r"[0-9]+.\d+[0-9][A-Z]+")
box_contents = list(dict.fromkeys(box_contents))
The box contents are then presented in a list (for each number in the list there is an identical custom-control div):
['025.00286GA', '025.00286GV', '025.00286NWA', '025.00286NW', '025.00286NWV', '025.00286R']
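One detail worth noting about the pattern above: the unescaped `.` matches any character, not only a literal dot. A minimal sketch of the same extract-and-deduplicate step with the dot escaped is shown here; the `sample` string is a made-up stand-in for the page content, not the real markup.

```python
import re

# Hypothetical sample standing in for the serialized HTML of the list boxes.
sample = (
    'data-number="025.00286GA" data-number="025.00286GV" '
    'data-number="025.00286GA" data-number="025.00286R"'
)

# Escaping the dot makes it match a literal "." rather than any character.
pattern = re.compile(r"\d+\.\d+[A-Z]+")

numbers = pattern.findall(sample)
# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)
```

`dict.fromkeys` is used instead of `set` because it preserves the order in which the numbers appear on the page.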
The problem now is that the <input type="checkbox"> does not contain any child elements, and I need the nested text contents of the element below it, <label class="custom-control-label">. I can locate the <input> with:
response.xpath('//input[contains(@data-number, "' + box_contents[0] + '")]')
However, after locating the <input type="checkbox">, I now need to step either up or down one level in the XPath. After that it is easy to extract all the nested text and the value that I am looking for, €676.05. How would I go about doing this? Is there a better way to accomplish this?
Solution
You could iterate through each of the div elements with the custom-control classes individually and pull the information for each checkbox and label one at a time, instead of gathering them all at once. The two pieces of data are then already paired up, since you iterate one pair at a time, and because you start at an element that is a parent of both, finding the correct path to each one is more straightforward.
For example:
html = """
<div class="custom-control custom-checkbox">
<input type="checkbox" class="custom-control-input" data-number="025.00286R" name="itemSelected[]" value="7684cd019b98489eb330010000039848" id="checkbox-7684cd019b98489eb330010000039848">
<label class="custom-control-label" for="checkbox-7684cd019b98489eb330010000039848">
<meta itemprop="price" content="676.0512">
<font style="vertical-align: inherit;"><font style="vertical-align: inherit;">
€676.05
</font></font>
</label>
</div>
"""
import parsel

selector = parsel.Selector(html)
# Each custom-control div wraps exactly one checkbox/label pair.
for control in selector.xpath("//div[@class='custom-control custom-checkbox']"):
    data_number = control.xpath("./input/@data-number").get()
    # The meta tag inside the label carries the unrounded price.
    price = control.xpath(".//meta/@content").get()
    print({"data_number": data_number, "price": price})
Output
{'data_number': '025.00286R', 'price': '676.0512'}
Answered By - Alexander