Issue
I am trying to extract attributes from a website with Scrapy and XPath:
response.xpath('//section[@id="attributes"]/div/table/tbody/tr/td/text()').extract()
The attributes are nested in the following way:
<section id="attributes">
<h5>Attributes</h5>
<div>
<table>
<tbody>
<tr>
<td>Attribute 1</td>
<td>Value 1</td>
</tr>
<tr>
<td>Attribute 2</td>
<td>Value 2</td>
</tr>
There are two problems associated with this:
- Get the content of the td elements (the XPath expression above returns [])
- Once the td elements are retrieved, I need to pair them somehow, e.g. "Attribute 1" = "Value 1"
I am new to Python and Scrapy; any help is greatly appreciated.
Solution
First of all, try removing the tbody tag from the XPath. Browsers often insert tbody when they render a table, so it is usually not in the page source that Scrapy receives.
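To see why a path through tbody can come back empty, here is a minimal sketch. It assumes the raw page has no tbody element, and it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors so that it runs without Scrapy installed. The RAW_HTML sample is made up from the markup in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw markup as a server might send it: no <tbody> inside <table>.
RAW_HTML = """
<html><body>
<section id="attributes">
  <h5>Attributes</h5>
  <div>
    <table>
      <tr><td>Attribute 1</td><td>Value 1</td></tr>
      <tr><td>Attribute 2</td><td>Value 2</td></tr>
    </table>
  </div>
</section>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# This path requires a <tbody> step, so nothing matches in the raw markup.
with_tbody = root.findall(".//section[@id='attributes']/div/table/tbody/tr/td")

# "//" between table and tr matches rows with or without a <tbody> wrapper.
without_tbody = root.findall(".//section[@id='attributes']/div/table//tr/td")

cells = [td.text for td in without_tbody]
print(with_tbody)  # []
print(cells)       # ['Attribute 1', 'Value 1', 'Attribute 2', 'Value 2']
```

ElementTree only supports a subset of XPath and needs well-formed markup, so this only illustrates the matching behaviour. In a spider you would keep using response.xpath.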
You can update your code as below:
cells = response.xpath('//section[@id="attributes"]/div/table//tr/td/text()').extract()
att_values = [{first: second} for first, second in zip(cells[::2], cells[1::2])]
You will get a list of attribute-value pairs:
[{attr_1: value_1}, {attr_2: value_2}, {attr_3: value_3}, ...]
or
att_values = {first: second for first, second in zip(cells[::2], cells[1::2])}
# or:
# att_values = dict(zip(cells[::2], cells[1::2]))
to get a dictionary:
{attr_1: value_1, attr_2: value_2, attr_3: value_3, ...}
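The slicing step can be tried on its own, without a live page. This sketch assumes cells is the flat list that extract() would return for the two-row sample in the question. The names names, values, pairs and mapping are only illustrative.

```python
# Flat list in document order, as extract() would return it for a
# two-column table (sample values taken from the question's markup).
cells = ["Attribute 1", "Value 1", "Attribute 2", "Value 2"]

names = cells[::2]    # items at even indexes: the first cell of each row
values = cells[1::2]  # items at odd indexes: the second cell of each row

# One single-entry dict per row.
pairs = [{name: value} for name, value in zip(names, values)]

# One dict for the whole table.
mapping = dict(zip(names, values))

print(pairs)    # [{'Attribute 1': 'Value 1'}, {'Attribute 2': 'Value 2'}]
print(mapping)  # {'Attribute 1': 'Value 1', 'Attribute 2': 'Value 2'}
```

Note that this pairing relies on every row having exactly two cells that both contain text. An empty td yields no text() node, so every pair after it would shift by one. If that can happen on your page, iterating over the tr elements and reading the two cells of each row is likely the safer option.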
Answered By - JaSON