Issue
Here is an HTML string I got from a website via an AJAX request:
{
"data":{
label: 'description',
values: ['<p class="description">'
'someting'
'<br>'
'<br>'
'<b>mytitle_1</b>'
'<br>'
'<br>'
'something_1'
'<br>'
'<br>'
'<b>mytitle_2</b>'
'<br>'
'<br>'
'something_2'
'</p>']}
}
The value of the "values" key is an HTML fragment. How can I get all the text inside data["values"]? I'm using Scrapy. Is there a way to parse it with the same get methods I use on a Scrapy response?
Solution
Yes. Extract the HTML fragment from the JSON, wrap it in a Scrapy Selector, and call xpath('//text()').getall() on it. A Selector offers the same .xpath(), .css(), .get() and .getall() methods you use on a response.
Example:
from scrapy.selector import Selector
resp_json = {
"data":{
'label': 'description',
'values': ['<p class="description">'
'someting'
'<br>'
'<br>'
'<b>mytitle_1</b>'
'<br>'
'<br>'
'something_1'
'<br>'
'<br>'
'<b>mytitle_2</b>'
'<br>'
'<br>'
'something_2'
'</p>']}
}
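# 'values' holds a single string: Python joins the adjacent string literals into one HTML fragment
a = Selector(text=resp_json['data']['values'][0], type='html')
# '//text()' selects every text node in the fragment, in document order
content = a.xpath('//text()').getall()
print(content)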
Output:
['someting', 'mytitle_1', 'something_1', 'mytitle_2', 'something_2']
Answered By - Alexander