Issue
title = data.xpath("//*[@id='jsheadline_989615']/span/text()").extract()
name = data.xpath("//*[@id='js_item_989615']/div[1]/div[2]/div[3]/strong[1]/text()").extract()
price = data.xpath("//*[@id='js_item_989615']/div[1]/div[2]/div[3]/strong[2]/text()").extract()
print(title, name, price)
For the code above, I want to match the id with a regular expression instead of hard-coding it:
title = data.xpath("//*[@id='([jsheadline_]+\d{5}[0-9])']/span/text()").extract()
This does not return any results. I am using XPath Helper 2.0 on Chrome.
Solution
Scrapy uses lxml as its XPath engine, so you can register the EXSLT function namespaces (including regular expressions) in lxml:
from lxml import etree

def register_xpath_namespaces():
    """Register the EXSLT function namespaces under short prefixes."""
    fns = {
        'date': 'http://exslt.org/dates-and-times',
        'dyn': 'http://exslt.org/dynamic',
        'exsl': 'http://exslt.org/common',
        'func': 'http://exslt.org/functions',
        'math': 'http://exslt.org/math',
        'random': 'http://exslt.org/random',
        're': 'http://exslt.org/regular-expressions',  # FOR REGEXP
        'set': 'http://exslt.org/sets',
        'str': 'http://exslt.org/strings',
    }
    # .items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for prefix, uri in fns.items():
        etree.FunctionNamespace(uri).prefix = prefix

register_xpath_namespaces()
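Then you can get the title via XPath. re:test returns a boolean, which suits a predicate, and the pattern below only matches ids of the form jsheadline_ followed by digits:
title = data.xpath("//*[re:test(@id, '^jsheadline_[0-9]+$')]/span/text()").extract()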
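To try the idea outside a spider, here is a minimal sketch that uses lxml directly. The sample markup is hypothetical and only modelled on the ids in the question, and the namespace is passed per call rather than registered globally. It also shows a plain XPath 1.0 alternative, starts-with, which may be the easier route in tools such as XPath Helper, since browser XPath engines implement XPath 1.0 without the EXSLT regular-expression functions.

```python
from lxml import html

# Hypothetical markup, modelled on the ids used in the question.
SAMPLE = """
<html><body>
  <div id="jsheadline_989615"><span>First headline</span></div>
  <div id="jsheadline_123456"><span>Second headline</span></div>
  <div id="js_item_989615"><span>Not a headline</span></div>
</body></html>
"""

# EXSLT regular-expression namespace, supplied for this call only.
NS = {"re": "http://exslt.org/regular-expressions"}

tree = html.fromstring(SAMPLE)

# re:test returns a boolean, so it can be used directly as a predicate.
regex_titles = tree.xpath(
    "//*[re:test(@id, '^jsheadline_[0-9]+$')]/span/text()", namespaces=NS
)

# Plain XPath 1.0 alternative: prefix match only, no regular expression needed.
prefix_titles = tree.xpath("//*[starts-with(@id, 'jsheadline_')]/span/text()")

print(regex_titles)
print(prefix_titles)
```

The starts-with variant is looser than the regular expression (it accepts any suffix after the prefix), so it is only a substitute when the prefix alone identifies the elements you want.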
Note: Please test it yourself.
Answered By - kev