Issue
The Short:
How can I retrieve only tag names with .xpath() in Scrapy?
The Long:
I am currently using a scrapy.Spider and calling response.selector.remove_namespaces() in the parse() function to keep things simple.
I am trying to do something like this, but with Scrapy:
Iterate on XML tags and get elements' xpath in Python
However, I can't seem to figure out how to retrieve only the names of the tags. What is the .xpath() expression to grab just the tag names?
Solution
As far as I am aware, the Scrapy Selector class has no dedicated built-in method for extracting just the tag name.
That being said, you can use the re method of any selector with a regular expression pattern to extract the tag name.
For example:
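for selector in response.xpath("//*"):
    # re_first() keeps only the first match, which is the element's own opening tag;
    # the pattern does not require whitespace, so tags without attributes match too
    print(selector.re_first(r'<([\w.-]+)'))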
Answered By - Alexander