Issue
I am extracting data from an HTML page with XPath and want to retrieve a specific piece of information located in a text node.
>>> response.xpath('//*[@id="productDetails"]/div[1]/div[2]/div[2]/text()').extract()
['\nInhalt: 10 Stück', '\nGrundpreis: 1 Stück 0,14 €']
This returns the wanted info within a list, alongside several other entries.
Now I try to grab the wanted info via regex, since its position in the list changes and I cannot rely on an index. So I filter the extracted list (stored in data):
>>> r = re.compile('.*Grundpreis.*')
>>> newlist = list(filter(r.match, data))
Somehow this does not work and returns an empty list:
>>> newlist
[]
I followed examples found on SO which worked, but this one does not. The only difference I could find is that my strings use single quotes instead of double quotes, and that does not seem possible to change with the xpath command.
How can I extract the wanted information "Grundpreis:..." without knowing its index?
Solution
From the list of special characters in the syntax section of the re docs:
.
(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
The Pattern.match() method:
If zero or more characters at the beginning of string match this regular expression, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
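match() only matches at the beginning of the string, and the first character of "\nGrundpreis: 1 Stück 0,14 €" is a newline, which . does not match by default. The pattern .*Grundpreis.* therefore cannot match from position 0, so match() returns None and filter() drops the string.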
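The difference can be checked directly. The following is a minimal sketch using the two strings from the question; the variable names (strict, dotall, matched_*) are only illustrative and not part of the original code.

```python
import re

data = ['\nInhalt: 10 Stück', '\nGrundpreis: 1 Stück 0,14 €']

# match() is anchored at position 0, and '.' does not match the leading '\n'
strict = re.compile('.*Grundpreis.*')
matched_default = list(filter(strict.match, data))

# search() scans the whole string, so it finds a match starting after the newline
matched_search = list(filter(strict.search, data))

# With re.DOTALL, '.' also matches '\n', so match() succeeds from position 0
dotall = re.compile('.*Grundpreis.*', re.DOTALL)
matched_dotall = list(filter(dotall.match, data))

print(matched_default)
print(matched_search)
print(matched_dotall)
```

Either switching to search() or compiling with re.DOTALL keeps the filter() approach from the question working; search() is probably the smaller change.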
Here is an example using search(), which scans the whole string instead of only its beginning. It should give you some inspiration for what you're trying to do:
import re

# Capture everything after "Grundpreis: " up to the end of the line
patt = re.compile(r"Grundpreis: (.*)")
test_strs = ['\nInhalt: 10 Stück', '\nGrundpreis: 1 Stück 0,14 €']

for elem in test_strs:
    # search() looks for the pattern anywhere in the string
    res = patt.search(elem)
    if res:
        print(f"Match found in string: {elem}. Match: {res}. Group: {res.group(1)}")
    else:
        print(f"No match in string: {elem}")
Output:
No match in string:
Inhalt: 10 Stück
Match found in string:
Grundpreis: 1 Stück 0,14 €. Match: <re.Match object; span=(1, 27), match='Grundpreis: 1 Stück 0,14 €'>. Group: 1 Stück 0,14 €
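To get the value without knowing its index, the same idea can be wrapped in a small function. This is only a sketch: find_grundpreis is a hypothetical helper name, and it assumes the first string containing "Grundpreis:" is the one wanted.

```python
import re
from typing import Iterable, Optional


def find_grundpreis(texts: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the text after 'Grundpreis:' from the
    first string that contains it, or None if no string does."""
    patt = re.compile(r"Grundpreis:\s*(.*)")
    for text in texts:
        res = patt.search(text)  # search(), so the leading '\n' is no obstacle
        if res:
            return res.group(1).strip()
    return None


extracted = ['\nInhalt: 10 Stück', '\nGrundpreis: 1 Stück 0,14 €']
print(find_grundpreis(extracted))
```

Since the list comes from a Scrapy selector, the selector's own .re() or .re_first() methods may also be worth checking, as they apply a regex to the selected text directly.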
Answered By - AMC