Issue
I'm new to Python and unsure how to write a regex pattern that extracts the weights and quantities from the following list of strings.
This is what I have been doing so far.
import re
string1 = [' (Expiry Date: 30 May 2019) 4 x Organic Infant Goat Milk'
' Follow-on Formula 3 400g',
' (Expiry on 30 May 2019) 4 x Organic Infant Goat Milk'
' Follow-on Formula 2 400g ',
" [ Bellamy's ] Bellamys Organic Step 3 Toddler Milk Drink"
" 900g x 6 tins Made In Australia CARTON DEAL EXPIRE"
" 06/2019 to 2020",
' [[1+1]] FRISO (2) 1.8kg+900g',
" [[Carton Sales]] Bellamy's Organic Follow-On Formula"
" Step 2 900g x 6tins",
' Dumex Mamil Gold Stage 4 Growing Up Kid Milk Formula'
' (850g) x 6',
' Wyeth S-26 Promise Gold Stage 4 1.6kg X 6 Tins']
m = [re.search('([0-9.]+[kgG]{1,2})', s).group(0) for s in string1]
print(m)
My output is like this:
['400g', '400g', '900g', '1.8kg', '900g', '850g', '1.6kg']
But I would like to get this output:
['4x400g', '4x400g', '900gx6', '1.8kg+900g', '900gx6', '850gx6', '1.6kgX6']
Is there any way to get this?
Solution
It's better to normalize the output so that the quantity always comes first, regardless of where it appears in the input:
pattern = re.compile(
    r'^(?=.*?(?:(\d+)\s*x\b|\bx\s*(\d+)))?'  # optional quantity: "4 x" or "x 6"
    r'(?=.*?((?:\b[0-9]+(?:\.[0-9]+)?(?:kg|g)\b\s*?\+?\s*?)+))',  # weight(s), e.g. "1.8kg+900g"
    flags=re.IGNORECASE)
m = ['x'.join(g for g in pattern.search(s).groups() if g) for s in string1]
Given your sample input, m would become:
['4x400g', '4x400g', '6x900g', '1.8kg+900g', '6x900g', '6x850g', '6x1.6kg']
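If the single pattern with two lookaheads is hard to follow, the same idea can be sketched as two separate searches, one for the quantity and one for the weight. This is a variant for illustration, not the code from the answer itself; the names QTY, WEIGHT and normalize are hypothetical, and the weight pattern assumes units are only "g" or "kg", as in the sample strings.

```python
import re

# Quantity: digits before an "x" ("4 x", "4x") or after one ("x 6", "X 6").
# Hypothetical names; the patterns mirror the two lookaheads in the answer.
QTY = re.compile(r'(\d+)\s*x\b|\bx\s*(\d+)', re.IGNORECASE)

# Weight: a number followed by g or kg, optionally chained with "+",
# e.g. "400g" or "1.8kg+900g". Assumes no units other than g/kg.
WEIGHT = re.compile(
    r'\b\d+(?:\.\d+)?k?g\b(?:\s*\+\s*\d+(?:\.\d+)?k?g\b)*',
    re.IGNORECASE)


def normalize(s):
    """Return 'QTYxWEIGHT', just 'WEIGHT' if no quantity, or None if no weight."""
    weight = WEIGHT.search(s)
    if weight is None:
        return None
    w = re.sub(r'\s+', '', weight.group(0))  # drop spaces around "+"
    qty = QTY.search(s)
    if qty is None:
        return w
    # Exactly one of the two capture groups is set, depending on the side of "x".
    return '{}x{}'.format(qty.group(1) or qty.group(2), w)


print(normalize(' Wyeth S-26 Promise Gold Stage 4 1.6kg X 6 Tins'))  # 6x1.6kg
print(normalize(' [[1+1]] FRISO (2) 1.8kg+900g'))  # 1.8kg+900g
```

Splitting the searches also makes the no-match case explicit: a string without any weight yields None instead of raising an AttributeError on .groups().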
Answered By - blhsing