Issue
I have some address info. stored in a pandas df column like below:
df['Addr']
LT 75 CEDAR WOOD 3RD PL
LTS 22,25 & 26 MULLINS CORNER
LTS 7 & 8
PT LT 22-23 JEFFERSON HIGHLANDS EXTENSION
I want to extract lot information and create a new column so for the example above, my expected results are as below:
df['Lot']
75
22,25 & 26
7 & 8
22-23
This is my code:
df['Lot'] = df['Addr'].str.extract(r'\b(?:LOT|LT|LTS?) (\w+(?:-\d+)*)')
Results I'm getting is:
75
22
7
22-23
How can I modify my regex for expected results if at all possible? Please advise.
Solution
You could use
\b(?:LOT|LTS?) (\d+(?:(?:[-,]| & )\d+)*)
Explanation
\b
A word boundary(?:LOT|LTS?)
MatchLOT
orLT
orLTS
(
Capture group 1\d+
Match 1+ digits(?:(?:[-,]| & )\d+)*
Optionally repeat either-
or,
or&
followed by 1+ digits
)
Close group 1
data = [
"LT 75 CEDAR WOOD 3RD PL",
"LTS 22,25 & 26 MULLINS CORNER",
"LTS 7 & 8",
"PT LT 22-23 JEFFERSON HIGHLANDS EXTENSION"
]
df = pd.DataFrame(data, columns = ['Addr'])
df['Lot'] = df['Addr'].str.extract(r'\b(?:LO?T|LTS?) (\d+(?:(?:[-,]| & )\d+)*)')
print(df)
Output
Addr Lot
0 LT 75 CEDAR WOOD 3RD PL 75
1 LTS 22,25 & 26 MULLINS CORNER 22,25 & 26
2 LTS 7 & 8 7 & 8
3 PT LT 22-23 JEFFERSON HIGHLANDS EXTENSION 22-23
If the -
,
and &
can all be surrounded by optional whitespace chars, you might shorten the pattern to:
\b(?:LOT|LTS?) (\d+(?:\s*[-,&]\s*\d+)*)\b
Answered By - The fourth bird
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.