Tuesday, November 14, 2023

[FIXED] How to match an entire string using Regexes in Python?

Labels: list, python-3.x, python-re, regex

Issue

I'm trying to build a regex pattern in Python that will match strings like these:

"THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "VEHICLE - STOLEN", "TRANSPORTATION FACILITY (AIRPORT)", "5600 N FIGUEROA ST" and "400 WORLD WY".

import re

hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400    WORLD                        WY", "33.9433", "-118.4072" ] ,
        [ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N  FIGUEROA                     ST", "34.114", "-118.1949" ]]}
crime = []
for items in hello["reza"]:
    for item in items:
        pattern = re.compile(r'[A-Z].*')
        crime = re.findall(pattern,str(item))

print(crime)

Solution

The most obvious problem in your code is that you overwrite crime on every iteration of the nested loop, so print(crime) only shows the result of the last findall call. findall returns a list of all matches in str(item), and the last item ("-118.1949") contains no uppercase letter, so you end up with an empty list.
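The difference between overwriting and accumulating can be seen on a shortened row. This is only a minimal sketch: the three-item list and the names overwritten and collected are made up for illustration, while the pattern is the one from the question.

```python
import re

pattern = re.compile(r'[A-Z].*')
items = ["VEHICLE - STOLEN", "34.114", "-118.1949"]  # shortened, illustrative row

# Assigning inside the loop keeps only the result for the last item.
overwritten = []
for item in items:
    overwritten = pattern.findall(item)

# Extending the list accumulates the matches from every item.
collected = []
for item in items:
    collected.extend(pattern.findall(item))

print(overwritten)  # []
print(collected)    # ['VEHICLE - STOLEN']
```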

Furthermore, you didn't describe how you want to filter your results. With findall, your pattern [A-Z].* matches from the first uppercase letter to the end of the string, so it can never return 5600 N FIGUEROA ST in full: the leading 5600 is dropped.
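A quick check of what [A-Z].* returns with findall on a few values taken from the data may help here. The variable names are illustrative only, and the address is written with fewer spaces than in the original row.

```python
import re

pattern = re.compile(r'[A-Z].*')

# findall scans the string, so the match starts at the first uppercase letter.
address = pattern.findall("5600 N  FIGUEROA   ST")
timestamp = pattern.findall("2020-06-15T00:00:00")
area = pattern.findall("Pacific")

print(address)    # ['N  FIGUEROA   ST']
print(timestamp)  # ['T00:00:00']
print(area)       # ['Pacific']
```

So the pattern loses the house number, and it also picks up fragments of timestamps and mixed-case values such as area names.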

Here is a suggestion: keep strings that contain at least three consecutive uppercase letters and that do not start with digits directly followed by - (runs of whitespace are also replaced with a single space):

import re

hello = {"meta": 1, "reza": [[ "row-f696.af3d.c3v9", "00000000-0000-0000-2D2F-EA38F9F11DB9", 0, 1642111191, 1642111191, "{ }", "201412343", "2020-06-15T00:00:00", "2020-06-15T00:00:00", "0700", "14", "Pacific", "1494", "1", "331", "THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)", "1606 0344 1300 1402", "60", "F", "W", "212", "TRANSPORTATION FACILITY (AIRPORT)", "IC", "Invest Cont", "331", "998", "400    WORLD                        WY", "33.9433", "-118.4072" ] ,
        [ "row-f2wh.yte2-zhv8", "00000000-0000-0000-0BF4-2A6281C66DEF", 0, 1636553859, 1636553859, "{ }", "201107194", "2020-03-11T00:00:00", "2020-03-11T00:00:00", "1100", "11", "Northeast", "1118", "1", "510", "VEHICLE - STOLEN", "0", "108", "PARKING LOT", "IC", "Invest Cont", "510", "5600 N  FIGUEROA                     ST", "34.114", "-118.1949" ]]}
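crime = []
# Reject strings starting with digits followed by "-" (IDs, dates),
# then require a run of at least three uppercase letters.
pattern = re.compile(r'(?!\d+-).*[A-Z]{3,}')
for items in hello["reza"]:
    for item in items:
        # Skip the non-string values (ints) and test from the start of the string.
        if isinstance(item, str) and pattern.match(item):
            # Collapse runs of whitespace into a single space.
            crime.append(re.sub(r'\s+', ' ', item))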

print(crime)

Output:

['THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)', 'TRANSPORTATION FACILITY (AIRPORT)', '400 WORLD WY', 'VEHICLE - STOLEN', 'PARKING LOT', '5600 N FIGUEROA ST']
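To see why individual items are kept or dropped, the same pattern can be tried against a handful of values from the data. This is only a sketch: the samples list and the kept name are chosen for illustration.

```python
import re

pattern = re.compile(r'(?!\d+-).*[A-Z]{3,}')

samples = [
    "VEHICLE - STOLEN",                      # run of uppercase letters: kept
    "5600 N  FIGUEROA   ST",                 # digits followed by a space, not "-": kept
    "00000000-0000-0000-2D2F-EA38F9F11DB9",  # digits followed by "-": rejected by the lookahead
    "2020-06-15T00:00:00",                   # digits followed by "-": rejected by the lookahead
    "Pacific",                               # no run of three uppercase letters
    "IC",                                    # only two uppercase letters
]

# re.match only tries the pattern at the start of each string.
kept = [s for s in samples if pattern.match(s)]
print(kept)  # ['VEHICLE - STOLEN', '5600 N  FIGUEROA   ST']
```

The pattern only has to match a prefix of the string ending in three uppercase letters. Since the whole item is appended rather than the matched text, that is enough to keep the entire string.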

Answered By - Tranbi

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
