Issue
How can I fix this regex so that the input strings below produce the outputs shown?
out = re.sub(r"(hs|h.s|h.s.)a m(\W|\b)", r"\1 am\2", out)
print(repr(out))
Example input strings:
#example 1.1
colloquial_hour = "Cerca de las 2: hs a m, hay que salir antes de esas hs a m"
#example 1.2
colloquial_hour = "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m."
#example 1.3
colloquial_hour = "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m"
#example 1.4
colloquial_hour = "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m."
Correct outputs:
#correct output for example 1.1
"Cerca de las 2: hs am, hay que salir antes de esas hs a m"
#correct output for example 1.2
"A medida que avance cerca de la media noche 12: 04 hs am. Deben ir a las 15 hs am."
#correct output for example 1.3
"A mmm... cerca de las 12: h.s am, hay que salir antes de esas h.s. a m"
#correct output for example 1.4
"A medida que avance cerca de las 12:04 hs. am. Deben ir a las 15 h.s am."
The logic should be: whenever a numeric value is followed by "a m" (optionally with a colon and/or an "hs" abbreviation in between), replace that "a m" substring with "am" in the original string.
These are all the possible cases where the substring "a m" has to be replaced with "am":
X a m
X: a m
X: hs a m
X: h.s. a m
X: h.s a m
X: hs. a m
X : a m
X : hs a m
X : h.s. a m
X : h.s a m
X : hs. a m
X hs a m
X h.s. a m
X h.s a m
X hs. a m
#where "X" is a numerical value ("1", "2", "3", "4", "5", "6", ... )
#in all these cases, in which this pattern is met, "a m" must be replaced by "am"
Solution
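The original pattern never requires a digit before the match, leaves its dots unescaped (so . matches any character), and has no explicit whitespace between the hs variants and a m. You can search using this regex instead: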
(\d\W+)(h\.?s\.?\s+)?a\s+m\b
and replace using:
\1\2am
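As a quick check, the search pattern and replacement above can be passed to Python's re.sub and run against the four example strings from the question. This is only a minimal sketch: the names PATTERN, fix_am and examples are hypothetical and introduced here for illustration, they are not part of the original answer.

```python
import re

# Pattern from the answer: a digit plus non-word chars, an optional "hs" variant, then "a m".
PATTERN = re.compile(r"(\d\W+)(h\.?s\.?\s+)?a\s+m\b")


def fix_am(text):
    # \1 and \2 put back the digit part and the optional "hs" part unchanged.
    # When group 2 did not take part in the match, Python 3.5+ substitutes "".
    return PATTERN.sub(r"\1\2am", text)


# The four input strings from the question (hypothetical list name).
examples = [
    "Cerca de las 2: hs a m, hay que salir antes de esas hs a m",
    "A medida que avance cerca de la media noche 12: 04 hs a m. Deben ir a las 15 hs a m.",
    "A mmm... cerca de las 12: h.s a m, hay que salir antes de esas h.s. a m",
    "A medida que avance cerca de las 12:04 hs. a m. Deben ir a las 15 h.s a m.",
]

for colloquial_hour in examples:
    print(repr(fix_am(colloquial_hour)))
```

Note that an "hs a m" with no digit in front of it is left alone, which is what the expected outputs for examples 1.1 and 1.3 require.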
RegEx Details:
(\d\W+) : Match a digit followed by 1+ non-word characters, in capture group #1
(h\.?s\.?\s+)? : Match h followed by s, each with an optional dot after it, then 1+ whitespace characters. This optional group is capture group #2
a\s+m\b : Match a followed by 1+ whitespace characters, then m followed by a word boundary
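The breakdown above can also be written directly into the pattern using re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. The sketch below then runs it over every case listed in the question, assuming X is the digit 7; the names VERBOSE_PATTERN, separators, units and results are hypothetical and used only for this illustration.

```python
import re

# Same pattern as in the answer, spread out with one comment per component.
VERBOSE_PATTERN = re.compile(
    r"""
    (\d\W+)           # group 1: a digit, then 1+ non-word chars (space, colon, ...)
    (h\.?s\.?\s+)?    # group 2, optional: h and s with optional dots, then whitespace
    a\s+m\b           # the letters a and m split by whitespace, then a word boundary
    """,
    re.VERBOSE,
)

# Build the 15 cases from the question: "X a m", "X: hs a m", "X : h.s. a m", ...
separators = ["", ":", " :"]
units = ["", "hs", "h.s.", "h.s", "hs."]

results = {}
for sep in separators:
    for unit in units:
        case = "7" + sep + " " + (unit + " " if unit else "") + "a m"
        results[case] = VERBOSE_PATTERN.sub(r"\1\2am", case)

for case, fixed in results.items():
    print(f"{case!r} -> {fixed!r}")
```

Every generated case should come back ending in "am", while a string with no digit before it, such as "esas hs a m", is returned unchanged.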
Answered By - anubhava