Issue
These are the input string examples:
#example 1.1
colloquial_hour = "Hola nos vemos a las diez y veinte a m, ten en cuenta que al amanecer tendremos que estar despiertos, porque debemos estar alli a eso de nueve a m o las diez y cuarto a m"
#example 1.2
colloquial_hour = "A mi me parece entre las 10 15 am y las 11 a m, o a las 15 a m aunque quizas a medianoche este bien a eso de las 00:00 a m"
#example 1.3
colloquial_hour = "Puede que a las 10 am. Hay 10 a medias, a m mmm... creo que en 10 estarian para terminar a las 11:00 hs a m 11:59 a m"
#example 1.4
colloquial_hour = "Amediados a mediados del 30 antes de y dia; me parace que hay que estar en casa. Medianamente a, mediados de las 05 a m o cerca de 6 a m."
I tried a simple replacement, but I think the cases need to be restricted further with a regex pattern so that unwanted replacements are not made:
colloquial_hour = colloquial_hour.replace('a m', 'am ')
I would like to obtain the following output for each of these examples:
#example 1.1
colloquial_hour = "Hola nos vemos a las diez y veinte am, ten en cuenta que al amanecer tendremos que estar despiertos, porque debemos estar alli a eso de nueve am o las diez y cuarto am"
#example 1.2
colloquial_hour = "A mi me parece entre las 10 15 am y las 11 am, o a las 15 am aunque quizas a medianoche este bien a eso de las 00:00 am"
#example 1.3
colloquial_hour = "Puede que a las 10 am. Hay 10 a medias, a m mmm... creo que en 10 estarian para terminar a las 11:00 hs am 11:59 am"
#example 1.4
colloquial_hour = "Amediados a mediados del 30 antes de y dia; me parace que hay que estar en casa. Medianamente a, mediados de las 05 am o cerca de 6 am."
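To make the problem with the plain str.replace attempt concrete, here is a minimal check on a fragment of example 1.3. The variable names fragment and result are only for illustration.

```python
# Plain substring replacement has no notion of context, so it also
# rewrites the "a m" that spans the two words "a medias".
fragment = "Hay 10 a medias, a m mmm"
result = fragment.replace('a m', 'am ')

# The word "medias" is now broken into "am edias".
print(result)
```

This is the kind of unwanted replacement that a context-aware regex has to avoid.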
In this case, the pseudo-pattern is: a number followed by "a m", where "a m" is replaced with "am" if it is followed by one or more spaces, a period, a comma, or the end of the string.
Cases with incompletely written times should also be considered, where "a m" may be preceded by ":", " :", ": ", " hs", "hs", "hs ", " h.s. ", "h.s.", "h.s. ", " h.s", "h.s" or "h.s ", for example:
input_t = "a las 12: a m"
output = "a las 12: am"
input_t = "a las 12 : a m"
output = "a las 12 : am"
input_t = "a las 12 hs a m"
output = "a las 12 hs am"
input_t = "a las 12:hs a m"
output = "a las 12:hs am"
input_t = "a las 12: hs a m"
output = "a las 12: hs am"
input_t = "a las 12hsa m"
output = "a las 12hs am"
input_t = "a las 12h.sa m"
output = "a las 12h.s am"
input_t = "a las 12 h.sa m"
output = "a las 12 h.s am"
input_t = "a las 12 h.s.a m"
output = "a las 12 h.s. am"
Solution
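For the first part I made this regex (it needs import re):
out = re.sub(r"([0-9]{1,2}\W)a m(\W|\b)", r"\1am\2", colloquial_hour)
It changes "a m" to "am" when a one- or two-digit number and a single non-word character come right before it, keeping whatever was before and after. The quantifier is {1,2} rather than two fixed digits so that single-digit hours such as "6 a m" in example 1.4 are matched too.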
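A digit-based pattern does not touch example 1.1, where the hours are spelled out ("veinte a m", "nueve a m", "cuarto a m"). One possible extension, sketched below, is to allow a Spanish number word in place of the digits. The HOUR_WORDS list and the fix_spelled_out helper are assumptions for illustration, not part of the original answer, and the list is deliberately not exhaustive.

```python
import re

# Hypothetical, non-exhaustive list of words that can end a spoken time.
HOUR_WORDS = [
    "una", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
    "nueve", "diez", "once", "doce", "cuarto", "media", "veinte",
    "veinticinco", "treinta",
]

# A whole number word, whitespace, then "a m" as a standalone token.
# The trailing \b rejects cases such as "dos a mi", where "m" starts a word.
WORD_AM = re.compile(
    r"\b(" + "|".join(HOUR_WORDS) + r")\s+a m\b",
    re.IGNORECASE,
)

def fix_spelled_out(text):
    """Replace 'a m' with 'am' after a spelled-out number word."""
    return WORD_AM.sub(r"\1 am", text)

example_1_1 = (
    "Hola nos vemos a las diez y veinte a m, ten en cuenta que al amanecer "
    "tendremos que estar despiertos, porque debemos estar alli a eso de "
    "nueve a m o las diez y cuarto a m"
)
print(fix_spelled_out(example_1_1))
```

"al amanecer" is left alone because it never contains the token "a m" after one of the listed words.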
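For the "hs" or "h.s" I did this:
out = re.sub(r"(hs|h\.s)(\.)?\W*a m(\W|\b)", r"\1\2 am\3", out)
It searches for "hs" or "h.s" (with an optional trailing period) before "a m". The dot in h\.s is escaped so that it matches only a literal period and not any character. The two regexes are quite similar, so you can combine them or apply them sequentially.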
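As a rough sketch of the combined form, a single pattern can accept the digits, an optional ":" and an optional "hs" or "h.s" marker before "a m". It also covers the ":" cases from the question, which the two regexes above do not match on their own. The names TIME_AM and normalize_am are hypothetical, and the pattern has only been checked against the examples in this post.

```python
import re

TIME_AM = re.compile(
    r"""
    (                      # group 1: everything kept before "am"
        [0-9]{1,2}         # one or two digits (hour or minutes)
        (?:\s*:)?          # optional colon, possibly preceded by spaces
        (?:\s*h\.?s\.?)?   # optional "hs", "h.s" or "h.s." marker
    )
    \s*a[ ]m\b             # "a m" as a standalone token
    """,
    re.VERBOSE,
)

def normalize_am(text):
    """Rewrite '<time> a m' as '<time> am', leaving other text untouched."""
    return TIME_AM.sub(r"\1 am", text)

# Input/output pairs taken from the question.
cases = [
    ("a las 12: a m", "a las 12: am"),
    ("a las 12 : a m", "a las 12 : am"),
    ("a las 12 hs a m", "a las 12 hs am"),
    ("a las 12:hs a m", "a las 12:hs am"),
    ("a las 12: hs a m", "a las 12: hs am"),
    ("a las 12hsa m", "a las 12hs am"),
    ("a las 12h.sa m", "a las 12h.s am"),
    ("a las 12 h.sa m", "a las 12 h.s am"),
    ("a las 12 h.s.a m", "a las 12 h.s. am"),
]
for input_t, output in cases:
    print(repr(input_t), "->", repr(normalize_am(input_t)))
```

The trailing \b is what keeps "10 a medias" in example 1.3 from being rewritten, since the "m" there is followed by another letter.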
Let me know if there is any problem.
Answered By - itogaston