Issue
I have a pandas DataFrame with a column of strings. Is there a way to match a list of date patterns against the strings in the DataFrame and create a new column containing the matched dates in the same DataFrame?
Code
import pandas as pd
# Raw strings (r"...") keep the regex backslashes intact and avoid invalid-escape warnings.
format_list = [r"[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{2,4}",
r"[0-9]{1,2}(?:\.)(?:\s)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\s)?[0-9]{2,4}",
r"(?:(?:(?:j|J)an)|(?:(?:f|F)eb)|(?:(?:m|M)ar)|(?:(?:a|A)pr)|(?:(?:m|M)ay)|(?:(?:j|J)un)|(?:(?:j|J)ul)|(?:(?:a|A)ug)|(?:(?:s|S)ep)|(?:(?:o|O)ct)|(?:(?:n|N)ov)|(?:(?:d|D)ec))\w*(?:\s)?(?:\n)?[0-9]{1,2}(?:\s)?(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}",
r"[0-9]{1,2}(?:\.)?(?:\s)?(?:\n)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}"]
# Sample data: one column of strings containing dates.
data = {'Name':['Today is 09 September 2021', '25 December 2021 is christmas', '01/01/2022 is newyear and will be holiday on 02.01.2022 also']}
# Create DataFrame
df = pd.DataFrame(data)
Desired Output
   Name                                                          Date
0  Today is 09 September 2021                                    09 September 2021
1  25 December 2021 is christmas                                 25 December 2021
2  01/01/2022 is newyear and will be holiday on 02.01.2022 also  01/01/2022, 02.01.2022
Solution
Use Series.str.findall with the patterns joined by | (regex OR), then join the resulting lists with Series.str.join:
df['Date'] = df['Name'].str.findall("|".join(format_list)).str.join(', ')
print(df)
   Name                                               Date
0  Today is 09 September 2021                         09 September 2021
1  25 December 2021 is christmas                      25 December 2021
2  01/01/2022 is newyear and will be holiday on 0...  01/01/2022, 02.01.2022
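To see what happens per row, here is a minimal sketch using only the standard re module, since Series.str.findall is equivalent to applying re.findall to each element. The two patterns are simplified, hypothetical stand-ins for the ones in format_list, and extract_dates is an illustrative helper name, not part of pandas. The last lines show why the patterns rely on non-capturing groups (?:...): with a capturing group, findall returns the group contents instead of the whole match.

```python
import re

# Simplified, hypothetical patterns (not the full ones from the question).
patterns = [
    r"[0-9]{1,2}[./-][0-9]{1,2}[./-][0-9]{2,4}",  # numeric, e.g. 01/01/2022 or 02.01.2022
    r"[0-9]{1,2}\s[A-Za-z]+\s[0-9]{2,4}",         # day, month name, year, e.g. 09 September 2021
]

# Joining with "|" builds one regex that matches any of the alternatives.
combined = re.compile("|".join(patterns))


def extract_dates(text):
    """Return every non-overlapping match in text, joined by ', '."""
    return ", ".join(combined.findall(text))


rows = [
    "Today is 09 September 2021",
    "25 December 2021 is christmas",
    "01/01/2022 is newyear and will be holiday on 02.01.2022 also",
]
results = [extract_dates(row) for row in rows]
for row, dates in zip(rows, results):
    print(f"{row!r} -> {dates!r}")

# With a capturing group, findall returns only the captured part,
# not the whole match, so the date itself is lost.
capturing = re.findall(r"[0-9]{1,2}(\.|/)[0-9]{1,2}", "01/02")
print(capturing)
```

If a row contains no date, findall returns an empty list and the joined result is an empty string.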
Answered By - jezrael