Issue
My DF looks like below:
id date
1 ' : 07/01/2020 23:25'
2 ': 07/02/2020'
3 ' 07/03/2020 23:25 1'
4 '07/04/2020'
5 '23:50 07/05/2020'
6 '07 06 2023'
7 '00:00 07 07 2023'
I need to remove all special characters and numbers that are between ':' so DF should looks like below
id date
1 07/01/2020
2 07/02/2020
3 07/03/2020
4 07/04/2020
5 07/05/2020
I can't use a simple:
df['date'].str.split(':').str[0]
or df['date'].str.replace(":",'')
beacuse I will lose correct values. Do you have idea how I could solve this?
Regards
Solution
Assuming you want to convert to datetime, you don't necessarily have to cleanup the string, you can pass exact=False
to to_datetime
:
df['out'] = pd.to_datetime(df['date'], format='%d/%m/%Y', exact=False)
Otherwise, for strings, use the (\d{2}/\d{2}/\d{4})
regex and str.extract
df['clean'] = df['date'].str.extract(r'(\d{2}\/\d{2}\/\d{4})')
Output:
id date out clean
0 1 : 07/01/2020 23:25 2020-01-07 07/01/2020
1 2 : 07/02/2020 2020-02-07 07/02/2020
2 3 07/03/2020 23:25 1 2020-03-07 07/03/2020
3 4 07/04/2020 2020-04-07 07/04/2020
4 5 23:50 07/05/2020 2020-05-07 07/05/2020
updated example:
Extract with space or /
as separator, then replace
the spaces by /
.
df['clean'] = (df['date']
.str.extract(r'(\d{2}[ /]\d{2}[ /]\d{4})',
expand=False)
.str.replace(' ', '/')
)
Output:
id date clean
0 1 : 07/01/2020 23:25 07/01/2020
1 2 : 07/02/2020 07/02/2020
2 3 07/03/2020 23:25 1 07/03/2020
3 4 07/04/2020 07/04/2020
4 5 23:50 07/05/2020 07/05/2020
5 6 07 06 2023 07/06/2023
6 7 00:00 07 07 2023 07/07/2023
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.