Issue
with pandas and jupyter notebook I would like to delete everything that is not character, that is: hyphens, special characters etc etc
es:
firstname,birthday_date
joe-down§,02-12-1990
lucash brown_ :),06-09-1980
^antony,11-02-1987
mary|,14-12-2002
change with:
firstname,birthday_date
joe down,02-12-1990
lucash brown,06-09-1980
antony,11-02-1987
mary,14-12-2002
I'm trying with:
df['firstname'] = df['firstname'].str.replace(r'!', '')
df['firstname'] = df['firstname'].str.replace(r'^', '')
df['firstname'] = df['firstname'].str.replace(r'|', '')
df['firstname'] = df['firstname'].str.replace(r'§', '')
df['firstname'] = df['firstname'].str.replace(r':', '')
df['firstname'] = df['firstname'].str.replace(r')', '')
......
......
df
it seems to work, but on more populated columns I always miss some characters. Is there a way to completely eliminate all NON-text characters and keep only a single word or words in the same column? in the example I used firstname to make the idea better! but it would also serve for columns with whole words!
Thanks!
P.S also encoded text for emoticons
Solution
You can use regex for this.
df['firstname'] = df['firstname'].str.replace('[^a-zA-Z0-9]', ' ', regex=True).str.strip()
df.firstname.tolist()
>>> ['joe down', 'lucash brown', 'antony', 'mary']
Answered By - yudhiesh
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.