Issue
Here is a sample of the output I got when running this code:
df['formatted_codes']=df['dx_code'].str.replace(r'(^\w{3}(?!$))',r'\1.',regex=True)
dx_id | dx_code | formatted_codes |
---|---|---|
1 | A00 | A00. |
2 | A000 | A00.0 |
3 | A001 | A00.1 |
4 | A009 | A00.9 |
5 | A01 | A01. |
6 | S92113 | S92.113 |
7 | S92113D | S92.113D |
But I want the '.' to be inserted only when the code has more than 3 characters. The output I want looks like this:
dx_id | dx_code | formatted_codes |
---|---|---|
1 | A00 | A00 |
2 | A000 | A00.0 |
3 | A001 | A00.1 |
4 | A009 | A00.9 |
5 | A01 | A01 |
6 | S92113 | S92.113 |
7 | S92113D | S92.113D |
If anyone can help me adjust the regex, that would be helpful. If there is another way to add the '.' at the desired position, please let me know.
Same question, but a different version, where some codes already contain a '.':
dx_id | dx_code | formatted_codes |
---|---|---|
1 | A00 | A00. |
2 | A000 | A00.0 |
3 | A00.1 | A00..1 |
4 | A00.9 | A00..9 |
5 | A01 | A01. |
6 | S92.113 | S92..113 |
7 | S92113D | S92.113D |
But I want the '.' to be inserted only when the code has more than 3 characters. The output I want looks like this:
dx_id | dx_code | formatted_codes |
---|---|---|
1 | A00 | A00 |
2 | A000 | A00.0 |
3 | A00.1 | A00.1 |
4 | A00.9 | A00.9 |
5 | A01 | A01 |
6 | S92.113 | S92.113 |
7 | S92113D | S92.113D |
Solution
You can use
df['formatted_codes']=df['dx_code'].str.replace(r'^\w{3}(?!$)', r'\g<0>.', regex=True)
The ^\w{3}(?!$) regex matches three word characters at the start of the string (^) that are not followed by the end of the string ((?!$)), and replaces the match with the same text followed by a dot. The \g<0> backreference refers to the whole match value, so there is no need for an extra capturing group around the whole pattern.
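To check the behaviour outside of pandas, here is a minimal sketch using Python's built-in re module on the sample codes from the question. The helper name add_dot is hypothetical and only used for illustration. The pattern is anchored with ^, on the assumption that the dot should be inserted only once, after the first three characters.

```python
import re

# Three word characters at the start of the string (^),
# not followed by the end of the string ((?!$)).
PATTERN = re.compile(r'^\w{3}(?!$)')


def add_dot(code: str) -> str:
    """Insert a '.' after the first three characters when more characters follow."""
    # \g<0> stands for the whole match, so no capturing group is needed.
    return PATTERN.sub(r'\g<0>.', code)


samples = ['A00', 'A000', 'A001', 'A009', 'A01', 'S92113', 'S92113D']
formatted = [add_dot(code) for code in samples]
print(formatted)
```

The ^ anchor matters for longer codes: without it, the pattern would also match the second run of three characters, so S92113D would become S92.113.D instead of S92.113D.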
Answered By - Wiktor Stribiżew
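The second version of the question, where some codes already contain a '.', is not covered by the answer above. One possible adjustment, offered as an assumption rather than as part of the original answer, is to use the lookahead (?=\w), which requires a word character right after the first three characters. That skips both three-character codes and codes that already have a dot in the fourth position. The helper name add_dot_if_missing is hypothetical.

```python
import re

# Three word characters at the start of the string, followed by another
# word character. A following '.' or the end of the string prevents a match.
PATTERN = re.compile(r'^\w{3}(?=\w)')


def add_dot_if_missing(code: str) -> str:
    """Insert a '.' after the first three characters unless one is already there."""
    return PATTERN.sub(r'\g<0>.', code)


samples = ['A00', 'A000', 'A00.1', 'A00.9', 'A01', 'S92.113', 'S92113D']
formatted = [add_dot_if_missing(code) for code in samples]
print(formatted)
```

The same pattern and replacement strings can be passed to df['dx_code'].str.replace(..., regex=True) in place of the ones shown in the answer.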