Issue
How can I modify the capturing group in pandas df.replace()?
I'm trying to add thousands separators to the numbers inside the string in each cell, and this should happen in a method chain. Here is the code I have so far:
import pandas as pd
df = pd.DataFrame({'a_column': ['1000 text', 'text', '25000 more text', '1234567', 'more text'],
                   'b_column': [1, 2, 3, 4, 5]})
# r'\1' only puts the captured digits back unchanged; this is where I'd like to format them
df = (df.reset_index()
        .replace({'a_column': {r'(\d+)': r'\1'}}, regex=True))
The problem is that I don't know how to do anything with the captured group r"\1"; for example, str(float(r"\1")) doesn't work.
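The reason str(float(r"\1")) fails can be seen in isolation: Python evaluates that expression once, before replace() ever runs, and at that point r"\1" is only the two characters backslash and 1. The regex engine substitutes the captured text later, so there is no number yet to convert. A minimal sketch (the name backref is just for illustration):

```python
# r"\1" is an ordinary two-character string: a backslash followed by "1".
# The regex engine expands it into the captured text only later, inside replace(),
# so float(r"\1") is evaluated on those literal characters and raises ValueError.
backref = r"\1"
print(len(backref))  # 2

try:
    float(backref)
    converted = True
except ValueError:
    converted = False

print(converted)  # False
```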
Expected output:
   index          a_column  b_column
0      0        1,000 text         1
1      1              text         2
2      2  25,000 more text         3
3      3         1,234,567         4
4      4         more text         5
Solution
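You can use replace in your method chain, matching every position that is preceded by a digit and followed by one or more groups of exactly three digits ending at a word boundary, with this regex:
(?<=\d)(?=(?:\d{3})+\b)
Because both lookarounds are zero-width, each match is an empty position between digits, which can then be replaced by a comma (,).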
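The pattern can be checked on plain strings with Python's re.sub before using it in pandas. This sketch runs the same regex over the sample values, including the extra "563 and 45 and 9 text" value from the output further down (THOUSANDS and samples are illustrative names):

```python
import re

# Zero-width pattern from the answer: a position preceded by a digit and followed
# by one or more complete groups of three digits that end at a word boundary.
THOUSANDS = r"(?<=\d)(?=(?:\d{3})+\b)"

samples = ["1000 text", "text", "25000 more text", "1234567", "563 and 45 and 9 text"]
for s in samples:
    # Inserting "," at each matched position adds separators without removing digits.
    print(re.sub(THOUSANDS, ",", s))
```

One caveat: the pattern assumes integers. A decimal such as 1234.5678 becomes 1,234.5,678, because the digits after the point also form a group of three followed by a word boundary.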
df = (df
      .reset_index()
      .replace({'a_column': {r'(?<=\d)(?=(?:\d{3})+\b)': ','}}, regex=True)
)
Output:
   index               a_column  b_column
0      0             1,000 text         1
1      1                   text         2
2      2       25,000 more text         3
3      3              1,234,567         4
4      4              more text         5
5      5  563 and 45 and 9 text         6
Note that I added an extra row (index 5) to your df to show that no commas are inserted where they shouldn't be.
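A different technique, not the one used in the answer above, addresses the original question about transforming the captured group more directly: Series.str.replace accepts a callable as repl when regex=True, and that callable receives each re.Match object, so the matched number can be converted and formatted in ordinary Python. A sketch inside a method chain, assuming the question's sample data (add_commas is a hypothetical helper name):

```python
import pandas as pd

df = pd.DataFrame({
    "a_column": ["1000 text", "text", "25000 more text", "1234567", "more text"],
    "b_column": [1, 2, 3, 4, 5],
})


def add_commas(match):
    # Hypothetical helper: `match` is the re.Match for one run of digits.
    # int() parses it and the "," format spec inserts thousands separators.
    return f"{int(match.group(0)):,}"


result = (
    df.reset_index()
      # assign() keeps the chain going; str.replace calls add_commas for every match
      .assign(a_column=lambda d: d["a_column"].str.replace(r"\d+", add_commas, regex=True))
)
print(result)
```

Because each number passes through int(), leading zeros (for example 007) would be dropped, which the pure-regex version does not do; in exchange, the callable can apply any Python formatting you like.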
Answered By - Nick