Issue
I have a pandas DataFrame, car_sales, which looks like this:
Make Colour Odometer (KM) Doors Price
0 Toyota White 150043 4 $4,000
1 Honda Red 87899 4 $5,000
2 Toyota Blue 32549 3 $7,000
3 BMW Black 11179 5 $22,000
4 Nissan White 213095 4 $3,500
5 Toyota Green 99213 4 $4,500
6 Honda Blue 45698 4 $7,500
7 Honda Blue 54738 4 $7,000
8 Toyota White 60000 4 $6,250
9 Nissan White 31600 4 $9,700
I want to remove the $ and , characters from the Price column. For example, $4,000 should become 4000.
I wrote the following code:
car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')
However, Jupyter Notebook shows a warning (not an error):
FutureWarning: The default value of regex will change from True to False in a future version.
car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')
Solution
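Here is one way to do it: use a regex to replace every non-digit character with an empty string. Passing regex=True explicitly also avoids the FutureWarning. (Here df is the car_sales DataFrame from the question.)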
df['Price'] = df['Price'].str.replace(r'\D', "", regex=True)
Make Colour Odometer (KM) Doors Price
0 0 Toyota White 150043 4 4000
1 1 Honda Red 87899 4 5000
2 2 Toyota Blue 32549 3 7000
3 3 BMW Black 11179 5 22000
4 4 Nissan White 213095 4 3500
5 5 Toyota Green 99213 4 4500
6 6 Honda Blue 45698 4 7500
7 7 Honda Blue 54738 4 7000
8 8 Toyota White 60000 4 6250
9 9 Nissan White 31600 4 9700
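Note that after the replacement the Price column still holds strings. The following is a minimal, self-contained sketch (using a shortened, hypothetical three-row version of the car_sales data) that strips the non-digit characters and then converts the column to integers with astype(int); the conversion step is an addition, not part of the original answer.

```python
import pandas as pd

# Shortened stand-in for the car_sales DataFrame from the question
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Price": ["$4,000", "$5,000", "$22,000"],
})

# Remove every non-digit character; regex=True is passed explicitly
car_sales["Price"] = car_sales["Price"].str.replace(r"\D", "", regex=True)

# The values are still strings at this point, so convert them to integers
car_sales["Price"] = car_sales["Price"].astype(int)

print(car_sales)
```

Once the column is numeric, operations such as car_sales["Price"].sum() or sorting by price behave as expected.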
For an additional scenario that is not in the question, where the price includes a decimal part, the following pattern keeps the decimal point so the stripped number does not lose it:
df['Price'] = df['Price'].str.replace(r'[^0-9.]', "", regex=True)
Test data
Make Colour Odometer (KM) Doors Price
0 0 Toyota White 150043 4 $4,000.00
1 1 Honda Red 87899 4 $5,000.13
2 2 Toyota Blue 32549 3 $7,000.12
3 3 BMW Black 11179 5 $22,000
Result
Make Colour Odometer (KM) Doors Price
0 0 Toyota White 150043 4 4000.00
1 1 Honda Red 87899 4 5000.13
2 2 Toyota Blue 32549 3 7000.12
3 3 BMW Black 11179 5 22000
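The decimal case can be checked with a small sketch like the one below. It uses the four price strings from the test data in a standalone Series (the names prices, cleaned and as_float are hypothetical), and the final astype(float) conversion is an assumption about what you would want next, not something the original answer does.

```python
import pandas as pd

# Price strings taken from the test data above
prices = pd.Series(["$4,000.00", "$5,000.13", "$7,000.12", "$22,000"])

# Keep only digits and the decimal point; everything else is removed
cleaned = prices.str.replace(r"[^0-9.]", "", regex=True)

# Convert the cleaned strings to floats for numeric work
as_float = cleaned.astype(float)

print(cleaned.tolist())
print(as_float.tolist())
```

This pattern keeps every dot it finds, so a value with more than one dot would produce a string that cannot be converted to a float.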
Answered By - Naveed