Monday, July 4, 2022

[FIXED] Replacing each occurrence of pattern in a dataframe


Issue

I have a pandas dataframe called car_sales that looks like this:

     Make Colour  Odometer (KM)  Doors     Price
0  Toyota  White         150043      4   $4,000 
1   Honda    Red          87899      4   $5,000 
2  Toyota   Blue          32549      3   $7,000 
3     BMW  Black          11179      5  $22,000 
4  Nissan  White         213095      4   $3,500 
5  Toyota  Green          99213      4   $4,500 
6   Honda   Blue          45698      4   $7,500 
7   Honda   Blue          54738      4   $7,000 
8  Toyota  White          60000      4   $6,250 
9  Nissan  White          31600      4   $9,700

I want to remove $ and , in the Price column.

For example, $4,000 should become 4000.

I have written the below code:

car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')

But Jupyter Notebook shows the following warning:

FutureWarning: The default value of regex will change from True to False in a future version.
  car_sales['Price'] = car_sales['Price'].str.replace('[\$, \,]', '')

Solution

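Here is one way to do it: replace every non-digit character with an empty string using a regex, and pass regex=True explicitly so the FutureWarning no longer appears (df below stands for the question's car_sales dataframe).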

df['Price'] = df['Price'].str.replace(r'\D', "", regex=True)

     Make Colour  Odometer (KM)  Doors  Price
0  Toyota  White         150043      4   4000
1   Honda    Red          87899      4   5000
2  Toyota   Blue          32549      3   7000
3     BMW  Black          11179      5  22000
4  Nissan  White         213095      4   3500
5  Toyota  Green          99213      4   4500
6   Honda   Blue          45698      4   7500
7   Honda   Blue          54738      4   7000
8  Toyota  White          60000      4   6250
9  Nissan  White          31600      4   9700
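
As a minimal end-to-end sketch of the replacement above, the snippet below rebuilds a few rows of the question's data and applies the same pattern. The three-row sample and the final astype(int) conversion are assumptions added for illustration; the original answer stops at the string replacement, which leaves Price as text.

```python
import pandas as pd

# A small sample mirroring the question's Price column (subset of the posted rows).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Price": ["$4,000", "$5,000", "$22,000"],
})

# Passing regex=True explicitly avoids the FutureWarning about the changing default.
cleaned = car_sales["Price"].str.replace(r"\D", "", regex=True)

# The result is still text; converting to int is an extra step, not part of the answer.
car_sales["Price"] = cleaned.astype(int)

print(car_sales["Price"].tolist())  # [4000, 5000, 22000]
```

Note that \D also removes whitespace, so stray trailing spaces in the Price strings are stripped as well.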

For an additional scenario that is not in the question, where the price includes a decimal part, the following pattern keeps the decimal point, so the stripped number does not lose its fractional digits:

df['Price'] = df['Price'].str.replace(r'[^0-9.]', "", regex=True)

Test data

     Make Colour  Odometer (KM)  Doors      Price
0  Toyota  White         150043      4  $4,000.00
1   Honda    Red          87899      4  $5,000.13
2  Toyota   Blue          32549      3  $7,000.12
3     BMW  Black          11179      5    $22,000

     Make Colour  Odometer (KM)  Doors    Price
0  Toyota  White         150043      4  4000.00
1   Honda    Red          87899      4  5000.13
2  Toyota   Blue          32549      3  7000.12
3     BMW  Black          11179      5    22000
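
The decimal-preserving variant can be checked with a short sketch like the one below, which uses the four test prices shown above. The variable names and the astype(float) conversion are assumptions added here for illustration; the answer itself only strips the characters.

```python
import pandas as pd

# The four test prices from the post, including one without a decimal part.
prices = pd.Series(["$4,000.00", "$5,000.13", "$7,000.12", "$22,000"])

# Keep digits and the decimal point; drop everything else ($ and , here).
stripped = prices.str.replace(r"[^0-9.]", "", regex=True)
print(stripped.tolist())  # ['4000.00', '5000.13', '7000.12', '22000']

# Optional extra step (not in the answer): convert the cleaned text to floats.
as_float = stripped.astype(float)
print(as_float.tolist())  # [4000.0, 5000.13, 7000.12, 22000.0]
```

This pattern keeps every period it finds, so a value with more than one period would not convert cleanly to a number.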

Answered By - Naveed

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
