Wednesday, December 1, 2021

[FIXED] Changing category names in a pandas data frame

Tags: categories, pandas, python, rename

Issue

I was wondering if there is any way to change the category names in a pandas DataFrame. I tried labels.rename_categories({'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'}), but unfortunately that didn't work.

Here is what the DataFrame currently looks like:

                              File  Label
20936  eight/b63fea9e_nohash_1.wav  eight
21016  eight/f44f440f_nohash_2.wav  eight
7423   three/d8ed3745_nohash_0.wav  three
1103    zero/ad63d93c_nohash_4.wav   zero
13399   five/5b09db89_nohash_0.wav   five
...                            ...    ...
13142   five/1a892463_nohash_0.wav   five
21176  eight/810c99be_nohash_0.wav  eight
16908  seven/6d818f6c_nohash_0.wav  seven
15308    six/2bfe70ef_nohash_1.wav    six
646     zero/24632875_nohash_0.wav   zero

[23666 rows x 2 columns]

Solution

TL;DR

Use Series.cat.rename_categories for categorical variables.
Use Series.map for non-categorical variables.
Use Series.replace if regex is needed.

1. `Series.cat.rename_categories`

This option is fastest but requires the Categorical dtype. If you're analyzing categorical variables, this is highly recommended for its speed/memory/semantic benefits.

First convert to Categorical (if not already):

df['Label'] = df['Label'].astype('category')
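As a rough illustration of the memory benefit, the sketch below compares the footprint of a label column before and after conversion. The toy Series is made up for this example and only stands in for df['Label']; exact byte counts depend on the pandas version and platform.

```python
import pandas as pd

# Made-up stand-in for df['Label']: ten distinct words repeated many times.
words = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
labels = pd.Series(words * 1000)

as_category = labels.astype("category")

# deep=True also counts the string payloads, not just the references to them.
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(plain_bytes, category_bytes)
print(as_category.cat.categories.tolist())
```

The categorical version stores each distinct word once plus a small integer code per row, which is why it tends to be much smaller when labels repeat heavily.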

Then rename via Series.cat.rename_categories:

df['Label'] = df['Label'].cat.rename_categories({'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9})

#                               File Label
# 20936  eight/b63fea9e_nohash_1.wav     8
# 21016  eight/f44f440f_nohash_2.wav     8
# 7423   three/d8ed3745_nohash_0.wav     3
# ...                            ...   ...
# 646     zero/24632875_nohash_0.wav     0
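For a self-contained version of the two steps, here is a minimal sketch on a three-row frame. The file names are invented for illustration; only the column names follow the question. Note that rename_categories lives on the .cat accessor and returns a new Series, so the result has to be assigned back.

```python
import pandas as pd

# Hypothetical miniature of the frame from the question (file names are made up).
df = pd.DataFrame({
    "File": ["eight/a.wav", "three/b.wav", "zero/c.wav"],
    "Label": ["eight", "three", "zero"],
})

# Step 1: convert to the Categorical dtype.
df["Label"] = df["Label"].astype("category")

# Step 2: rename the categories. Keys that are not existing categories
# ("nine" here) are simply ignored.
word_to_digit = {"zero": 0, "three": 3, "eight": 8, "nine": 9}
df["Label"] = df["Label"].cat.rename_categories(word_to_digit)

print(df)
print(df["Label"].dtype)
```

The column keeps its categorical dtype; only the category labels change.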

2. `Series.map`

If you can't (or don't want to) use the Categorical dtype, Series.map is the next fastest:

df['Label'] = df['Label'].map({'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9})

#                               File Label
# 20936  eight/b63fea9e_nohash_1.wav     8
# 21016  eight/f44f440f_nohash_2.wav     8
# 7423   three/d8ed3745_nohash_0.wav     3
# ...                            ...   ...
# 646     zero/24632875_nohash_0.wav     0
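One thing to watch with Series.map and a dict: values that have no key in the dict come back as NaN rather than being left unchanged. The sketch below uses a made-up Series with a deliberately unmapped label ("ten") to show one way of spotting such gaps.

```python
import pandas as pd

# Made-up labels; "ten" is deliberately absent from the mapping.
labels = pd.Series(["eight", "three", "ten"])
word_to_digit = {"three": 3, "eight": 8}

mapped = labels.map(word_to_digit)

# Rows whose label had no entry in the mapping are now NaN.
missing = mapped.isna()

print(mapped)
print(labels[missing].tolist())
```

Checking for NaN right after mapping is a cheap way to catch typos or unexpected labels before they propagate.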

3. `Series.replace`

This option is slow but offers regex capabilities via the regex param. Older pandas versions also offered filling via the method param, which has been deprecated in recent releases.

As a contrived example, say we want less granular labels:

mapping = {
    r'zero|one': '0,1',
    r'two|three': '2,3',
    r'four|five': '4,5',
    r'six|seven': '6,7',
    r'eight|nine': '8,9',
}

Then we can use Series.replace with regex=True:

df['Label'] = df['Label'].replace(mapping, regex=True)

#                               File Label
# 20936  eight/b63fea9e_nohash_1.wav   8,9
# 7423   three/d8ed3745_nohash_0.wav   2,3
# 1103    zero/ad63d93c_nohash_4.wav   0,1
# ...                            ...   ...
# 646     zero/24632875_nohash_0.wav   0,1
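The regex replacement can be tried in isolation with a small sketch like the one below. The Series is made up for illustration, and the mapping is a subset of the one above.

```python
import pandas as pd

# Made-up labels standing in for df['Label'].
labels = pd.Series(["zero", "three", "eight", "nine"])

# Subset of the mapping above: each key is a regex, each value its replacement.
mapping = {
    r"zero|one": "0,1",
    r"two|three": "2,3",
    r"eight|nine": "8,9",
}

coarse = labels.replace(mapping, regex=True)
print(coarse.tolist())
```

These patterns are not anchored, so they match anywhere inside a value and replace only the matched part. That is fine here because every label is exactly one of the words; if labels could contain these words as substrings, anchoring the patterns with ^ and $ would be safer.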

Answered By - tdy

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
