Issue
I was wondering whether there is any way to change the category names in a pandas DataFrame. I tried labels.rename_categories({'zero': '0', 'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5', 'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'}), but unfortunately that didn't work.
Here is what the DataFrame currently looks like:
File Label
20936 eight/b63fea9e_nohash_1.wav eight
21016 eight/f44f440f_nohash_2.wav eight
7423 three/d8ed3745_nohash_0.wav three
1103 zero/ad63d93c_nohash_4.wav zero
13399 five/5b09db89_nohash_0.wav five
... ... ...
13142 five/1a892463_nohash_0.wav five
21176 eight/810c99be_nohash_0.wav eight
16908 seven/6d818f6c_nohash_0.wav seven
15308 six/2bfe70ef_nohash_1.wav six
646 zero/24632875_nohash_0.wav zero
[23666 rows x 2 columns]
Solution
TL;DR
Use Series.cat.rename_categories for categorical variables.
Use Series.map for non-categorical variables.
Use Series.replace if regex is needed.
1. Series.cat.rename_categories
This option is fastest but requires the Categorical dtype. If you're analyzing categorical variables, this is highly recommended for its speed/memory/semantic benefits.
First convert to Categorical (if not already):
df['Label'] = df['Label'].astype('category')
Then rename via Series.cat.rename_categories:
df['Label'] = df['Label'].cat.rename_categories({'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9})
# File Label
# 20936 eight/b63fea9e_nohash_1.wav 8
# 21016 eight/f44f440f_nohash_2.wav 8
# 7423 three/d8ed3745_nohash_0.wav 3
# ... ... ...
# 646 zero/24632875_nohash_0.wav 0
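For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The four-row frame and its file names are made up for illustration. It also shows two details that are easy to trip over: rename_categories lives under the .cat accessor, and it returns a new Series rather than modifying the column in place, so the result has to be assigned back.

```python
import pandas as pd

# Hypothetical stand-in for the real data (file names are invented).
df = pd.DataFrame({
    "File": ["eight/a.wav", "three/b.wav", "zero/c.wav", "eight/d.wav"],
    "Label": ["eight", "three", "zero", "eight"],
})

mapping = {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
           'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}

# Convert to Categorical first; the .cat accessor only exists for this dtype.
df["Label"] = df["Label"].astype("category")

# Returns a new Series; df["Label"] itself is untouched until reassigned.
renamed = df["Label"].cat.rename_categories(mapping)

print(renamed.tolist())
print(list(renamed.cat.categories))

# Assign back to keep the change.
df["Label"] = renamed
```

Mapping keys that are not among the existing categories (here 'one', 'two', and so on) are simply ignored, so the full ten-entry dict is safe to reuse on a subset of the data.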
2. Series.map
If you can't (or don't want to) use the Categorical dtype, Series.map is the next fastest:
df['Label'] = df['Label'].map({'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9})
# File Label
# 20936 eight/b63fea9e_nohash_1.wav 8
# 21016 eight/f44f440f_nohash_2.wav 8
# 7423 three/d8ed3745_nohash_0.wav 3
# ... ... ...
# 646 zero/24632875_nohash_0.wav 0
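One thing to watch with Series.map and a dict: any value that has no key in the dict becomes NaN instead of being left alone. The sketch below uses a made-up Series with a stray "ten" label (an assumption, not part of the original data) to show how to spot such rows after mapping.

```python
import pandas as pd

# Hypothetical labels; "ten" is a stray value with no entry in the mapping.
labels = pd.Series(["eight", "three", "zero", "ten"])

mapping = {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
           'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}

mapped = labels.map(mapping)

# Values missing from the dict come back as NaN, so list them explicitly.
unmapped = labels[mapped.isna()]
print(unmapped.tolist())

# Keep only the rows that were mapped and restore an integer dtype.
clean = mapped.dropna().astype(int)
print(clean.tolist())
```

If every label is guaranteed to be in the dict, as in the question, the check is unnecessary, but it is a cheap sanity test on messy data.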
3. Series.replace
This option is slow but offers regex/filling capabilities via the regex and method params (note that method is deprecated in recent pandas versions).
As a contrived example, say we want less granular labels:
mapping = {
r'zero|one': '0,1',
r'two|three': '2,3',
r'four|five': '4,5',
r'six|seven': '6,7',
r'eight|nine': '8,9',
}
Then we can use Series.replace with regex=True:
df['Label'] = df['Label'].replace(mapping, regex=True)
# File Label
# 20936 eight/b63fea9e_nohash_1.wav 8,9
# 7423 three/d8ed3745_nohash_0.wav 2,3
# 1103 zero/ad63d93c_nohash_4.wav 0,1
# ... ... ...
# 646 zero/24632875_nohash_0.wav 0,1
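A caveat worth sketching: with regex=True the patterns are substituted wherever they match inside a string, not only when they match the whole value. That is harmless for the ten digit labels above, but on messier data it can be safer to anchor the patterns with ^ and $. The "stone" label below is a hypothetical value invented to show the difference.

```python
import pandas as pd

# Hypothetical labels; "stone" merely contains the substring "one".
labels = pd.Series(["one", "stone", "nine"])

# Unanchored patterns match anywhere inside the string.
loose = labels.replace({r'zero|one': '0,1', r'eight|nine': '8,9'}, regex=True)

# Anchored patterns only match when the whole label is one of the words.
strict = labels.replace({r'^(zero|one)$': '0,1', r'^(eight|nine)$': '8,9'}, regex=True)

print(loose.tolist())
print(strict.tolist())
```

The anchored version leaves "stone" untouched, while the unanchored one rewrites part of it.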
Answered By - tdy