Issue
I am fairly new to pandas, come from a statistics background, and am struggling with a conceptual problem: pandas has columns that contain values, but sometimes those values have a special meaning. Statistical programs like SPSS or R call these "value labels".
Imagine a column rain with two values: 0 (meaning: no rain) and 1 (meaning: raining). Is there a way to assign these labels to those values?
Is there a way to do this in pandas, too? It would mainly be for plotting and visualisation purposes.
Solution
There's no need to use a map anymore. Since version 0.15, pandas supports a categorical data type for its columns.
When a column has only a few distinct values, the stored data usually takes less space, some operations on it are faster, and you can use labels.
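As a rough check of the space claim, the sketch below compares the memory footprint of a plain string column with the same column cast to a category. The column contents and sizes are made up for illustration, and the exact byte counts will vary with the pandas version.

```python
import pandas as pd

# Hypothetical column: many rows, but only two distinct labels.
labels = pd.Series(["no rain", "raining"] * 50_000)
as_category = labels.astype("category")

# deep=True includes the memory held by the string values themselves.
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print("plain strings:", plain_bytes, "bytes")
print("categorical:  ", category_bytes, "bytes")
```

The saving comes from storing each distinct label once and keeping only small integer codes per row, so it shrinks as the number of distinct values approaches the number of rows.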
I'm taking an example from the pandas docs:
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ["a", "b", "b", "a", "a", "e"]})
# Recast grade as a categorical variable
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
# Gives this:
Out[124]:
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
You can also rename categories and add categories that are missing from the data.
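Applied to the rain column from the question, renaming and adding categories might look like the sketch below. The sample data, the rain_label column name, and the "unknown" label are assumptions made for illustration; rename_categories and add_categories are methods of the .cat accessor.

```python
import pandas as pd

# Hypothetical data for the "rain" column: 0 = no rain, 1 = raining.
df = pd.DataFrame({"rain": [0, 1, 1, 0, 0]})

# Cast the integer values to a categorical, then rename the categories
# so that each value carries its label.
df["rain_label"] = (
    df["rain"]
    .astype("category")
    .cat.rename_categories({0: "no rain", 1: "raining"})
)

# Add a category that does not occur in the data yet (illustrative label).
df["rain_label"] = df["rain_label"].cat.add_categories(["unknown"])

# Count the rows per label.
counts = df["rain_label"].value_counts()
print(counts)
```

Keeping the original rain column next to the labelled one leaves the numeric values available for calculations, while the labelled column can be used for plots and tables.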
Answered By - cd98