Issue
I'm trying to create a series of dummy variables from a categorical variable using pandas in Python. I've come across the get_dummies function, but whenever I try to call it I receive an error that the name is not defined.
Any thoughts or other ways to create the dummy variables would be appreciated.
EDIT: Since others seem to be coming across this, the get_dummies function in pandas now works perfectly fine. This means the following should work:
import pandas as pd
dummies = pd.get_dummies(df['Category'])
See http://blog.yhathq.com/posts/logistic-regression-and-python.html for further information.
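As a minimal sketch of the get_dummies call, assuming a small made-up DataFrame (only the 'Category' column name comes from the question; the data, the 'Value' column, and the 'Cat' prefix are illustrative):

```python
import pandas as pd

# Hypothetical example data; only the 'Category' column name is from the question.
df = pd.DataFrame({'Category': [1, 2, 3, 2], 'Value': [10.0, 20.0, 30.0, 40.0]})

# One indicator column per distinct category, named Cat_1, Cat_2, Cat_3.
dummies = pd.get_dummies(df['Category'], prefix='Cat')

# Attach the indicator columns to the original frame.
df_with_dummies = pd.concat([df, dummies], axis=1)
print(df_with_dummies.columns.tolist())
```

The dtype of the indicator columns depends on the pandas version (boolean in recent releases, small integers in older ones), so cast with .astype(int) if you need 0/1 values.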
Solution
It's hard to infer what you're looking for from the question, but my best guess is as follows.
If we assume you have a DataFrame where some column is 'Category' and contains integers (or otherwise unique identifiers) for categories, then we can do the following.
Call the DataFrame dfrm, and assume that for each row, dfrm['Category'] is some value in the set of integers from 1 to N. Then,
for elem in dfrm['Category'].unique():
    dfrm[str(elem)] = dfrm['Category'] == elem
Now there will be a new indicator column for each category that is True/False depending on whether the data in that row are in that category.
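For a self-contained version you can run, here is a sketch of the same loop on made-up data (the category codes below are an assumption for illustration):

```python
import pandas as pd

# Hypothetical data: category codes in 1..3, as assumed in the answer.
dfrm = pd.DataFrame({'Category': [1, 3, 2, 1]})

for elem in dfrm['Category'].unique():
    # Boolean Series: True where the row belongs to this category.
    dfrm[str(elem)] = dfrm['Category'] == elem

print(dfrm)
```

The new columns appear in the order the categories are first seen. If you would rather have 0/1 than True/False, append .astype(int) to the comparison.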
If you want to control the category names, you could make a dictionary, such as
cat_names = {1:'Some_Treatment', 2:'Full_Treatment', 3:'Control'}
for elem in dfrm['Category'].unique():
    dfrm[cat_names[elem]] = dfrm['Category'] == elem
to result in having columns with specified names, rather than just string conversion of the category values. In fact, for some types, str() may not produce anything useful for you.
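One caveat with the dictionary approach is that cat_names[elem] raises a KeyError for any category missing from the dictionary. A possible variation, sketched here on made-up data, falls back to str(elem) for unmapped values (the fallback and the extra category 4 are assumptions, not part of the original answer):

```python
import pandas as pd

# Hypothetical data; category 4 deliberately has no entry in cat_names.
dfrm = pd.DataFrame({'Category': [1, 2, 3, 4]})
cat_names = {1: 'Some_Treatment', 2: 'Full_Treatment', 3: 'Control'}

for elem in dfrm['Category'].unique():
    # Use the mapped name when there is one, else the string form of the value.
    col_name = cat_names.get(elem, str(elem))
    dfrm[col_name] = dfrm['Category'] == elem

print(dfrm.columns.tolist())
```

Whether a silent fallback is preferable to a loud KeyError depends on your data; if an unmapped category means something went wrong upstream, the original indexing form is the safer choice.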
Answered By - ely