Tuesday, November 2, 2021

[FIXED] How to transform with ColumnTransformer and OrdinalEncoder?

Labels: jupyter-notebook, mapping, python, scikit-learn

Issue

I'm trying to preprocess data with ColumnTransformer but with no Pipeline. Here's the code:

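import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
# The mapping= and cols= arguments belong to the category_encoders encoders,
# not to the scikit-learn classes of the same name.
from category_encoders import OneHotEncoder, OrdinalEncoder

object_pre = Pipeline(steps=[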
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("ordencoder", OrdinalEncoder(mapping=mapping)),
    ("onehot", OneHotEncoder(cols=ord_cols))
])
num_pre = SimpleImputer()

object_cols = [col for col in X.columns if X[col].dtype == "object"]
num_cols = list(set(X.columns) - set(object_cols))

preprocessor = ColumnTransformer(transformers=[
    ("num_pre", num_pre, num_cols),
    ("object_pre", object_pre, object_cols)
])
X_temp = pd.DataFrame(preprocessor.fit_transform(X))

However, when I run it, I get the following error on the last line:

KeyError: 'Education'

I am convinced that the problem has to do with the mapping variable, which is created with the following code:

mapping = [
    {
        "col": "Education",
        "mapping": {
            "Not Graduate": 0,
            "Graduate": 1
        }
    },
    {
        "col": "Dependents",
        "mapping": {
            "0": 0,
            "1": 1,
            "2": 2,
            "3+": 3,
        }
    },
]

ord_cols = ["Gender"]
for i in list(set(X.columns) - set(ord_cols)):
    mapping.append({
        "col": i,
        "mapping": {
            "No": 0,
            "Yes": 1
        }
    })

Can you please tell me what I'm doing wrong? Thanks in advance ;)

Solution

The SimpleImputer in the first step of your pipeline returns a NumPy array, so the column names are no longer available when the second step, OrdinalEncoder (from the category_encoders package), looks up the columns named in mapping. That is where KeyError: 'Education' comes from. That OrdinalEncoder has a handle_missing parameter with a return_nan option, so I think you can swap the order of the first two steps (encode first, then impute) and get the same effect.
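To see the cause in isolation, here is a minimal sketch. The toy frame X_demo and its values are made up for illustration, and it assumes scikit-learn's default output setting (no set_output configuration). It shows that the imputer's result has no column names, and that a by-name lookup works again once the names are put back.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing value.
X_demo = pd.DataFrame(
    {"Education": ["Graduate", np.nan, "Not Graduate", "Graduate"]}
)

imputed = SimpleImputer(strategy="most_frequent").fit_transform(X_demo)

# By default the imputer returns a plain NumPy array: "Education" is gone.
print(type(imputed).__name__)
print(hasattr(imputed, "columns"))

# Re-wrapping the array restores the names, so a by-name mapping works again.
restored = pd.DataFrame(imputed, columns=X_demo.columns)
codes = restored["Education"].map({"Not Graduate": 0, "Graduate": 1})
print(codes.tolist())
```

Any step that refers to columns by name has to run before a step that returns a bare array, or the names have to be restored in between.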

(scikit-learn's own OrdinalEncoder passes missing values through as of v1.0, so you could maybe switch to that, but it takes a categories list of arrays instead of a per-column mapping dict, so you would again lose the ability to refer to columns by name.)
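The "encode first, impute second" ordering can be sketched as follows. This is not the category_encoders API: pandas Series.map stands in for the by-name ordinal encoding (it leaves missing values as NaN, similar in spirit to handle_missing="return_nan"), and the toy frame and the simplified mapping dict are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the real X has more columns.
X_demo = pd.DataFrame({
    "Education": ["Graduate", np.nan, "Not Graduate", "Graduate"],
    "Dependents": ["0", "3+", np.nan, "0"],
})

# Simplified per-column mapping (plain dict keyed by column name).
mapping = {
    "Education": {"Not Graduate": 0, "Graduate": 1},
    "Dependents": {"0": 0, "1": 1, "2": 2, "3+": 3},
}

# Step 1: encode by column name while the data is still a DataFrame.
# Missing values are not in the dicts, so they stay NaN.
encoded = X_demo.copy()
for col, col_map in mapping.items():
    encoded[col] = encoded[col].map(col_map)

# Step 2: impute afterwards; losing the column names no longer matters
# because nothing downstream of this point looks columns up by name.
imputer = SimpleImputer(strategy="most_frequent")
result = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)
print(result)
```

Note that the imputer now fills in the most frequent code rather than the most frequent label, which amounts to the same category as long as the mapping is one-to-one.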

Answered By - Ben Reiniger

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
