Monday, November 27, 2023

[FIXED] I applied ColumnTransformer to my data to encode categorical features and scale numeric features. The result combined all transformations into one column

Labels: dataframe, machine-learning, python, scikit-learn

Issue

All the transformed outputs were combined into a single column.

The shape of my data frame was (445132, 34); after the transformation it was reduced to (445132, 1).

The categorical object contains all categorical columns except "GeneralHealth"; numerical contains all numerical columns.

The following is my code:

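import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler

trans = ColumnTransformer(transformers=[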
    ("encoder", OrdinalEncoder(categories=[["Excellent","Very good","Good","Fair","Poor"]]), ["GeneralHealth"]),
    ("encoder1", OneHotEncoder(drop="first"), categorical),
    ("scaler", StandardScaler(), numerical)
], remainder="passthrough")

f_transformed = trans.fit_transform(f)

transformed_data = pd.DataFrame(f_transformed, columns=trans.get_feature_names_out())

transformed_data.head(4)

I also tried setting verbose_feature_names_out=False in ColumnTransformer(), but it did not change anything.
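One possible cause, which the post itself does not confirm, is that fit_transform returned a SciPy sparse matrix rather than a NumPy array: OneHotEncoder produces sparse output by default, and ColumnTransformer may keep the stacked result sparse. The sketch below shows a helper that handles both cases. The name to_dataframe and the demo values are hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def to_dataframe(transformed, feature_names):
    """Build a DataFrame from dense or SciPy-sparse transformer output."""
    if sparse.issparse(transformed):
        # Keeps the columns sparse; transformed.toarray() is an alternative
        # when the dense result fits in memory.
        return pd.DataFrame.sparse.from_spmatrix(transformed, columns=feature_names)
    return pd.DataFrame(transformed, columns=feature_names)


# Small demonstration with a hand-made sparse matrix (hypothetical values).
demo = sparse.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]))
demo_df = to_dataframe(demo, ["a", "b"])
print(demo_df.shape)
```

With the code from the question, this would be called as to_dataframe(f_transformed, trans.get_feature_names_out()).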

Solution

If you are using scikit-learn 1.2 or newer, please try the following. It should rule out the conversion from a NumPy array to a pandas DataFrame as the cause of the issue.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler

# Define the transformer
trans = ColumnTransformer(transformers=[
    ("encoder", OrdinalEncoder(categories=[["Excellent", "Very good", "Good", "Fair", "Poor"]]), ["GeneralHealth"]),
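    # sparse_output=False (scikit-learn >= 1.2): pandas output does not support sparse data
    ("encoder1", OneHotEncoder(drop="first", sparse_output=False), categorical),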
    ("scaler", StandardScaler(), numerical)
], remainder="passthrough")

# Set the output of the transformer to a pandas DataFrame
trans.set_output(transform="pandas")

# Fit and transform the data
f_transformed = trans.fit_transform(f)

# Now f_transformed should be a DataFrame with the appropriate column names
transformed_data = f_transformed

# Display the first few rows of the DataFrame
print(transformed_data.head(4))

Answered By - DataJanitor

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
