Issue
All of the transformed output got combined into a single column.
My DataFrame had shape (445132, 34),
but after the transformation it was reduced to (445132, 1).
The categorical list contains all categorical columns except "GeneralHealth", and the numerical list contains all numerical columns.
The following is my code:
trans = ColumnTransformer(transformers=[
    ("encoder", OrdinalEncoder(categories=[["Excellent", "Very good", "Good", "Fair", "Poor"]]), ["GeneralHealth"]),
    ("encoder1", OneHotEncoder(drop="first"), categorical),
    ("scaler", StandardScaler(), numerical)
], remainder="passthrough")
f_transformed = trans.fit_transform(f)
transformed_data = pd.DataFrame(f_transformed, columns=trans.get_feature_names_out())
transformed_data.head(4)
I also tried setting verbose_feature_names_out=False in ColumnTransformer(), but it did not change anything.
Solution
If you are using sklearn version 1.2 or newer, can you please try the following? I just want to rule out that the issue is caused by the conversion from a numpy array to a pandas DataFrame.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler
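# Define the transformer
# sparse_output=False is needed here: pandas output does not support sparse data,
# and OneHotEncoder returns a sparse matrix by default
trans = ColumnTransformer(transformers=[
    ("encoder", OrdinalEncoder(categories=[["Excellent", "Very good", "Good", "Fair", "Poor"]]), ["GeneralHealth"]),
    ("encoder1", OneHotEncoder(drop="first", sparse_output=False), categorical),
    ("scaler", StandardScaler(), numerical)
], remainder="passthrough")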
# Set the output of the transformer to a pandas DataFrame
trans.set_output(transform="pandas")
# Fit and transform the data
f_transformed = trans.fit_transform(f)
# Now f_transformed should be a DataFrame with the appropriate column names
transformed_data = f_transformed
# Display the first few rows of the DataFrame
print(transformed_data.head(4))
Answered By - DataJanitor