Issue
I have two pipelines, one for my categorical features and one for my numeric features, that I feed into my column transformer. I then want to fit the column transformer on my dataframe so I can see what the transformed data looks like.
My code is as follows:
num_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('scale', MinMaxScaler())
])
cat_pipeline = Pipeline(steps=[
    ('impute', RandomSampleImputer()),
    ('target', TargetEncoder())
])
col_trans = ColumnTransformer(transformers=[
    ('num_pipeline', num_pipeline, num_cols),
    ('cat_pipeline', cat_pipeline, cat_cols)
], remainder='drop')
When I run
df_transform=col_trans.fit(df)
I get the error:
TypeError: fit_transform() missing argument: 'y'
Why is this?
Solution
As Guilherme Marthe and Luca Anzalone have pointed out, some transformers such as TargetEncoder do indeed require the target variable y to calculate the transformations.
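To see why y is unavoidable here, the sketch below is a deliberately simplified, pure-Python version of mean target encoding. The function names and the toy data are invented for illustration, and real encoders add smoothing and other safeguards that are left out.

```python
from collections import defaultdict


def fit_target_encoding(categories, y):
    """Learn the mean target value per category; this cannot be done without y."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, y):
        sums[cat] += target
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


def transform_target_encoding(categories, mapping, default):
    """Replace each category by its learned mean; unseen categories get `default`."""
    return [mapping.get(cat, default) for cat in categories]


# Toy data, invented for this example.
cities = ["a", "b", "a", "c"]
prices = [10.0, 20.0, 14.0, 30.0]

mapping = fit_target_encoding(cities, prices)
global_mean = sum(prices) / len(prices)
encoded = transform_target_encoding(cities, mapping, default=global_mean)
print(encoded)  # [12.0, 20.0, 12.0, 30.0]
```

The fitting step reads the target for every row, which is why calling fit() or fit_transform() with only the features fails for this kind of transformer.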
In order to get your transformed dataset, you need to call fit_transform() on your ColumnTransformer col_trans, passing both X (your features) and y (your target).
When you call fit_transform(), the fit() step first learns any parameters needed for the transformation (such as the per-column minimum and maximum for MinMaxScaler, or the per-category target statistics for TargetEncoder), and then transform() applies the transformations to your data. The result is a new dataset where the transformations have been applied.
To ensure your output is a pandas DataFrame, you can use the set_config() function from scikit-learn to change the global configuration:
from sklearn import set_config
set_config(transform_output="pandas")
Now, when you transform your data, the output will be a pandas DataFrame:
X_transformed = col_trans.fit_transform(X, y)
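Note that X_transformed is now a DataFrame whose columns keep their original names. By default, ColumnTransformer prefixes each name with the transformer name (for example num_pipeline__); pass verbose_feature_names_out=False to drop the prefix.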
Please remember to update scikit-learn to version 1.2 or later to use this feature.
Answered By - DataJanitor