Issue
I have a data frame that looks like this:
df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})
I want to impute the missing values in y and also standardize both variables, using the following code:
columnPreprocess = ColumnTransformer([
    ('imputer', SimpleImputer(strategy='median'), ['x', 'y']),
    ('scaler', StandardScaler(), ['x', 'y'])
])
columnPreprocess.fit_transform(df)
However, it seems that ColumnTransformer sets up separate output columns for each step, so each transformation ends up in its own columns instead of being applied one after the other. This is not what I intended.
Is there a way to apply different transformations to the same columns and end up with the same number of columns in the output array?
Solution
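You should use Pipeline in this case. ColumnTransformer applies each of its transformers to the original columns and concatenates the results side by side, whereas Pipeline chains the steps so that each one operates on the output of the previous one: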
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
pipeline.fit_transform(df)
# array([[-1.41421356, -1.58113883],
#        [-0.70710678,  0.        ],
#        [ 0.        ,  1.58113883],
#        [ 0.70710678,  0.        ],
#        [ 1.41421356,  0.        ]])
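If the data frame also had columns that need different handling, one option is to keep ColumnTransformer but pass the whole Pipeline to it as a single transformer. The sketch below is only an illustration of that idea, not part of the original answer. It compares the output shape of the ColumnTransformer from the question with a nested version; the names parallel, impute_then_scale and nested are made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})

# The setup from the question: each transformer receives the original
# columns, and the two outputs are stacked side by side.
parallel = ColumnTransformer([
    ('imputer', SimpleImputer(strategy='median'), ['x', 'y']),
    ('scaler', StandardScaler(), ['x', 'y'])
])
parallel_out = parallel.fit_transform(df)

# Impute-then-scale chained in a Pipeline, then used as ONE transformer
# inside a ColumnTransformer (illustrative names, not from the answer).
impute_then_scale = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
nested = ColumnTransformer([
    ('num', impute_then_scale, ['x', 'y'])
])
nested_out = nested.fit_transform(df)

print(parallel_out.shape)  # (5, 4): two imputed columns + two scaled columns
print(nested_out.shape)    # (5, 2): same columns, imputed and then scaled
```

For the two-column frame in the question the plain Pipeline is enough; the nested form only pays off when other columns need a different treatment.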
Answered By - Flavia Giammarino