Issue
I am trying to understand the best way to scale my features, and how to use the scikit-learn package to fit the scalers and then transform the data I want to make predictions on.
I have two groups of features.
The first group is normally distributed, so I just want to scale its values (positive values between 20 and 100) using MinMaxScaler.
The second group of features has outliers, so I believe RobustScaler will give better results.
My questions are:
- Can I use multiple scalers on my dataset for a classification problem using a random forest (RF)?
- In scikit-learn, when I try to scale just one feature of my training data with RobustScaler, I get this error: ValueError: Expected 2D array, got 1D array instead. I am not sure how to read this error. Can I not scale just one feature?
- If I use two scalers on my data, what is the best way to implement the feature engineering when I want to make predictions one row at a time? Do I just use transform?
Solution
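- Yes, you can, if you find it useful. Keep in mind that a random forest splits on value thresholds, so it is largely insensitive to feature scaling; scaling matters much more for models such as k-nearest neighbors, SVMs or regularized linear models.
- You can scale a single feature. If you pass it as a 1D pandas Series, like this, you will get an error: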
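Whether scaling is worth doing depends on the model. A random forest splits each feature at value thresholds, so an increasing linear rescaling such as MinMaxScaler or RobustScaler normally leaves its predictions unchanged. A minimal sketch to check this, assuming the built-in iris dataset as a stand-in for your own training data:

```python
# Sketch: compare a random forest trained on raw vs min-max-scaled features.
# Assumption: iris is only a stand-in for your own training data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)

# Same hyperparameters and seed, so the only difference is the scaling.
rf_raw = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
rf_scaled = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_scaled, y)

pred_raw = rf_raw.predict(X)
pred_scaled = rf_scaled.predict(X_scaled)

# Fraction of rows where both forests agree (expected to be at or very near 1.0).
agreement = (pred_raw == pred_scaled).mean()
print(agreement)
```

If the agreement is (close to) 1.0 on your data, scaling is harmless for the forest but not required; it becomes important if you later swap in a scale-sensitive model.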
import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.DataFrame({
"feature1": [1,2,3,4,5],
"feature2": [100, 200, 300, 400, 500],
"feature3": [200, 300, 400, 500, 600],
})
scaler = StandardScaler()
scaler.fit_transform(df["feature1"])
# output
ValueError: Expected 2D array, got 1D array instead:
You need to reshape the input into a 2D array of shape (n_samples, 1) when it is a single column:
scaler = StandardScaler()
scaler.fit_transform(df["feature1"].values.reshape(-1, 1))
# output
array([[-1.41421356],
[-0.70710678],
[ 0. ],
[ 0.70710678],
[ 1.41421356]])
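The error comes from passing a 1D pandas Series, while scikit-learn scalers expect a 2D input of shape (n_samples, n_features). Besides reshape, selecting the column with double brackets keeps it as a one-column DataFrame. A small sketch using RobustScaler, as in the question, on a hypothetical column with one outlier:

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Toy column with one obvious outlier (hypothetical values).
df = pd.DataFrame({"feature1": [1, 2, 3, 4, 100]})

print(df["feature1"].ndim)    # 1 -> a Series; scalers reject this
print(df[["feature1"]].ndim)  # 2 -> a one-column DataFrame; scalers accept this

# Double brackets keep the 2D shape, so no reshape is needed.
robust = RobustScaler()
scaled = robust.fit_transform(df[["feature1"]])
print(scaled.shape)  # (5, 1)
print(robust.center_, robust.scale_)  # learned median and interquartile range
print(scaled.ravel())
```

RobustScaler centers on the median (3 here) and divides by the interquartile range (4 - 2 = 2), so the scaled values are -1, -0.5, 0, 0.5 and 48.5: the outlier stays extreme, but it no longer distorts how the other values are scaled.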
- You can apply different preprocessing to different columns with ColumnTransformer:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
df = pd.DataFrame({
"feature1": [1,2,3,4,5],
"feature2": [100, 200, 300, 400, 500],
"feature3": [200, 300, 400, 500, 600],
})
transformers = ColumnTransformer(
transformers=[
("scaling1", MinMaxScaler(), ["feature1"]),
("scaling2", StandardScaler(), ["feature2", "feature3"])
]
)
transformed = transformers.fit_transform(df)
transformed
# output
array([[ 0. , -1.41421356, -1.41421356],
[ 0.25 , -0.70710678, -0.70710678],
[ 0.5 , 0. , 0. ],
[ 0.75 , 0.70710678, 0.70710678],
[ 1. , 1.41421356, 1.41421356]])
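On predicting one row at a time: fit the ColumnTransformer once on the training data, then call only transform on each new row, so the row is scaled with the statistics learned during training. A separate minimal sketch, assuming MinMaxScaler for the first group and RobustScaler for the second, as the question describes (the new row's values are made up):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Training data (same toy values as above).
train = pd.DataFrame({
    "feature1": [1, 2, 3, 4, 5],
    "feature2": [100, 200, 300, 400, 500],
    "feature3": [200, 300, 400, 500, 600],
})

# MinMaxScaler for the well-behaved group, RobustScaler for the outlier-prone group.
ct = ColumnTransformer(transformers=[
    ("minmax", MinMaxScaler(), ["feature1"]),
    ("robust", RobustScaler(), ["feature2", "feature3"]),
])
ct.fit(train)  # learn min/max, median and IQR from the training data only

# A single new row, kept as a one-row DataFrame so the column names still match.
new_row = pd.DataFrame([{"feature1": 6, "feature2": 250, "feature3": 700}])

# transform only (never fit_transform) on new data
result = ct.transform(new_row)
print(result)  # expected values: 1.25, -0.25, 1.5
```

Note that the new feature1 value falls outside the training range, so MinMaxScaler maps it above 1; that is expected behavior, not an error.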
If you would like, for example, to use the first scaler (scaling1) to inverse-transform its column:
scaler_1 = transformers.named_transformers_["scaling1"]
scaler_1.inverse_transform(transformed[:, 0].reshape(-1, 1))
# output
array([[1.],
[2.],
[3.],
[4.],
[5.]])
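To keep the fitted scalers and the random forest together, the ColumnTransformer can be wrapped in a Pipeline; calling predict on the fitted pipeline then applies transform (without refitting) before the model. A sketch on hypothetical training data, with feature1 in the 20-100 range and feature2/feature3 containing outliers:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Hypothetical training data and labels.
X_train = pd.DataFrame({
    "feature1": [20, 35, 50, 65, 80, 95, 30, 70],
    "feature2": [1, 2, 3, 4, 5, 500, 2, 6],
    "feature3": [10, 12, 11, 300, 13, 14, 9, 15],
})
y_train = [0, 0, 0, 1, 1, 1, 0, 1]

preprocess = ColumnTransformer(transformers=[
    ("scaling1", MinMaxScaler(), ["feature1"]),
    ("scaling2", RobustScaler(), ["feature2", "feature3"]),
])
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)  # fits the scalers and the forest on training data

# One incoming row; the pipeline only transforms it with the fitted scalers.
new_row = pd.DataFrame([{"feature1": 90, "feature2": 4, "feature3": 12}])
scaled_row = model.named_steps["preprocess"].transform(new_row)
prediction = model.predict(new_row)
print(scaled_row, prediction)
```

Because the preprocessing lives inside the pipeline, you cannot accidentally refit the scalers on a single prediction row.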
Answered By - Pav3k