Issue
I'm working with a model, and after splitting into train and test, I want to apply StandardScaler(). However, this transformation converts my data into an array and I want to keep the format I had before. How can I do this?
Basically, I have:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
X = df[features]
y = df[["target"]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)
sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)
How can I get X_train_sc back to the format that X_train had?
Update: I don't want to reverse X_train_sc back to its unscaled values. I just want X_train_sc to be a DataFrame, in the easiest possible way.
Solution
As you mentioned, applying the scaler returns a NumPy array. To get a DataFrame, wrap the result in a new one:
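import pandas as pd
cols = X_train.columns
sc = StandardScaler()
# Pass index= too: train_test_split shuffles rows, and without it the new frame gets a fresh 0..n-1 index
X_train_sc = pd.DataFrame(sc.fit_transform(X_train), columns=cols, index=X_train.index)
X_test_sc = pd.DataFrame(sc.transform(X_test), columns=cols, index=X_test.index)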
Answered By - FBruzzesi