Issue
For a machine learning regression model, I use scikit-learn's model_selection.train_test_split() function.
I need to keep the original index of each row in the training data. This information is lost after splitting the data into train/test sets, so I can't match the training data back to the original data by index. How can I fix this?
Solution
If your data is a pandas DataFrame, the original indices are preserved in the splits, so you can read them directly from the index of each split:
from sklearn import datasets
from sklearn.model_selection import train_test_split
# load some example data as a pandas DataFrame
iris = datasets.load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
# the index of each split still holds the original row labels
print(X_train.index.values)
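To match training rows back to the original data, the preserved index labels can be passed to `.loc`. Here is a minimal sketch using the same iris data. The `random_state` value and the names `original_rows` and `in_train` are only illustrative and not part of the original answer:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

# Look up the original rows by the index labels kept in the training split
original_rows = iris.data.loc[X_train.index]

# Boolean mask over the original data marking rows that ended up in training
in_train = iris.data.index.isin(X_train.index)
```

The mask can be stored alongside the original data if you want to record which rows were used for training.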
If your data is a NumPy array, you can wrap it in a pandas DataFrame beforehand, read the indices from the splits, and then proceed with either the DataFrame or the NumPy arrays.
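The NumPy case could look roughly like this. It is a sketch with made-up toy arrays; `X`, `y`, `train_idx`, `test_idx` and the `random_state` value are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Wrap the feature array in a DataFrame so each row carries its position as a label
X_df = pd.DataFrame(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_df, y, test_size=0.2, random_state=0
)

# Original row positions of the samples in each split
train_idx = X_train.index.to_numpy()
test_idx = X_test.index.to_numpy()

# Continue with plain NumPy arrays if preferred
X_train_np = X_train.to_numpy()
```

Indexing the original array with `X[train_idx]` then returns the same rows as `X_train_np`, in the same order.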
Answered By - afsharov