Issue
How can I split a dataset (CSV) into training and test data in Python if it contains no dependent variable?
The project I am currently working on is machine learning based, and the dataset does not contain a dependent variable. The following code works only if the dataset contains a dependent variable:
from sklearn.model_selection import train_test_split
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size = 0.2, random_state = 0)
I expect the split to happen without any y variable.
Is it possible?
Solution
There are two kinds of "random" split: 1) purely random, and 2) random but balanced, so that both parts have a similar distribution of data (e.g. similar means / norms).
To answer your question, I would first recommend using a package for managing your data frames (e.g. pandas).
See the documentation for DataFrame.sample: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
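So, if you wanted to get a random 50% sample of the DataFrame with replacement (note that with replace=True the same row can be drawn more than once; for a train/test split you would normally keep the default, replace=False):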
df.sample(frac=0.5, replace=True, random_state=1)
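To turn the sampling idea into an actual train/test split with no y, here is a minimal sketch. It assumes an 80/20 split and uses a small made-up DataFrame in place of the CSV (in practice you would load it with pd.read_csv; the column names a and b are hypothetical). It also shows that train_test_split accepts a single array or DataFrame, so a y variable is not required.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV; in practice: df = pd.read_csv("your_file.csv")
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# Option 1: pandas only.
# Sample 80% of the rows without replacement, then take the remaining rows as the test set.
train = df.sample(frac=0.8, random_state=0)
test = df.drop(train.index)

# Option 2: scikit-learn.
# train_test_split splits whatever arrays it is given, so passing only the features works.
x_train, x_test = train_test_split(df, test_size=0.2, random_state=0)
```

Both options give a purely random split. Option 1 relies on the DataFrame having a unique index, since drop removes rows by index label.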
Answered By - Kathleen Allyson Harrison