Issue
I am new to both scikit and numpy/pandas, but I am familiar with Python and data processing in general. I am confused about what format the inputs to sk-learn classifiers should be. I have tried using a debugger to inspect example matrices used in tutorial examples of sk-learn, but they have a huge number of members and I can't figure out which ones are the data and which are derived.
Is there a reference specification somewhere that explains what an array must look like and how to construct it for it to be a valid input for sk-learn classifiers?
Solution
Sklearn expects your feature matrix X to be two-dimensional, with one row per sample and one column per feature. It should have the following form:
ind  feat1  feat2
0    2      1
1    1      2
You can use either pandas DataFrames or NumPy arrays as inputs.
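As a minimal sketch, the small table above can be built either way. The column names feat1 and feat2 come from the table, while the variable names X_array and X_frame are only illustrative:

```python
import numpy as np
import pandas as pd

# Two samples (rows) and two features (columns), matching the table above.
X_array = np.array([[2, 1],
                    [1, 2]])

# The same data as a DataFrame; the index plays the role of the "ind" column.
X_frame = pd.DataFrame({"feat1": [2, 1], "feat2": [1, 2]})

print(X_array.shape)  # (2, 2)
print(X_frame.shape)  # (2, 2)
```

In both cases the shape reads as (number of samples, number of features).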
If you are doing supervised learning (classification or regression), then y needs to have as many rows as X: one target value per sample.
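A short sketch of what this looks like for a classifier. The data here is made up for illustration, and LogisticRegression is just one possible estimator, not one named in the original answer:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative toy data: 4 samples with 2 features each.
X = np.array([[2, 1],
              [1, 2],
              [3, 1],
              [1, 3]])
# One class label per sample, so y has 4 entries.
y = np.array([0, 1, 0, 1])

# The number of rows in X must match the number of labels in y.
assert X.shape[0] == y.shape[0]

clf = LogisticRegression()
clf.fit(X, y)

# predict returns one label per row of the matrix it is given.
predictions = clf.predict(X)
print(predictions.shape)  # (4,)
```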
You can load one of the datasets bundled with sklearn and check the dimensions and shapes of its matrices, because they are already in the form the estimators expect (in this case it would be a supervised regression problem):
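import sklearn.datasets
# load_boston was removed in scikit-learn 1.2; load_diabetes is another bundled regression dataset
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
# X has one row per sample and y has one target value per sample
print(X.shape[0] == y.shape[0])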
Output
True
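If you want sklearn itself to tell you whether your inputs are valid, one option is the check_X_y helper in sklearn.utils, which runs input checks of the kind estimators apply. The following is a rough sketch with made-up data; the variable names are illustrative:

```python
import numpy as np
from sklearn.utils import check_X_y

X = np.array([[2, 1],
              [1, 2]])
y = np.array([0, 1])

# Passes: X is 2-D and y has one entry per row of X.
X_checked, y_checked = check_X_y(X, y)

# A y with the wrong number of entries is rejected with a ValueError.
try:
    check_X_y(X, np.array([0, 1, 1]))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True

# A single feature stored as a 1-D array must be reshaped into one column.
single_feature = np.array([2, 1])
X_single = single_feature.reshape(-1, 1)
print(X_single.shape)  # (2, 1)
```

The reshape step is the usual fix when you only have one feature, since X always needs two dimensions.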
Answered By - pythonic833