Issue
I am working on a project that uses a machine learning algorithm (namely scikit-learn's RandomForestClassifier class) to classify some data. I have already called RandomForestClassifier.fit() on the training data, and now I am trying to use the fitted model to make predictions.
My problem is that I'm not sure what type of data I need to pass to RandomForestClassifier.predict() to make a prediction. I have already used predict() on a test set, but I am struggling to see how to apply my trained model to more general problems.
My main issue is predicting a single row of my dataframe: I want to locate a single row in the dataframe I used to train the model and make a single prediction for it. This is one of many variations of the code I've tried:
Xnew = productMarketResearch.loc[50]
Xnew = np.array(Xnew.values.tolist())
Xnew = sc.transform(Xnew)
ynew = rfc.predict(Xnew)
Everything I try throws the same error:
ValueError: Expected 2D array, got 1D array instead:
array=[3.63360000e+04 1.55639455e+12 0.00000000e+00].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I have tried array.reshape, among other methods, to convert this to a 2D array, but nothing has worked. Is there a solution to this problem, and is there any general advice for using the predict() method with dataframes?
Solution
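Most scikit-learn estimators expect a 2D array of shape (n_samples, n_features). productMarketResearch.loc[50] returns a 1D Series, whereas productMarketResearch.loc[[50]] (note the list) returns a one-row DataFrame, which is 2D. Try: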
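# difference here: passing the list [50] selects a one-row DataFrame (2D) instead of a Series (1D)
Xnew = productMarketResearch.loc[[50]]
# usually you don't need this conversion; transform() accepts a DataFrame directly
# Xnew = np.array(Xnew.values.tolist())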
Xnew = sc.transform(Xnew)
ynew = rfc.predict(Xnew)
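To make the shape difference concrete, here is a minimal, self-contained sketch. The dataframe, its column names (f1, f2, f3), and the labels are made up for illustration and only stand in for productMarketResearch; sc and rfc are assumed to be a StandardScaler and a RandomForestClassifier, as the variable names in the question suggest. It shows the one-row DataFrame approach, the reshape(1, -1) route that the error message mentions, and prediction for many rows at once.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for productMarketResearch: 100 rows, 3 numeric features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
labels = (df["f1"] + df["f2"] > 0).astype(int)  # made-up target

# Assumed setup: scale the features, then fit the forest on the scaled array.
sc = StandardScaler().fit(df)
rfc = RandomForestClassifier(n_estimators=20, random_state=0)
rfc.fit(sc.transform(df), labels)

row_1d = df.loc[50]    # Series with shape (3,): this is what triggers the ValueError
row_2d = df.loc[[50]]  # DataFrame with shape (1, 3): one sample, three features

# Option 1: keep the row as a one-row DataFrame.
pred_a = rfc.predict(sc.transform(row_2d))

# Option 2: reshape the 1D values into a single sample. Wrapping the result in a
# DataFrame keeps the column names the scaler was fitted with.
as_2d = pd.DataFrame(row_1d.to_numpy().reshape(1, -1), columns=df.columns)
pred_b = rfc.predict(sc.transform(as_2d))

# Many rows at once: pass the whole (n_samples, n_features) block.
preds_all = rfc.predict(sc.transform(df))

print(row_1d.shape, row_2d.shape, pred_a.shape, preds_all.shape)
```

As general advice, predict() returns one label per input row, so even a single-row prediction comes back as a length-1 array; use pred_a[0] to get the scalar label.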
Answered By - Quang Hoang