Sunday, October 24, 2021

[FIXED] How to use a linear regression model to produce a single prediction value?

Labels: linear-regression, machine-learning, python, scikit-learn

Issue

I have created three machine learning models using Scikit-learn in Jupyter Notebook (linear regression, decision tree and random forest). The purpose of the models is to predict the size of a cyclone (prediction/output ROCI) based on several cyclone parameters (predictors/inputs). There are 9004 rows. Below is an example of the linear regression model.

In[31]: df.head()
Out[31]:    NAME    LAT    LON   Pc    Penv   ROCI  Vmax  Pdc
         0  HECTOR  -15    128   985   1000   541   18    -15
         1  HECTOR  -15    127   990   1000   541   15.4  -10         
         2  HECTOR  -16    126   992   1000   530   15    -8
         3  HECTOR  -16.3  126   992   1000   480   15.4  -8
         4  HECTOR  -16.5  126   992   1000   541   15.4  -8

In [32]: X=df[['LAT','LON','Pc','Vmax','Pdc']]  # Pdc = Pc - Penv
         y=df['ROCI']

In [33]: X_train, X_test, y_train, y_test = train_test_split(X, y, 
         test_size=0.4) 

In [34]: lm=LinearRegression()

In [35]: lm.fit(X_train,y_train)
Out [35]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, 
          normalize=False)

In [36]: print(lm.intercept_)
         lm.coef_
         -3464.3452921023572
Out [36]: array([-2.94229126,  0.29875575,  3.65214265, -1.25577799, 
          -6.43917746])

In [37]: predictions=lm.predict(X_test)
         predictions
Out [37]:array([401.02108725, 420.01451472, 434.4241271 , ..., 
         287.67803538, 343.80516896, 340.1007666 ])

In [38]: plt.scatter(y_test,predictions)
         plt.xlabel('Recorded')
         plt.ylabel('Predicted')
      
         *figure to display accuracy*

Now, when I try to pass a single value to lm.predict(), I get the following error:

ValueError: Expected 2D array, got scalar array instead:
array=300.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I assume this is because my model was trained on 5 columns, so I tried passing the first row of my dataset:

In [39]: lm.predict(-15,128,985,18,-15)
         ...
         ...
         TypeError: predict() takes 2 positional arguments but 6 were 
         given

Trying the array.reshape as suggested I get:

In [49]: lm.predict(X_test.reshape(-1, 1))
         ...
         ...
         AttributeError: 'DataFrame' object has no attribute 'reshape'

And now I am confused! Please could you assist me in using my model to give me a prediction value. What should I input in lm.predict()? I basically just want to be able to say "Pc=990, Vmax=18, Pdc=-12" and I get something like "ROCI=540". Thank you for your time.

Solution

If you want to predict the first row of your data, first put its values into a NumPy array:

import numpy as np

first_row = np.array([-15, 128, 985, 18, -15])

Then, when

lm.predict(first_row)

produces an error similar to the one you report,

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

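follow the advice in the message. predict() expects a 2D array of shape (n_samples, n_features), so a single sample has to be reshaped into one row, i.e.: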

lm.predict(first_row.reshape(1, -1))
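For a version you can run end to end, here is a minimal sketch. It assumes a made-up dataset: the DataFrame below is synthetic (random values and an invented linear relationship for ROCI), standing in for the real cyclone data, and lm_small is a hypothetical second model that is not part of the original question. It shows two equivalent ways to pass one sample, and that predicting from only Pc, Vmax and Pdc needs a model fitted on exactly those three columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the cyclone data (values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "LAT": rng.uniform(-25, -10, n),
    "LON": rng.uniform(110, 140, n),
    "Pc": rng.uniform(950, 1000, n),
    "Vmax": rng.uniform(10, 40, n),
    "Pdc": rng.uniform(-50, -5, n),
})
# Invented relationship plus noise, only so the model has something to fit.
df["ROCI"] = 300 - 2 * df["LAT"] + 0.5 * df["LON"] + rng.normal(0, 5, n)

features = ["LAT", "LON", "Pc", "Vmax", "Pdc"]
lm = LinearRegression().fit(df[features].to_numpy(), df["ROCI"])

# Option 1: a 1-D array of 5 values reshaped into one row with 5 columns.
first_row = np.array([-15, 128, 985, 18, -15])
pred_reshaped = lm.predict(first_row.reshape(1, -1))

# Option 2: a nested list is already 2-D, so no reshape is needed.
pred_nested = lm.predict([[-15, 128, 985, 18, -15]])

print(pred_reshaped.shape)      # (1,) -> one prediction for one sample
print(float(pred_reshaped[0]))  # the single predicted ROCI value

# Predicting from only Pc, Vmax and Pdc requires a model fitted on
# exactly those columns (lm_small is a hypothetical name).
subset = ["Pc", "Vmax", "Pdc"]
lm_small = LinearRegression().fit(df[subset].to_numpy(), df["ROCI"])
pred_small = lm_small.predict(np.array([[990, 18, -12]]))
```

The result of predict() is always an array with one entry per input row, so index it with [0] to get a plain number. The AttributeError in the question comes from calling reshape on a DataFrame; a single row can be selected with X_test.iloc[[0]] (double brackets keep it 2D) instead.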

Answered By - desertnaut

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
