Issue
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
df = pd.read_csv('homeprices.csv')
plt.xlabel('area')
plt.ylabel('price')
plt.scatter(df.area,df.price,color='red',marker='.')
reg = linear_model.LinearRegression()
reg.fit(df.area,df.price)
Error Message:
ValueError: Expected 2D array, got 1D array instead: array=[2600 3000 3200 3600 4000]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
It works fine if I write it as:
reg.fit(df[['area']],df.price)
I would like to know the reason behind this, because the second argument is still passed as df.price, which is also one-dimensional.
My csv file:
area,price
2600,550000
3000,565000
3200,610000
3600,680000
4000,725000
Solution
According to the scikit-learn documentation for fit, the first argument X should be:
X: {array-like, sparse matrix} of shape (n_samples, n_features)
When you declare:
x = df.area
or
x = df['area']
x is a Series with shape (n,), which is one-dimensional. The shape should be (n, z), where z can be any positive integer.
With
x = df[['area']]
x is a DataFrame with shape (5, 1), which makes it an acceptable input.
With
y = df.price
y is a Series with shape (5,), which is an acceptable input, because the documentation describes y as:
y: array-like of shape (n_samples,)
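To see the difference in shapes directly, here is a minimal sketch. It builds the data from the question inline rather than reading homeprices.csv, and the variable names (series_shape, frame_shape, reshaped_shape) are only illustrative.

```python
import pandas as pd

# Same values as the CSV in the question, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})

# Single brackets (or attribute access) return a 1D Series.
series_shape = df["area"].shape

# Double brackets return a 2D DataFrame with one column.
frame_shape = df[["area"]].shape

# reshape(-1, 1), as suggested by the error message, turns the 1D values into one column.
reshaped_shape = df["area"].to_numpy().reshape(-1, 1).shape

print(series_shape)    # (5,)
print(frame_shape)     # (5, 1)
print(reshaped_shape)  # (5, 1)
```

Either of the two 2D forms should be accepted as the first argument of fit, while the 1D Series is what triggers the ValueError.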
That said, if I were you I would declare x and y as:
x = [[i] for i in df['area']]
y = [i for i in df['price']]
This makes both x and y plain Python lists: x is a nested list with shape (5, 1) and y is a flat list of length 5. That way, if you later want to use another ML library (tensorflow, pytorch, keras, ...), you won't have any difficulties.
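As a rough check that the list form and the double-bracket form are equivalent for scikit-learn, the sketch below fits the model both ways and compares the results. The data is built inline instead of being read from homeprices.csv, and the names reg_lists and reg_frame are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Same values as the CSV in the question, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})

# List form: x is a list of one-element lists (5 rows, 1 column), y is a flat list.
x = [[i] for i in df["area"]]
y = [i for i in df["price"]]
reg_lists = linear_model.LinearRegression()
reg_lists.fit(x, y)

# DataFrame form: double brackets keep the input two-dimensional.
reg_frame = linear_model.LinearRegression()
reg_frame.fit(df[["area"]], df["price"])

# Both fits should produce the same slope and intercept.
same_slope = np.allclose(reg_lists.coef_, reg_frame.coef_)
same_intercept = np.isclose(reg_lists.intercept_, reg_frame.intercept_)
print(same_slope, same_intercept)
```

Which form to use is mostly a matter of preference: the DataFrame form keeps the column name attached to the model, while the list form carries no pandas-specific types.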
Answered By - Ahx