Issue
Using NumPy, I create data x and labels y to train a ridge regression model, and then use another randomly created x and y to evaluate its predictions. The fraction of correct predictions is only 14/64. I don't know where the problem is. Below is my code.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge
one_hot = OneHotEncoder(sparse_output=False)  # keyword is "sparse=False" on scikit-learn < 1.2
x = np.random.rand(64,40) * 2 - 1
y = np.random.randint(0,5,(64,))
y = one_hot.fit_transform(y.reshape(-1,1))
clf = Ridge(alpha=1.0)
readout = clf.fit(x,y)
a = np.random.rand(64,40) * 2 - 1
b = np.random.randint(0,5,(64,))
b = one_hot.fit_transform(b.reshape(-1,1))
y_hat = readout.predict(a)
y_hat = np.argmax(y_hat,axis=1)
target = np.argmax(b,axis=1)
correct = (y_hat == target).sum()
print(correct) # 14
Solution
For regression to work, one fundamental assumption is that X must carry some information capable of predicting y.
In your case, both X and y are randomly and independently generated:
x = np.random.rand(64,40) * 2 - 1
y = np.random.randint(0,5,(64,))
and thus X has NO predictive power over y at all. In this scenario, no regression model, however fancy, can do better than random guessing, and that is exactly what you get. Since y = np.random.randint(0,5,(64,)) draws each label uniformly from the five integers 0 through 4, a random guess is correct 20% of the time, and 14/64 ≈ 0.219 is just that.
Answered By - user2379740