Issue
In my data set there are a few values (e.g. 1.4619664882428694e+258) that exceed the float32 maximum (3.4028235e+38). While fitting the model I get the following error:
Input contains NaN, infinity or a value too large for dtype('float32').
I tried the following code:

import pandas as pd
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df_features = pd.read_csv('data\df_features.csv')
df_target = pd.read_csv('data\df_target.csv')

X_train, X_test, y_train, y_test = train_test_split(df_features, df_target, test_size=.25, random_state=0)

model = AdaBoostRegressor()
try:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = r2_score(y_test, y_pred)
    print(acc)
except Exception as error:
    print(error)
How can I solve this problem if I want to use the real data without normalizing it? Is there an option to set the default data type to float64 for scikit-learn? If so, how?
Solution
It's a numerical precision problem: the tree-based estimators underlying AdaBoostRegressor cast the input to float32 internally (note the dtype in the error message), and values this large overflow to infinity in that cast. There is currently no built-in way to make them work in float64; the numbers are simply too large for float32.
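A minimal illustration of the overflow: float32 can represent magnitudes only up to about 3.4e+38, so this value is finite as a float64 but becomes infinity when downcast.

import numpy as np

big = 1.4619664882428694e+258

print(np.isfinite(big))                # True  -- fine as float64
print(np.isinf(np.float32(big)))       # True  -- overflows to inf as float32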
I can replicate it with:

import numpy as np
from sklearn.ensemble import AdaBoostRegressor

X = np.repeat([1.4619664882428694e+258], 100)
X = X.reshape(10, 10)
y = np.ones((10, 1))

model = AdaBoostRegressor()
model.fit(X, y)
# ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Note that the data itself is perfectly finite in float64:

np.all(np.isfinite(X))
# True
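The failing check can be reproduced directly with scikit-learn's own input validation helper, check_array, which is what the estimators call internally. This is a sketch of that behavior: validating as float64 passes, while requesting float32 (as the tree code does) raises the ValueError because the cast overflows.

import numpy as np
from sklearn.utils import check_array

X = np.full((10, 10), 1.4619664882428694e+258)

# Passes: the values are finite in 64-bit precision.
check_array(X, dtype=np.float64)

# Raises ValueError: the cast to float32 turns every value into inf.
try:
    check_array(X, dtype=np.float32)
except ValueError as e:
    print(e)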
I opened a request here: https://github.com/scikit-learn/scikit-learn/issues/15628
Answered By - seralouk