Issue
I'm trying to predict volatility one step ahead with an SVM model, based on an example from the O'Reilly book Machine Learning for Financial Risk Management with Python. When I copy the example exactly (with S&P 500 data) it works well, but I'm having trouble with this chunk of code, which uses returns data from a particular fund:
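# imports (not shown in the original post; sp_rand is assumed to be scipy.stats.uniform)
import numpy as np
import pandas as pd
from scipy.stats import norm
from scipy.stats import uniform as sp_rand
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
# returns (a pandas Series, so that .rolling() and .reset_index() are available)
r = pd.Series([np.nan, 0.0013933, 0.00118874, 0.00076462, 0.00168565,
               -0.00018507, -0.00390753, 0.00307275, -0.00351472])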
# horizon
t = 252
# mean of returns
mu = r.mean()
# critical value
z = norm.ppf(0.95)
# realized volatility
vol = r.rolling(5).std()
vol = pd.DataFrame(vol)
vol.reset_index(drop=True, inplace=True)
# SVM GARCH
r_svm = r ** 2
r_svm = r_svm.reset_index()
# inputs X (returns and realized volatility)
X = pd.concat([vol, r_svm], axis=1, ignore_index=True)
X = X.dropna().copy()
X = X.reset_index()
X.drop([1, 'index'], axis=1, inplace=True)
# labels y realized volatility shifted 1 period onward
vol = vol.dropna().reset_index()
vol.drop('index', axis=1, inplace=True)
# linear kernel
svr_lin = SVR(kernel='linear')
# hyperparameters grid
para_grid = {'gamma': sp_rand(),
'C': sp_rand(),
'epsilon': sp_rand()}
# randomized hyperparameter search over the SVR (a regressor, not a classifier)
clf = RandomizedSearchCV(svr_lin, para_grid)
clf.fit(X[:-1].dropna().values,
vol[1:].values.reshape(-1,))
# prediction
n_vol = clf.predict(X.iloc[-1:])
The raised error is:
ValueError: Cannot have number of splits n_splits=5 greater than the number of samples: n_samples=3.
The code works with longer return series, so I assume the problem is the length of the array, but I can't figure out how to solve it. Can someone help me with that?
Solution
This error is raised because you use RandomizedSearchCV with the default cv parameter.
By default, RandomizedSearchCV runs 5-fold cross-validation to find the best hyperparameters for the model.
5-fold cross-validation means splitting your training data into 5 subsets and training 5 different models, each one fitted on four of the subsets and evaluated on the remaining one.
Your training set has only 3 samples (n_samples=3 in the error message), so splitting it into 5 folds isn't possible.
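The failure can be reproduced without any model. The following is a minimal sketch, assuming scikit-learn is installed: KFold is the splitter scikit-learn uses for regressors when cv is an integer or left at its default, and three dummy samples stand in for the three training rows. The names X_demo and try_split are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Three dummy samples, standing in for the three training rows in the question.
X_demo = np.arange(3).reshape(-1, 1)


def try_split(n_splits):
    """Return the (train, test) index pairs, or the error message if splitting fails."""
    try:
        splitter = KFold(n_splits=n_splits)
        return [(train.tolist(), test.tolist()) for train, test in splitter.split(X_demo)]
    except ValueError as err:
        return str(err)


print(try_split(5))  # prints the same ValueError message as in the question
print(try_split(2))  # two folds fit into three samples
```

With two folds the split succeeds, but one of the test folds contains a single sample, so any score computed on it is very noisy.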
To fix the issue, either add more data or decrease the number of folds for RandomizedSearchCV by passing the cv parameter:
clf = RandomizedSearchCV(svr_lin, para_grid, cv=2)
I'd recommend collecting more data, since the 3 training samples here (4 volatility observations in total) most likely won't be enough to make the model accurate or predictive.
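If the series length varies, one option is to cap the number of folds at the number of training samples instead of hard-coding it. This is only a sketch under stated assumptions: the data is synthetic, and the n_folds guard, the parameter ranges, and the neg_mean_squared_error scoring are illustrative choices, not part of the original code. The scoring is switched because the default R² score is undefined on a test fold with a single sample. gamma is left out of the search because it has no effect with a linear kernel.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 3 samples with 2 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(3, 2))
y_train = rng.normal(size=3)

# Hypothetical guard: never ask for more folds than there are samples.
n_folds = min(5, len(X_train))
if n_folds < 2:
    raise ValueError("Cross-validation needs at least 2 training samples.")

# Illustrative search ranges; uniform(loc, scale) samples from [loc, loc + scale].
param_dist = {
    "C": uniform(0.1, 10),
    "epsilon": uniform(0.001, 0.1),
}

search = RandomizedSearchCV(
    SVR(kernel="linear"),
    param_dist,
    n_iter=5,
    cv=n_folds,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

# One-step-ahead style prediction on the last available row.
pred = search.predict(X_train[-1:])
print(n_folds, pred.shape)
```

This only makes the search run; with so few samples the selected hyperparameters are still close to arbitrary, so more data remains the real fix.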
Answered By - fshabashev