Issue
Let's assume I have a two-year dataset (24 months) and the business cycle is monthly, so I have to deliver the scores of a classification model each month. I think the best way to train a model for this is the following approach:
- Train months 1-12, test month 13
- Train months 1-13, test month 14
- Train months 1-14, test month 15
- ...
- Train months 1-23, test month 24
Given this, I would have 12 different results. Is there a name for this kind of training? I'm thinking of implementing it myself, but it would be really helpful if a package (or at least a name) already exists for this, ideally one that takes as input the ML algorithm, pipeline, or CV search I want to try for each training run.
If a package or a simple way to do this exists, is it also possible to set a fixed 12-month window, like this?
- Train months 1-12, test month 13
- Train months 2-13, test month 14
- Train months 3-14, test month 15
- ...
- Train months 12-23, test month 24
And if that is possible too, can I weight the training data so that the most recent months have a higher weight in the model?
Solution
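In general this is called rolling cross-validation (also known as walk-forward or time series cross-validation). Scikit-learn provides the TimeSeriesSplit class in sklearn.model_selection for it.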
See output from their example:
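>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit(n_splits=5)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]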
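The fixed 12-month window and the recency weighting from the question can probably be handled with the same class. The following is only a sketch under stated assumptions, not part of the original answer: the data is synthetic, the helper names monthly_splits and recency_weights are made up for illustration, the exponential decay of 0.9 per month is an arbitrary choice, and the test_size argument needs scikit-learn 0.24 or newer. Because TimeSeriesSplit works on row positions, the sketch splits the list of months and then maps each fold back to rows.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Hypothetical data: 24 months, 50 rows per month, 3 features, binary target.
n_months, rows_per_month = 24, 50
month = np.repeat(np.arange(1, n_months + 1), rows_per_month)
X = rng.normal(size=(month.size, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=month.size) > 0).astype(int)

months = np.arange(1, n_months + 1)


def monthly_splits(window=None):
    """Yield (train_months, test_month); window=None gives an expanding window."""
    tscv = TimeSeriesSplit(n_splits=12, test_size=1, max_train_size=window)
    for train_idx, test_idx in tscv.split(months):
        yield months[train_idx], months[test_idx][0]


def recency_weights(row_months, test_month, decay=0.9):
    """Weight 1.0 for the month just before the test month, decaying with age."""
    age = (test_month - 1) - row_months
    return decay ** age


scores = []
for train_months, test_month in monthly_splits(window=12):
    train_mask = np.isin(month, train_months)
    test_mask = month == test_month
    model = LogisticRegression()
    model.fit(
        X[train_mask],
        y[train_mask],
        sample_weight=recency_weights(month[train_mask], test_month),
    )
    scores.append((test_month, model.score(X[test_mask], y[test_mask])))

for test_month, accuracy in scores:
    print(f"test month {test_month}: accuracy {accuracy:.3f}")
```

Calling monthly_splits() without a window reproduces the first scheme (train 1-12, test 13, up to train 1-23, test 24), while window=12 gives the sliding scheme (train 2-13, test 14, and so on). If the data has exactly one row per time step, the splitter can instead be passed directly as the cv argument of GridSearchCV or cross_val_score. Any estimator whose fit method accepts sample_weight can replace LogisticRegression here.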
Answered By - elevendollar