Issue
I am trying to port a scikit-learn feature pipeline trained in scikit-learn V0.21 to V0.24, because I no longer have the original feature data to retrain it. If I fit it on new data, the feature dimensions and positions may no longer match what the downstream model expects, since the pipeline contains a DictVectorizer.
I've tried using pickle and joblib to serialize the pipeline in V0.21 and then deserialize it in V0.24. Unfortunately, in both cases, loading in V0.24 raised ModuleNotFoundError: No module named 'sklearn.feature_extraction.dict_vectorizer'.
I created the pipeline with the same code in V0.21 and V0.24 respectively. When printed, the two show some minor differences.
In V0.21
Pipeline(memory=None,
         steps=[('selector', ItemSelector(key='hsd_feature_map')),
                ('dv1',
                 DictVectorizer(dtype=<class 'numpy.float64'>, separator='=',
                                sort=True, sparse=False)),
                ('tfidf',
                 TfidfTransformer(norm='l2', smooth_idf=True, sublinear_tf=True,
                                  use_idf=True)),
                ('max', MaxAbsScaler(copy=True))],
         verbose=False)
In V0.24
Pipeline(steps=[('selector', ItemSelector(key='hsd_feature_map')),
                ('dv1', DictVectorizer(sparse=False)),
                ('tfidf', TfidfTransformer(sublinear_tf=True)),
                ('max', MaxAbsScaler())])
Is there any way to transfer the feature pipeline, or its fitted parameters, from scikit-learn V0.21 to V0.24?
Solution
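From sklearn version 0.22.X, the module that DictVectorizer is imported from was renamed from sklearn/feature_extraction/dict_vectorizer.py to sklearn/feature_extraction/_dict_vectorizer.py. A pickle created in V0.21 still refers to the old module path, which is why loading it in V0.24 fails.
I think you could override the DictVectorizer import according to this answer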
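One possible way to override the import, sketched under assumptions and not necessarily what the linked answer does, is to subclass pickle.Unpickler and redirect the old module path to the new one in find_class. The names MODULE_RENAMES, RenamingUnpickler and load_legacy_pickle are hypothetical helpers introduced here for illustration; only the DictVectorizer rename is taken from the answer above.

```python
import pickle

# Old module path -> new module path. Only the DictVectorizer rename comes
# from the answer above; any further entries would be assumptions to verify.
MODULE_RENAMES = {
    "sklearn.feature_extraction.dict_vectorizer":
        "sklearn.feature_extraction._dict_vectorizer",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups for renamed modules (hypothetical helper)."""

    def __init__(self, file, renames=None, **kwargs):
        super().__init__(file, **kwargs)
        self._renames = MODULE_RENAMES if renames is None else renames

    def find_class(self, module, name):
        # Swap the old module path for the new one before the import happens.
        module = self._renames.get(module, module)
        return super().find_class(module, name)


def load_legacy_pickle(path, renames=None):
    """Load a pickle file, remapping renamed modules on the fly."""
    with open(path, "rb") as fh:
        return RenamingUnpickler(fh, renames=renames).load()
```

Calling load_legacy_pickle("pipeline.pkl") in the V0.24 environment (the file name is a placeholder) would then resolve DictVectorizer from its new location. This only works around the import error: scikit-learn does not guarantee that pickles load correctly across versions, so the transformed output should be checked against known V0.21 results. Other steps of the pipeline may need their own entries in the mapping, and ItemSelector, which looks like a user-defined class, would have to be importable under the module path it had when the pipeline was pickled.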
Answered By - Miguel Trejo