Issue
I have built a custom sklearn pipeline, as follows:
pipeline = make_pipeline(
SelectColumnsTransfomer(features_to_use),
ToDummiesTransformer('feature_0', prefix='feat_0', drop_first=True, dtype=bool), # Dummify customer_type
ToDummiesTransformer('feature_1', prefix='feat_1'), # Dummify the feature
ToDummiesTransformer('feature_2', prefix='feat_2'), # Dummify
ToDummiesTransformer('feature_3', prefix='feat_3'), # Dummify
)
pipeline.fit(df)
The classes SelectColumnsTransfomer
and ToDummiesTransformer
are custom sklearn steps implementing BaseEstimator
and TransformerMixin
.
To serialise this object I use
from sklearn.externals import joblib
joblib.dump(pipeline, 'data_pipeline.joblib')
but when I do deserialise with
pipeline = joblib.load('data_pipeline.joblib')
I get AttributeError: module '__main__' has no attribute 'SelectColumnsTransfomer'
.
I have read other similar questions and followed the instruction in this blogpost here, but couldn't solve the issue.
I am copying pasting the classes, and importing them in the code. If i create a simplified version of this exercise, the whole thing works, the problem occurs because i am running some tests with pytest, and when i run pytest it seems it doesn't see my custom classes, in fact there is this other part of the error
self = <sklearn.externals.joblib.numpy_pickle.NumpyUnpickler object at 0x7f821508a588>, module = '__main__', name = 'SelectColumnsTransfomer'
which is hinting me that the NumpyUnpickler
doesn't see the SelectColumnsTransfomer
even if in the test it is imported.
My test code
import pytest
from app.pipeline import * # the pipeline objects
# SelectColumnsTransfomer and ToDummiesTransformer
# are here!
@pytest.fixture(scope="module")
def clf():
pipeline = joblib.load("persistence/data_pipeline.joblib")
return clf
def test_fake(clf):
assert True
Solution
OK I found out the problem. I discovered that the problem has nothing to do with the issue explained in the blogpost herePython: pickling and dealing with "AttributeError: 'module' object has no attribute 'Thing'" as I originally thought. You can easily solve the problem by having your object pickling and unpickling the file. I was using a separate script (a Jupyther notebook) to pickle and a plain [python script to unpicle. When I did everything in the same class it worked.
Answered By - DarioB
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.