Issue
I have a sklearn pipeline that has been defined in the following way:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

from tools.transformers import MyTransformer
...
pipe = Pipeline([
    ('mytransformer', MyTransformer()),
    ('lm', LinearRegression())
])
...
The structure of my code is:
src
├── __init__.py
├── train.py
└── tools
    └── transformers.py
I have trained my model, and the pipeline is saved in a .joblib file. Now I want to use the model in another project. However, I need to move not only the .joblib file but also the whole tools/transformers.py structure. I think this is difficult to maintain and hard to understand.
Is there an easier way to make the pipeline work without having to copy the code around with the exact same structure?
Solution
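A saved pipeline stores only a reference to the import path of each custom class (here tools.transformers.MyTransformer), not the class code itself, so that module must be importable wherever the file is loaded. To avoid copying the code around, create a separate project, for instance internal_lib, and move into it all the custom logic that you use across projects. Then install internal_lib as part of your Python environment (via pip or conda). After that, you can pickle a trained pipeline and reuse it in any other project that has internal_lib installed.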
Technically, it can be implemented as a private GitHub repo and installed via pip. Here are a couple of links on how to implement this: one, two.
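To illustrate why the shared package has to be importable in every project, here is a minimal sketch using only the standard library. It makes several assumptions: pickle stands in for joblib (which builds on pickle for Python objects), a temporary directory on sys.path stands in for an installed internal_lib, and the MyTransformer body is a hypothetical placeholder rather than the transformer from the question.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of internal_lib/transformers.py (illustration only).
SOURCE = '''
class MyTransformer:
    def __init__(self, scale=2.0):
        self.scale = scale

    def transform(self, values):
        return [v * self.scale for v in values]
'''


def forget_package():
    """Drop the package from the import cache, as in a fresh interpreter."""
    for name in ("internal_lib.transformers", "internal_lib"):
        sys.modules.pop(name, None)
    importlib.invalidate_caches()


with tempfile.TemporaryDirectory() as tmp:
    # Build the package on disk; the temp dir plays the role of site-packages.
    pkg = Path(tmp) / "internal_lib"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "transformers.py").write_text(SOURCE)

    # "Training project": the package is importable, so pickling works.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    module = importlib.import_module("internal_lib.transformers")
    payload = pickle.dumps(module.MyTransformer(scale=3.0))

    # "Other project" WITHOUT the package: unpickling cannot find the class.
    sys.path.remove(tmp)
    forget_package()
    try:
        pickle.loads(payload)
        load_without_package = "ok"
    except ModuleNotFoundError:
        load_without_package = "ModuleNotFoundError"

    # "Other project" WITH the package available: unpickling succeeds.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    restored = pickle.loads(payload)
    result = restored.transform([1, 2])  # [3.0, 6.0]

    # Clean up the import state before the temp dir is deleted.
    sys.path.remove(tmp)
    forget_package()

print(load_without_package)
print(result)
```

The payload only names internal_lib.transformers.MyTransformer, so the load step succeeds as soon as that import path resolves, regardless of how the consuming project itself is laid out.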
Answered By - Danylo Baibak