Issue
I have implemented three TransformerMixin classes in an attempt to build my own scikit-learn Pipeline. However, I am unable to combine them because the PrepareModel object uses information from the FeatureEngineering object. In particular, consider:
cleaner = DataCleaner()
df_clean = cleaner.fit_transform(df)
engineering = FeatureEngineering()
df_engineered = engineering.fit_transform(df_clean)
modelprep = PrepareModel(engineering.des_features)
X = modelprep.fit_transform(df_engineered)
Note that DataCleaner, FeatureEngineering, and PrepareModel are all child classes of TransformerMixin.
How would I make a Pipeline with this setup?
from sklearn.pipeline import Pipeline

full_pipeline = Pipeline([('cleaner', DataCleaner()),
                          ('engineering', FeatureEngineering()),
                          ('prepare', PrepareModel())])
The issue is that the third step needs des_features from the second step, so this does not work. How would I make this work?
Solution
This isn't currently easy to do; it's probably another use case for "metadata routing" (SLEP006).
In this example, since you own all the transformers, you might be able to hack something together by just attaching an attribute to the output dataset:
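from sklearn.base import TransformerMixin

class FeatureEngineering(TransformerMixin):
    ...
    def transform(self, X):
        ...
        # return_value is the transformed dataset built in the elided code;
        # attach the learned feature list so the next step can read it
        return_value.metadata = self.des_features
        return return_value

class PrepareModel(TransformerMixin):
    ...
    def fit(self, X, y=None):
        # read the feature list attached by FeatureEngineering.transform
        self.des_features = X.metadata
        ...
        return self  # fit must return the estimator itself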
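A minimal runnable sketch of this idea inside an actual Pipeline is below. The bodies of DataCleaner, FeatureEngineering, and PrepareModel are hypothetical stand-ins (drop rows with missing values, add a `ratio` column, select the chosen columns), and the columns `a`, `b`, `c` are made up for illustration. One deliberate change from the snippet above: the feature list is stored in pandas' `DataFrame.attrs` dictionary instead of a new attribute, because pandas emits a UserWarning when a list is assigned to a new DataFrame attribute name. Note that `attrs` is marked experimental in the pandas documentation.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DataCleaner(BaseEstimator, TransformerMixin):
    """Hypothetical cleaner: drops rows that contain missing values."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.dropna()


class FeatureEngineering(BaseEstimator, TransformerMixin):
    """Hypothetical feature step: adds a ratio column and records des_features."""

    def fit(self, X, y=None):
        # Assumed logic: decide which columns the model should use.
        self.des_features = ["a", "ratio"]
        return self

    def transform(self, X):
        out = X.assign(ratio=X["a"] / X["b"])
        # Attach the feature list to the output so the next step can read it.
        out.attrs["des_features"] = list(self.des_features)
        return out


class PrepareModel(BaseEstimator, TransformerMixin):
    """Selects the columns that FeatureEngineering marked as desired."""

    def fit(self, X, y=None):
        # Read the metadata attached by the previous step's transform.
        self.des_features = X.attrs["des_features"]
        return self

    def transform(self, X):
        # Uses the list learned in fit, so new data needs no metadata.
        return X[self.des_features].to_numpy()


full_pipeline = Pipeline([
    ("cleaner", DataCleaner()),
    ("engineering", FeatureEngineering()),
    ("prepare", PrepareModel()),
])

df = pd.DataFrame({"a": [1.0, 2.0, None, 4.0],
                   "b": [2.0, 4.0, 1.0, 8.0],
                   "c": [0, 1, 2, 3]})
X = full_pipeline.fit_transform(df)
print(X)

# Later calls reuse the des_features stored on the fitted PrepareModel.
new_df = pd.DataFrame({"a": [3.0], "b": [6.0], "c": [9]})
print(full_pipeline.transform(new_df))
```

The metadata is only read in PrepareModel.fit; at transform time the step relies on the list it stored while fitting. The approach also depends on FeatureEngineering's output DataFrame reaching the next step as-is, so any wrapper that rebuilds the container between steps could drop the attached information.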
Answered By - Ben Reiniger