Issue
There are certainly many ways of creating interaction terms in Python, whether by using numpy or pandas directly or by using a library like patsy. However, I was looking for a way of creating interaction terms scikit-learn-style, i.e. in a form that plays nicely with its fit-transform-predict paradigm. How might I do this?
Solution
Let's consider the case of making an interaction term between two variables.
You might make use of the FunctionTransformer class, like so:
import numpy as np
from sklearn.preprocessing import FunctionTransformer
# 5 rows, 2 columns
X = np.arange(10).reshape(5, 2)
# Appends the product of the columns at indices 0 and 1 to the original matrix as a new last column
def interaction_append_function(x):
    return np.append(x, (x[:, 0] * x[:, 1])[:, None], axis=1)
interaction_transformer = FunctionTransformer(func=interaction_append_function)
Let's try it out:
>>> interaction_transformer.fit_transform(X)
array([[ 0,  1,  0],
       [ 2,  3,  6],
       [ 4,  5, 20],
       [ 6,  7, 42],
       [ 8,  9, 72]])
You now have a transformer that will play well with other workflows like sklearn.pipeline or sklearn.compose.
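To illustrate that claim, here is a minimal sketch that chains the same kind of transformer with a LinearRegression via make_pipeline. The target `y` is invented purely for illustration: it is exactly 3 * x0 * x1, so a linear model can only fit it through the appended interaction column. The function is defined with `def` (the name `add_interaction` is hypothetical) so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def add_interaction(x):
    """Hypothetical helper: append the product of columns 0 and 1 as a new last column."""
    return np.column_stack([x, x[:, 0] * x[:, 1]])


# Illustrative data only: 20 random rows, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
# Made-up target that depends only on the interaction term.
y = 3.0 * X[:, 0] * X[:, 1]

# The transformer runs inside fit/predict like any other pipeline step.
model = make_pipeline(FunctionTransformer(add_interaction), LinearRegression())
model.fit(X, y)
predictions = model.predict(X)

# Coefficients of the final step, in column order: x0, x1, x0*x1.
coefficients = model[-1].coef_
```

Using a `def` instead of a lambda also keeps the fitted pipeline serializable with the standard `pickle` module, which cannot pickle lambdas.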
Certainly there are more extensible ways of handling this, but hopefully you get the idea.
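Two more extensible options, sketched here under stated assumptions rather than as the answerer's method: scikit-learn's built-in PolynomialFeatures with `interaction_only=True`, which generates the products of every pair of distinct columns, and FunctionTransformer's `kw_args` parameter, which lets a single function handle any chosen column pair. The helper `append_interaction` and its `i`/`j` arguments are hypothetical names.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

X = np.arange(10).reshape(5, 2)

# Option 1: built-in. degree=2 with interaction_only=True keeps the original
# columns plus products of distinct columns (no squared terms);
# include_bias=False drops the constant column of ones.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
poly_out = poly.fit_transform(X)  # columns: x0, x1, x0*x1 (returned as floats)


# Option 2: one reusable function, with the column pair supplied via kw_args.
def append_interaction(x, i=0, j=1):
    """Hypothetical helper: append the product of columns i and j as a new last column."""
    return np.column_stack([x, x[:, i] * x[:, j]])


pair_transformer = FunctionTransformer(append_interaction, kw_args={"i": 0, "j": 1})
pair_out = pair_transformer.fit_transform(X)
```

With only two input columns both options produce the same matrix; with more columns, PolynomialFeatures adds every pairwise product, while the `kw_args` version adds only the pair you name.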
Answered By - blacksite