Issue
I use ColumnTransformer to apply PolynomialFeatures and OneHotEncoder only to specific independent variables. Now I need to figure out which coefficient corresponds to each independent variable. I tried to use get_feature_names_out but keep getting an error.
Below is the code. I have x1, x2 and color (all independent variables; color is one-hot encoded) as well as y (the target variable). PolynomialFeatures with degree=3 is applied to x1 only. In the output I can see 14 coefficients. The posts I found explain cases with degree=2 and without ColumnTransformer, which do not match my case. In my output below, how do I find the coefficients for x1^3, x1^2, x1, x2 and so on?
Any suggestions are greatly appreciated.
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
x1 = [28.0, 29.0, 12.0, 12.0, 42.0, 35.0, 28.0, 30.0, 32.0, 46.0, 18.0, 28.0, 28.0, 64.0, 38.0, 18.0, 49.0, 37.0, 25.0, 24.0, 42.0, 50.0, 12.0, 64.0, 23.0, 35.0, 22.0, 16.0, 44.0, 77.0, 26.0, 44.0, 38.0, 37.0, 45.0, 42.0, 24.0, 42.0, 12.0, 46.0, 12.0, 26.0, 37.0, 15.0, 67.0, 36.0, 43.0, 36.0, 45.0, 82.0, 44.0, 30.0, 33.0, 51.0, 50.0]
x2 = [0.36, 0.53, 0.45, 0.48, 0.4, 0.44, 0.44, 0.6, 0.39, 0.39, 0.29, 0.52, 0.46, 0.55, 0.62, 0.53, 0.79, 0.57, 0.49, 0.23, 0.55, 0.54, 0.44, 0.74, 0.36, 0.46, 0.37, 0.38, 0.75, 0.8, 0.43, 0.43, 0.58, 0.38, 0.63, 0.39, 0.14, 0.26, 0.14, 0.62, 0.49, 0.46, 0.49, 0.53, 0.73, 0.48, 0.5, 0.47, 0.49, 0.83, 0.56, 0.22, 0.49, 0.43, 0.46]
y = [59.5833333333333, 59.5833333333333, 10.0, 10.0, 47.0833333333333, 51.2499999999999, 34.5833333333333, 88.75, 63.7499999999999, 34.5833333333333, 51.2499999999999, 10.0, 63.7499999999999, 51.0, 59.5833333333333, 47.0833333333333, 49.5625, 43.5624999999999, 63.7499999999999, 10.0, 76.25, 47.0833333333333, 10.0, 51.2499999999999, 47.0833333333333, 10.0, 35.0, 51.2499999999999, 76.25, 100.0, 51.2499999999999, 59.5833333333333, 63.7499999999999, 76.25, 100.0, 51.2499999999999, 10.0, 22.5, 10.0, 88.75, 10.0, 59.5833333333333, 47.0833333333333, 34.5833333333333, 51.2499999999999, 63.7499999999999, 63.7499999999999, 10.0, 76.25, 62.1249999999999, 47.0833333333333, 10.0, 76.25, 47.0833333333333, 88.75]
color = ['green','red','blue','purple','black','white','orange','grey ','gold','yellow','white','orange','grey ','green','red','purple','orange','grey ','gold','yellow','white','orange','grey ','green','red','blue','black','white','orange','grey ','gold','yellow','white','orange','grey ','green','red','blue','purple','orange','grey ','gold','green','red','blue','purple','black','white','orange','grey ','gold','yellow','white','orange','grey ']
df = pd.DataFrame({
    'x1': x1,
    'x2': x2,
    'y': y,
    'color': color})
X = df[['x1', 'x2', 'color']]
y = df['y']
preprocessor = ColumnTransformer(
    transformers=[
        # Note: `sparse` was renamed to `sparse_output` in scikit-learn 1.2.
        ('encoder', OneHotEncoder(sparse=False), ['color']),
        ('transformer', PolynomialFeatures(degree=3, include_bias=False), ['x1']),
    ],
    remainder='passthrough')  # x2 is passed through unchanged as the last column
pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression(fit_intercept=True))])
pipeline.fit(X, y)
print(pipeline.score(X, y))
# 0.5552322374079989
print(pipeline['regressor'].intercept_)
# -39.54122167504586
print(pipeline['regressor'].coef_)
# [ 2.60299525e-01 -2.18746546e+01 1.03128330e+01 -3.13760382e+00
# 1.45075308e+01 -1.90458338e+00 6.44800139e+00 2.91843209e+00
# 8.65334498e+00 -1.61836001e+01 3.67529674e+00 -5.30354716e-02
# 2.24998469e-04 3.99616163e+01]
Solution
Following @Ben Reiniger's suggestion, the solution is to pair each coefficient with the feature name reported by the fitted preprocessor:
list_coeff = pipeline['regressor'].coef_  # fitted coefficients, in transformed-column order
# Feature name for each coefficient. get_feature_names() was deprecated in
# scikit-learn 1.0 and removed in 1.2; newer versions use get_feature_names_out().
list_col = preprocessor.get_feature_names()
dic = dict(zip(list_col, list_coeff))  # map each feature name to its coefficient
print(dic)
# {'encoder__x0_black': 0.26029952521370636, 'encoder__x0_blue': -21.874654562693102, 'encoder__x0_gold': 10.312833033874762, 'encoder__x0_green': -3.1376038214365227, 'encoder__x0_grey ': 14.507530812506705, 'encoder__x0_orange': -1.904583381736955, 'encoder__x0_purple': 6.448001393570287, 'encoder__x0_red': 2.9184320909755788, 'encoder__x0_white': 8.653344984443063, 'encoder__x0_yellow': -16.183600074389265, 'transformer__x0': 3.675296737083315, 'transformer__x0^2': -0.05303547164546205, 'transformer__x0^3': 0.00022499846934471467, 'x2': 39.96161626099165}
Answered By - user032020