Issue
I want to print the names of the features from the first estimator of my GradientBoostingRegressor, but I get the error below.
scikit-learn version: 1.2.2
model.estimators_[0]._final_estimator.feature_names_in_
output:
AttributeError Traceback (most recent call last)
Cell In[115], line 1
----> 1 model.estimators_[0]._final_estimator.feature_names_in_
AttributeError: 'GradientBoostingRegressor' object has no attribute 'feature_names_in_'
Solution
You want the feature names of the first estimator of the ensemble specifically. Unfortunately, the individual trees do not store feature names, and feature_names_in_ is only set on an estimator that was fit on data with string column names (for example a pandas DataFrame). That is why you get the error
AttributeError: 'GradientBoostingRegressor' object has no attribute 'feature_names_in_'
However, every tree is trained on the same set of features as the whole model, so the feature names recorded by the main GradientBoostingRegressor apply to each of its decision trees. If the model was fit on a DataFrame, you can get the feature names of the ensemble (and therefore those available to the first tree) like this:
model.feature_names_in_
If you are interested in the features that the first tree actually splits on, you can find them like this:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
# Load the dataset
data = fetch_california_housing()
X, y = data.data, data.target
feature_names = data.feature_names
# Create and fit the GradientBoostingRegressor
# max_features=0.5 makes each split consider a random half of the features
model = GradientBoostingRegressor(max_features=0.5, random_state=0)
model.fit(X, y)  # Fit on NumPy arrays, so model.feature_names_in_ is not set here
# Access the first tree (boosting stage 0; regression has one tree per stage)
first_tree = model.estimators_[0, 0]
# Collect the feature indices the tree splits on; leaf nodes hold a negative placeholder
used_feature_indices = sorted({i for i in first_tree.tree_.feature if i >= 0})
# Map indices to feature names
used_feature_names = [feature_names[i] for i in used_feature_indices]
print("All feature names:", feature_names)
print("Names of features used in the first tree:", used_feature_names)
print("Names of features not used in the first tree:", sorted(set(feature_names) - set(used_feature_names)))
Answered By - DataJanitor