Issue
The documentation says:
percentiles: tuple of float, default=(0.05, 0.95)
The lower and upper percentile used to create the extreme values for the PDP axes. Must be in [0, 1].
However, I could not fully understand what this means. Does it imply that the partial dependence plots are calculated using only the data between the 5th and the 95th percentile, thus ignoring the contribution of the data points outside this range?
How should I interpret this, and what are the potential issues of widening the range (say, to (0.01, 0.99))?
Solution
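The percentiles are only used to build the grid of feature values at which the partial dependence is evaluated. The estimator is not changed, and no data points are dropped. You can see more in the code used to calculate the values. Conceptually, for each grid value the target feature is set to that value while the other features keep their observed values, and the model's predictions are averaged. Widening the range to, say, (0.01, 0.99) extends the curve into the tails of the feature distribution, where few observations lie, so the curve there tends to be less reliable.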
Below I overlay two plots with different percentiles; the one with the wider range is essentially an extension of the other:
import matplotlib.pyplot as plt
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_friedman1(random_state=321)
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)

# Default grid: 5th to 95th percentile of each feature
disp1 = PartialDependenceDisplay.from_estimator(
    estimator=clf, X=X, features=[3, 2], percentiles=(0.05, 0.95)
)
# Narrower grid (30th to 70th percentile), drawn in black on the same axes
PartialDependenceDisplay.from_estimator(
    estimator=clf, X=X, features=[3, 2], percentiles=(0.3, 0.7),
    ax=disp1.axes_, pd_line_kw={"color": "k"},
)
plt.show()
Answered By - StupidWolf