Issue
I have run into an ML problem that requires a multi-dimensional Y. Right now we are training independent models on each dimension of the output, which does not take advantage of the additional information available from the fact that the outputs are correlated.
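For concreteness, the "independent models" baseline looks roughly like the sketch below. The data here is a hypothetical toy set (names and shapes are illustrative, not from the original question): one `DecisionTreeRegressor` is fit per output column, so each tree sees only its own target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: 100 samples, 5 features, 3 correlated targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
base = X[:, 0] + 0.5 * X[:, 1]
Y = np.column_stack([base, 2 * base + 0.1, -base])  # correlated outputs

# Independent-models approach: one separate tree per output dimension.
models = [DecisionTreeRegressor(random_state=0).fit(X, Y[:, j])
          for j in range(Y.shape[1])]
preds = np.column_stack([m.predict(X) for m in models])
print(preds.shape)
```

Each tree chooses its splits using only its own column of Y, so correlations between the outputs are ignored.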
I have been reading this to learn more about the few ML algorithms which have been truly extended to handle multidimensional outputs. Decision Trees are one of them.
Does scikit-learn use "multi-target regression trees" when fit(X, Y) is given a multi-dimensional Y, or does it fit a separate tree for each dimension? I spent some time looking at the code but couldn't figure it out.
Solution
After more digging: the only difference between a tree fit on points with single-dimensional labels and one fit on points with multi-dimensional labels is in the Criterion object used to decide splits. The Criterion can handle multi-dimensional labels directly, so fitting a DecisionTreeRegressor produces a single regression tree regardless of the dimension of Y.
This implies that, yes, scikit-learn does use true multi-target regression trees, which can leverage correlated outputs to positive effect.
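A quick way to confirm this behavior (a minimal sketch with hypothetical toy data, reusing shapes like those above) is to fit one estimator on the full 2-D Y and inspect it: a single fitted tree reports multiple outputs and predicts all targets at once.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: 100 samples, 5 features, 3 correlated targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
base = X[:, 0] + 0.5 * X[:, 1]
Y = np.column_stack([base, 2 * base + 0.1, -base])

# One estimator fit on the full 2-D Y: a single tree, multiple outputs.
reg = DecisionTreeRegressor(random_state=0).fit(X, Y)
print(reg.n_outputs_)            # number of targets the one tree handles
print(reg.predict(X[:2]).shape)  # predictions for all targets at once
```

Since every split is chosen by a criterion evaluated over all output dimensions jointly, correlated targets can share structure in the single tree.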
Answered By - Pavel Komarov