[mlpack] [GSoC] Implement Decision tree algorithms

Rajath Shashidhara rajath.shashidhara at gmail.com
Thu Mar 3 05:48:00 EST 2016


Hello,

I surveyed the popular Decision Tree algorithms to identify the
differences and commonalities between them. I believe that this is
very important for the API design part.

Important observations -
[1] Most decision tree algorithms follow the same greedy strategy to
grow the trees.
[2] The input data is provided in the same format for all algorithms.
(the algorithms don't enforce any order on the input data).
[3] Considerable difference is observed in the greedy choice of
attribute to split the node.
    This could differ in the following ways.
     (a) Impurity measure - GINI / Entropy / etc.
     (b) Some algorithms employ a two step process (significance test
+ Impurity measure).
     (c) Significance tests used - chi squared tests, permutation tests, etc.
[4] Treatment of missing values differ in each algorithm.
[5] Most algorithms use a 2-way split of each node. But, there are
several exceptions.
[6] Strategy to split ordered and unordered attributes may be different.
[7] Tree pruning and stopping criterion vary across different algorithms.
[8] Validation and Error estimation methods can also be different.
    [a] Cross-validation tests.
    [b] Misclassification costs.
    [c] Prior probabilities.

Although there are several differences between them they share a
common structure. A generic API for decision trees can be designed by
making use of function pointers (or functors in C++) for
specialization.

One level decision trees / decision stumps can be brought under the
same API (by specifying a stopping criterion of one level).

I am not familiar with Density Estimation Trees. I am going though
some literature to understand if they can be generalized under the
same API.

Thank you,
Rajath Shashidhara

References :
[1] http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf

On Tue, Mar 1, 2016 at 6:58 PM, Rajath Shashidhara
<rajath.shashidhara at gmail.com> wrote:
> Hello,
>
> I am a GSoC' 2016 aspirant. I have perused your ideas page and I am
> interested in working on the implementation of Decision tree
> algorithms.
>
> I will keep in touch with my ideas and suggestions regarding the project.
>
> Thank you,
> Rajath Shashidhara



More information about the mlpack mailing list