[mlpack] [GSoC] Implement Decision tree algorithms

Rajath Shashidhara rajath.shashidhara at gmail.com
Thu Mar 3 16:15:14 EST 2016


I would be pleased to hear from you regarding my approach.
Feedback will help me better understand the requirements of the
project and shape my ideas as needed.

Thank you,
Rajath Shashidhara

On Thu, Mar 3, 2016 at 4:18 PM, Rajath Shashidhara
<rajath.shashidhara at gmail.com> wrote:
> Hello,
>
> I surveyed the popular Decision Tree algorithms to identify the
> differences and commonalities between them. I believe that this is
> very important for the API design part.
>
> Important observations -
> [1] Most decision tree algorithms follow the same greedy strategy to
> grow the trees.
> [2] The input data is provided in the same format for all algorithms.
> (the algorithms don't enforce any order on the input data).
> [3] The algorithms differ considerably in how they greedily choose
> the attribute on which to split a node.
>     They can differ in the following ways:
>      (a) Impurity measure - GINI / Entropy / etc.
>      (b) Some algorithms employ a two-step process (significance test
> + Impurity measure).
>      (c) Significance tests used - chi squared tests, permutation tests, etc.
> [4] Treatment of missing values differs in each algorithm.
> [5] Most algorithms use a 2-way split of each node. But, there are
> several exceptions.
> [6] Strategy to split ordered and unordered attributes may be different.
> [7] Tree pruning and stopping criteria vary across different algorithms.
> [8] Validation and Error estimation methods can also be different.
>     [a] Cross-validation tests.
>     [b] Misclassification costs.
>     [c] Prior probabilities.
>
> Although there are several differences between them, they share a
> common structure. A generic API for decision trees can be designed by
> making use of function pointers (or functors in C++) for
> specialization.
>
> One-level decision trees / decision stumps can be brought under the
> same API (by specifying a stopping criterion of one level).
>
> I am not familiar with Density Estimation Trees. I am going through
> some literature to understand if they can be generalized under the
> same API.
>
> Thank you,
> Rajath Shashidhara
>
> References :
> [1] http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf
>
> On Tue, Mar 1, 2016 at 6:58 PM, Rajath Shashidhara
> <rajath.shashidhara at gmail.com> wrote:
>> Hello,
>>
>> I am a GSoC 2016 aspirant. I have perused your ideas page and I am
>> interested in working on the implementation of Decision tree
>> algorithms.
>>
>> I will keep in touch with my ideas and suggestions regarding the project.
>>
>> Thank you,
>> Rajath Shashidhara
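
To illustrate the functor-based specialization proposed in the quoted
message, here is a minimal C++ sketch. It is only one possible shape for
such an API, not mlpack's actual interface: the names GiniImpurity,
EntropyImpurity, MaxDepthStop and SplitImpurity are hypothetical, and the
sketch assumes class labels have already been tallied into per-class
counts and that every split is 2-way.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical impurity functor: Gini index, 1 - sum_k p_k^2.
struct GiniImpurity
{
  double operator()(const std::vector<std::size_t>& counts) const
  {
    std::size_t total = 0;
    for (std::size_t c : counts)
      total += c;
    if (total == 0)
      return 0.0; // An empty node is treated as pure (assumption).

    double sumSquares = 0.0;
    for (std::size_t c : counts)
    {
      const double p = static_cast<double>(c) / static_cast<double>(total);
      sumSquares += p * p;
    }
    return 1.0 - sumSquares;
  }
};

// Hypothetical impurity functor: Shannon entropy in bits,
// -sum_k p_k log2(p_k).
struct EntropyImpurity
{
  double operator()(const std::vector<std::size_t>& counts) const
  {
    std::size_t total = 0;
    for (std::size_t c : counts)
      total += c;
    if (total == 0)
      return 0.0; // An empty node is treated as pure (assumption).

    double entropy = 0.0;
    for (std::size_t c : counts)
    {
      if (c == 0)
        continue; // The limit of p * log2(p) as p -> 0 is 0.
      const double p = static_cast<double>(c) / static_cast<double>(total);
      entropy -= p * std::log2(p);
    }
    return entropy;
  }
};

// Hypothetical stopping rule: stop once a node reaches maxDepth or has
// too few points to split. MaxDepthStop{1} yields a decision stump.
struct MaxDepthStop
{
  std::size_t maxDepth;

  bool operator()(std::size_t depth, std::size_t numPoints) const
  {
    return depth >= maxDepth || numPoints < 2;
  }
};

// Size-weighted impurity of a candidate 2-way split. The impurity
// measure is a template parameter, so swapping Gini for entropy does
// not change the tree-growing code that calls this function.
template<typename Impurity>
double SplitImpurity(const std::vector<std::size_t>& leftCounts,
                     const std::vector<std::size_t>& rightCounts,
                     const Impurity& impurity = Impurity())
{
  std::size_t nLeft = 0, nRight = 0;
  for (std::size_t c : leftCounts)
    nLeft += c;
  for (std::size_t c : rightCounts)
    nRight += c;

  const std::size_t n = nLeft + nRight;
  if (n == 0)
    return 0.0;

  return (static_cast<double>(nLeft) * impurity(leftCounts) +
          static_cast<double>(nRight) * impurity(rightCounts)) /
         static_cast<double>(n);
}
```

A greedy tree builder written against these functors would evaluate
SplitImpurity<GiniImpurity> (or SplitImpurity<EntropyImpurity>) for each
candidate split, keep the lowest-scoring one, and consult the stopping
functor before recursing. Significance tests, missing-value handling and
pruning could be added as further policy types in the same way.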


