[mlpack] GSOC 2014: Introduction

Marcus Edel marcus.edel at fu-berlin.de
Tue Mar 11 12:36:09 EDT 2014


Hello Udit,

> He told me that you would be looking over the idea. I wanted you to have a look over my idea. I've talked about it with Ryan over a few mails on the group. 

As Ryan pointed out, he is also interested in the project. But yes, it was my idea to include this project in the GSoC ideas list.
> implementing a batch of weak learners:
> Alternating decision trees
Maybe a one-level decision tree (decision stump), the classical weak classifier.
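
Just to sketch what I mean, such a stump could follow mlpack's usual class layout, something like this (purely hypothetical, none of this exists yet):

    #include <mlpack/core.hpp>

    // Hypothetical interface for a decision stump weak learner: it splits
    // on a single dimension, so training and classification are cheap.
    class DecisionStump
    {
     public:
      // Train on weighted data; the weights are what AdaBoost manipulates
      // between boosting rounds.
      DecisionStump(const arma::mat& data,
                    const arma::Row<size_t>& labels,
                    const arma::rowvec& weights);

      // Predict labels for the given test points.
      void Classify(const arma::mat& test, arma::Row<size_t>& predictions);
    };
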
> C4.5/C5: note C5 also includes boosting options
C5.0's boosting is somewhat different from AdaBoost: the boosting works by building multiple models in a sequence. The first model is created in the usual way. Then, a second model is created in such a way that it focuses on the samples misclassified by the first model, and so on. Additionally, the boosting uses additive rather than multiplicative weight adjustments.
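
To make the contrast concrete, here is roughly what the classical multiplicative update looks like (a sketch only; C5.0's exact additive scheme isn't documented in comparable detail):

    #include <mlpack/core.hpp>
    #include <cmath>

    // Classical AdaBoost weight update: misclassified samples are
    // upweighted *multiplicatively*, then the weights are renormalized.
    // C5.0 instead adjusts the weights additively.
    void MultiplicativeUpdate(arma::rowvec& weights,
                              const arma::Row<size_t>& predictions,
                              const arma::Row<size_t>& labels,
                              const double alpha)
    {
      for (size_t i = 0; i < weights.n_elem; ++i)
      {
        if (predictions[i] != labels[i])
          weights[i] *= std::exp(alpha);  // alpha > 0: boost hard samples.
      }
      weights /= arma::accu(weights);  // Renormalize to a distribution.
    }
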
> simple neural networks,

Just to be sure: you are talking about multilayer perceptrons or two-layer perceptrons, right?
> perceptrons.  
> mlpack already implements the Naive Bayes classifier so
> that could be used also.

We should focus on a few classifiers and implement them as well as possible. Just a hint: maybe you can answer the question 'Why did I choose these classifiers for the AdaBoost project?' in your potential GSoC application/proposal.
> the basic AdaBoost algorithm is quite susceptible to noise and outliers, and a good goal would be to focus on "gentle AdaBoost" as a C++ interface. 
My basic idea is/was to set the classical approach aside and focus on multi-class versions of the AdaBoost idea. There are some interesting approaches that solve the multi-class problem:

- AdaBoost MH (Schapire and Singer, 1999)
- MP-Boost (Andrea Esuli et al., 2006)
- AdaBoost-SAMME and AdaBoost-SAMME.R (Zhu et al., 2009)

All these approaches are 'more' powerful than the classical approach. We think the mentioned multi-class versions could be implemented via either template parameters or arguments to the AdaBoost class, so we should not focus on a single version.
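
A rough sketch of how that could look (all names hypothetical; this is just to illustrate the idea, not a design decision):

    // Hypothetical policy classes, one per multi-class variant.
    class AdaBoostMH { /* ... */ };
    class MPBoost { /* ... */ };
    class SAMME { /* ... */ };

    // Both the weak learner and the boosting variant are template
    // parameters, so any combination is possible without runtime overhead.
    template<typename WeakLearnerType, typename BoostingStrategy = SAMME>
    class AdaBoost
    {
     public:
      AdaBoost(const arma::mat& data,
               const arma::Row<size_t>& labels,
               const size_t iterations = 100);

      void Classify(const arma::mat& test, arma::Row<size_t>& predictions);
    };

    // Usage would then look like:
    //   AdaBoost<DecisionStump, AdaBoostMH> ab(data, labels);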

The tricky part is the interface; you should think about that. Ryan suggested some kind of an API in the following mail:

https://mailman.cc.gatech.edu/pipermail/mlpack/2014-February/000264.html

> Your suggestions? What should be my next step? Start writing a test interface based on a few examples?

Writing a test interface is probably a good idea. Maybe you can include that in your potential GSoC application.
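
For example, following the Boost.Test conventions of mlpack's existing test suite, a first test could look something like this (again assuming the hypothetical AdaBoost interface sketched above):

    #include <mlpack/core.hpp>
    #include <boost/test/unit_test.hpp>

    BOOST_AUTO_TEST_SUITE(AdaBoostTest);

    // Sanity check: on an easily separable dataset, the boosted ensemble
    // should reach (near) zero training error after a few rounds.
    BOOST_AUTO_TEST_CASE(SeparableDatasetTest)
    {
      arma::mat data;
      arma::Row<size_t> labels;
      // ... load or generate a small separable dataset here ...

      AdaBoost<DecisionStump> ab(data, labels, 10);

      arma::Row<size_t> predictions;
      ab.Classify(data, predictions);

      BOOST_REQUIRE_EQUAL(arma::accu(predictions != labels), 0);
    }

    BOOST_AUTO_TEST_SUITE_END();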

> But now that they're over, I'm setting up an svn repository to look at how to get involved in the process of submitting a patch (I'm more familiar with the git process so I'm getting used to it).

Just for the record: if you are more familiar with git, you can also use the 'git svn' bridge.
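
The basic workflow would be something like this (substitute the actual repository URL):

    git svn clone <mlpack-svn-url> mlpack   # one-time checkout as a git repo
    git svn rebase                          # pull in new upstream revisions
    git svn dcommit                         # push local commits back to svn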

Feel free to ask any further questions.

Thanks,

Marcus


On 10 Mar 2014, at 22:42, Udit Saxena <saxena.udit at gmail.com> wrote:

> Hi Marcus, 
> 
> It's Udit here. I've been interested in working on the Adaboost Implementation for GSoC 2014 for a while and have been in touch with Ryan after an initial introduction over the mailing list. 
> 
> He told me that you would be looking over the idea. I wanted you to have a look over my idea. I've talked about it with Ryan over a few mails on the group. 
> 
> I imagine the list of tasks would be something similar to:
> implementing a batch of weak learners:
> Alternating decision trees
> C4.5/C5: note C5 also includes boosting options
> something simple like weighted linear least squares
> some controlled version of random forests (unlikely, this one)
> simple neural networks,
> perceptrons.  
> mlpack already implements the Naive Bayes classifier so
> that could be used also.
> the basic AdaBoost algorithm is quite susceptible to noise and outliers, and a good goal would be to focus on "gentle AdaBoost" as a C++ interface. 
> also, AdaBoost.M1 and .M2 are a good goal for implementing multiclass classification - the mlpack implementation will be flexible enough to extend, through template parameters and flexible abstractions, to any of those algorithms. 
> A simple CLI would be necessary - might involve passing weak learners as supported parameters or something. 
> Your suggestions? What should be my next step? Start writing a test interface based on a few examples?
> 
> I should apologise for lying low on the mailing list for the last week as I had my midterms till Sunday. 
> 
> But now that they're over, I'm setting up an svn repository to look at how to get involved in the process of submitting a patch (I'm more familiar with the git process so I'm getting used to it).
>  
> I've also talked with Ryan about clubbing this with the idea of developing an Ubuntu/Debian package, or even an up-to-date package for Arch Linux (although I see that govg has taken care of the Arch side of things for now and Ryan is still working on the Debian end). 
> 
> Could you share your thoughts?
> 
> Thanks.
> -- 
> ---------------------------------------
> Udit Saxena
> Student, BITS Pilani,
> 
