[mlpack] mlpack Digest, Vol 13, Issue 33

muresan mircea mmp_mircea at yahoo.com
Wed Mar 19 13:12:47 EDT 2014


Hello,
I have thought about what you said regarding the GA approach. It would truly take a lot of time to train on large amounts of data; I don't think it would be practical.
I have been subscribed to this mailing list, and I have another idea for the AdaBoost method which I haven't seen proposed.
I would be grateful if you would give me your opinion on this and tell me whether it would be a good contribution.
I thought of implementing a boosting algorithm with 3 classifiers.
All the classifiers will be AdaBoost classifiers.
How it will work:
The first classifier will be trained on the data, the second AdaBoost will be trained on the data the first one misclassifies, and the third AdaBoost will be trained on the data where the first two conflict.
The AdaBoost classifiers will be able to use 2 types of weak learners:
-> the first weak learner will be a decision stump (a threshold of some sort).
-> the second weak learner will be a single-layer perceptron trained with the delta rule.
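[To make the first weak learner concrete, here is a minimal sketch of a weighted decision stump of the kind AdaBoost would call each round. This is my own illustration, not mlpack code; the names (Stump, TrainStump) are hypothetical, and it assumes a single feature per sample and labels in {-1, +1}.]

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Sketch of a decision-stump weak learner: given one feature value per
// sample and AdaBoost-style sample weights, pick the threshold/polarity
// pair with the lowest weighted error.
struct Stump {
  double threshold;
  int polarity;  // +1: predict +1 when x >= threshold; -1: the opposite.
  int Predict(double x) const {
    return (x >= threshold ? 1 : -1) * polarity;
  }
};

Stump TrainStump(const std::vector<double>& x,
                 const std::vector<int>& y,       // labels in {-1, +1}
                 const std::vector<double>& w) {  // normalized sample weights
  Stump best{x[0], 1};
  double bestErr = 2.0;  // larger than any weighted error, so always improved
  for (size_t i = 0; i < x.size(); ++i) {
    for (int polarity : {1, -1}) {
      Stump s{x[i], polarity};
      double err = 0.0;
      for (size_t j = 0; j < x.size(); ++j)
        if (s.Predict(x[j]) != y[j])
          err += w[j];
      if (err < bestErr) { bestErr = err; best = s; }
    }
  }
  return best;
}
```

[AdaBoost would train one such stump per round on the current weights, then increase the weights of misclassified points; the cascade described above adds two more boosted classifiers trained on the first one's mistakes and on the points where the first two disagree.]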

Please share your thoughts with me regarding this idea, and also tell me whether this would be a good contribution. From what I have read in the previous posts, no one has proposed an idea such as this.
Thanks for your time :)


 
Best regards,
Muresan Mircea Paul




On Wednesday, March 19, 2014 6:01 PM, "mlpack-request at cc.gatech.edu" <mlpack-request at cc.gatech.edu> wrote:
 
Send mlpack mailing list submissions to
    mlpack at cc.gatech.edu

To subscribe or unsubscribe via the World Wide Web, visit
    https://mailman.cc.gatech.edu/mailman/listinfo/mlpack
or, via email, send a message with subject or body 'help' to
    mlpack-request at cc.gatech.edu

You can reach the person managing the list at
    mlpack-owner at cc.gatech.edu

When replying, please edit your Subject line so it is more specific
than "Re: Contents of mlpack digest..."


Today's Topics:

   1. Re: Apply for the implementation of the QUIC-SVD
      collaborative filtering (Ryan Curtin)
   2. Re: EXC_BAD_ACCESS in boost (Ryan Curtin)
   3. Re: GSOC Cool Idea (Ryan Curtin)
   4. GSoC 2014 - Introduction: Fast k-centers algorithm
      (Prasad Samarakoon)


----------------------------------------------------------------------

Message: 1
Date: Tue, 18 Mar 2014 15:37:31 -0400
From: Ryan Curtin <gth671b at mail.gatech.edu>
To: Wilson Cao <wilsoncao01 at gmail.com>
Cc: mlpack at cc.gatech.edu
Subject: Re: [mlpack] Apply for the implementation of the QUIC-SVD
    collaborative filtering
Message-ID: <20140318193731.GV18362 at spoon.lugatgt.org>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 17, 2014 at 12:55:50PM +0800, Wilson Cao wrote:
> Hello,
> 
> My name is Wilson Cao; I am a Chinese student from the South China
> University of Technology. I am really interested in the implementation of
> QUIC-SVD collaborative filtering.

Hi Wilson,

I'm sorry for the slow response.

> The most important part of this SVD-based collaborative filtering is to
> implement the SVD method in the mlpack API. The QUIC-SVD method uses a new
> data structure -- the cosine tree. It is more efficient than previous Monte
> Carlo linear algebra methods.

Efficient in what way?

> What API can we use to implement the QUIC-SVD algorithm? I think maybe we
> should create an abstract class or a template class, and this class's
> constructor should take the user-item matrix as an input. Also, the
> collaborative filtering algorithm should be included in this class.
> 
> Sometimes the ratings of items from users are not numbers, so I think we
> need to implement a kind of API so that the programmer can define the type
> of "rating".

No.  We have arma::mat (and other numerical matrix data types) as data
types, and if we wanted to start supporting other types of features, it
takes a lot of overhead and will be slow.  If anything, a transition
layer to convert non-numeric categorical features into numerical
features is the way to go.
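[A minimal sketch of what such a transition layer might look like -- this is an illustration of the idea only, not an existing mlpack class; the names (CategoricalEncoder, Encode) are hypothetical. It maps each distinct categorical value of a feature to a numeric code so the result can be stored in a numeric matrix such as arma::mat.]

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Maps categorical string values to numeric codes, assigning a fresh code
// the first time a value is seen.  The mapping is retained so that later
// values are encoded consistently.
class CategoricalEncoder {
 public:
  double Encode(const std::string& value) {
    auto it = codes_.find(value);
    if (it == codes_.end())
      it = codes_.emplace(value, static_cast<double>(codes_.size())).first;
    return it->second;
  }

  size_t NumCategories() const { return codes_.size(); }

 private:
  std::map<std::string, double> codes_;
};
```

[In practice a one-hot encoding is often preferable to integer codes for non-ordinal categories, since integer codes impose an arbitrary ordering on the values.]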

> I really believe that performance is the key to this algorithm, so I am
> wondering if we can use a distributed cluster system to implement this
> algorithm? I haven't found out whether this is feasible.

If you have any ideas that can balance API cleanliness and simplicity
with scalability, I'm all ears.  Making trees work in a distributed
setting is not an easy task, in general.

> I am really interested in the project! However, I am in the unfortunate
> position of having my TOEFL exam on Mar 23 (UTC+8:00), which means that I
> can't get fully prepared for the proposal. I have to apologize for my lack
> of preparation for this project. I am wondering whether I can send a draft
> proposal first? I promise I will get fully prepared for the project and
> show my deep passion for it right after my TOEFL exam.

I'm sorry, but we can't accept late proposals.  If you upload your
proposal to Melange (which I think you already have) I will look at it
and comment.

Thanks,

Ryan

-- 
Ryan Curtin    | "Sometimes, I doubt your commitment to Sparkle
ryan at ratml.org | Motion!"  - Kitty Farmer


------------------------------

Message: 2
Date: Tue, 18 Mar 2014 15:40:26 -0400
From: Ryan Curtin <gth671b at mail.gatech.edu>
To: Li Dong <dongli at lasg.iap.ac.cn>
Cc: mlpack at cc.gatech.edu
Subject: Re: [mlpack] EXC_BAD_ACCESS in boost
Message-ID: <20140318194026.GX18362 at spoon.lugatgt.org>
Content-Type: text/plain; charset=us-ascii

On Mon, Mar 17, 2014 at 04:49:00PM +0800, Li Dong wrote:
> Hi all,
> 
> I am using RangeSearch from mlpack in my program, and it had no problems
> several days ago. But I made some changes to my code recently, and
> encountered the following error in Debug mode:
> 
> iMac:Debug dongli$ lldb ./demo_testcase
> Current executable set to './demo_testcase' (x86_64).
> (lldb) r
> Process 84476 launched: './demo_testcase' (x86_64)
> Process 84476 stopped
> * thread #1: tid = 0xbf77f, 0x0000000100216e7b libmlpack.1.0.dylib`long double boost::math::lanczos::lanczos17m64::lanczos_sum<long double>(long double const&) + 59, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10008a500)
>     frame #0: 0x0000000100216e7b libmlpack.1.0.dylib`long double boost::math::lanczos::lanczos17m64::lanczos_sum<long double>(long double const&) + 59
> libmlpack.1.0.dylib`long double boost::math::lanczos::lanczos17m64::lanczos_sum<long double>(long double const&) + 59:
> -> 0x100216e7b:  fstpt  (%rax)
>    0x100216e7d:  fldt   0x58f4d(%rip)             ; typeinfo name for boost::exception_detail::error_info_injector<std::domain_error> + 128
>    0x100216e83:  fstpt  0x10(%rax)
>    0x100216e86:  fldt   0x58f54(%rip)             ; typeinfo name for boost::exception_detail::error_info_injector<std::domain_error> + 144
> 
> In Release mode, there is no such error! My platform is Mac OS X
> 10.9.2, the compiler is clang 5.1.0 in Xcode 5.1, and the Boost version is
> 1.55.0.
> 
> Any idea? Thanks in advance!

What were the changes that you made to your program?

Two ideas I have might be to try gcc instead of clang, or to try a
different Boost version.  I don't think either of those is very likely
to work, though.  If you can paste some code that causes the error, I
can try to reproduce it; then, we can work towards a solution.

Thanks,

Ryan

-- 
Ryan Curtin    | "I am."
ryan at ratml.org |   - Joe


------------------------------

Message: 3
Date: Tue, 18 Mar 2014 16:40:27 -0400
From: Ryan Curtin <gth671b at mail.gatech.edu>
To: muresan mircea <mmp_mircea at yahoo.com>
Cc: "mlpack at cc.gatech.edu" <mlpack at cc.gatech.edu>
Subject: Re: [mlpack] GSOC Cool Idea
Message-ID: <20140318204027.GD18362 at spoon.lugatgt.org>
Content-Type: text/plain; charset=iso-8859-1

On Mon, Mar 17, 2014 at 04:55:26PM -0700, muresan mircea wrote:
> Hello,
> I would like to participate in GSoC and I have a nice ML idea.
> I would like to implement a genetic algorithm that would generate a classifier -> WHICH MEANS: instead of training a classifier, I would like to generate one through the process of selection, crossover, and mutation. The classifier is going to be a neural network, and instead of training this neural network with backprop, I could generate its synaptic weights through a GA.
> 
> Also, since AdaBoost is a fairly simple algorithm, I would like to implement that as well.
> I have implemented AdaBoost in the past and have also trained ANNs with GAs, and I would like to contribute these algorithms to mlpack.
> I hope I didn't contact you too late.
> Thank you for your time!

Hi Muresan,

Before we would be interested in this contribution, you would have to
show that your algorithm generated a classifier that performed
reasonably (with respect to classification accuracy) and did not take
incredibly long to run.

Judging by how your idea is written, it will take an extremely long time
-- especially with large datasets -- to train a classifier using genetic
algorithms.

If you are interested in AdaBoost, please take a look at the mailing
list archives to see the discussions on that project:

https://mailman.cc.gatech.edu/pipermail/mlpack/

Thanks,

Ryan

-- 
Ryan Curtin    | "Moo."
ryan at ratml.org |   - Eugene Belford


------------------------------

Message: 4
Date: Wed, 19 Mar 2014 00:07:07 +0100
From: Prasad Samarakoon <prasnuts at gmail.com>
To: mlpack at cc.gatech.edu
Subject: [mlpack] GSoC 2014 - Introduction: Fast k-centers algorithm
Message-ID:
    <CA+rAH6c0tB8+kYzGX=fSpBTKdXrsCJztfZ7yHALMheG6dkbiDQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello Everybody,


I am a PhD student at Joseph Fourier University, Grenoble, France. I am
passionate about medical image processing and machine learning, and it so
happens that my research work also revolves around these two areas.


I am hoping to code my summer away by contributing to mlpack by tackling
the challenge put forth by Bill March "Fast k-centers algorithm and
implementation".


Though you find me introducing myself to the mlpack community very late, I
have been investing my time in this project over the last few weeks.
Following Ryan's advice to Wenlin (
https://mailman.cc.gatech.edu/pipermail/mlpack/2014-March/000330.html) I
have downloaded, compiled, and installed mlpack successfully. I have also
looked at how mlpack has implemented DTBs and abstracted trees. I am quite
comfortable with the concept of DTBs, having gone through the first two
chapters of Bill's thesis, his paper "Fast Euclidean Minimum Spanning
Tree: Algorithm, Analysis, and Applications", and the two presentations from
NIPS 2007 and KDD 2010.


Though I am still vague on how the concept can be applied to the k-centers
algorithm (and as time is running out fast), I am hoping that you will be
able to point me in the right direction with your valuable advice. Being
very competent in C++ and computational geometry, and having a fair
knowledge of graph theory and machine learning, I am confident that I will
be able to render my services to the mlpack community.


Thank you,
Warm Regards,
Prasad.

-- 
prasnuts.free.fr - je suis qui je suis!

------------------------------

_______________________________________________
mlpack mailing list
mlpack at cc.gatech.edu
https://mailman.cc.gatech.edu/mailman/listinfo/mlpack


End of mlpack Digest, Vol 13, Issue 33
**************************************

