[mlpack] GSoC - 2013 Collaborative Filtering - Introduction and Initial thoughts

Mudit Gupta mudit.raaj.gupta at gmail.com
Sat Apr 20 03:32:25 EDT 2013


Hi Ryan,

Thanks for the reply. It looks like QUIC-SVD and ALS-WL are good choices,
then. I'll start working on the application soon. I just wanted to ask: does
mlpack provide an application template, or is it up to the student to work
out the details and put together an application?

Best Regards,

Mudit Raj Gupta


On Wed, Apr 17, 2013 at 10:12 PM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:

> On Wed, Apr 17, 2013 at 04:53:02AM +0530, Mudit Gupta wrote:
> > Hi Ryan, Ajinkya,
> >
> > Thank you for taking the time to answer my questions. I am sorry for the
> > late reply; I was travelling.
> >
> > As you both pointed out earlier, this is a research + implementation
> > project. I have been going through some of the literature on
> > collaborative filtering and some open source implementations. It looks
> > like the three best implementations available are:
> >
> > 1. GraphLab [1], as suggested by Ajinkya
> > 2. GraphChi [2], by the same author as GraphLab
> > 3. Apache Mahout [3]
> >
> > I have also been going through the algorithms implemented in these
> > libraries. One algorithm that is implemented in most collaborative
> > filtering packages is Alternating Least Squares (ALS) with
> > weighted-lambda regularization [4]. It seems like a good algorithm to
> > start coding, and I think it is a definite choice simply because it is
> > implemented in all of these libraries and can be used for benchmarking.
> > The paper pointed out by Ryan [5] has an SVD-based approach, and I think
> > a similar implementation is in GraphLab. I also came across some
> > collaborative filtering algorithms that use HMMs, k-NN, and other similar
> > techniques, which are not always generic. It would be great to know your
> > views on these algorithms. I will also try to post a review soon. As far
> > as mapping text data to numeric values is concerned, it looks like a
> > smaller issue than the selection of algorithms.
>
> Yeah, I already have a script that does the text->numeric value mapping;
> it's not a difficult challenge.
>
> The nice thing about implementing QUIC-SVD would be that mlpack already
> has a robust tree framework, so we just have to adapt it to cosine trees
> and from there it shouldn't be hard.
>
> The three packages you suggested are the standard packages that people
> will go to for algorithms like this.  So for us to implement this, we
> should make sure that we have something that those libraries don't -- a
> flexible API.  Using templates we can write a modular ALS-WL
> implementation which allows researchers to plug in different components.
> One example of this is our NMF implementation, which allows a developer
> to write their own simple update rules.
>
> > It would be good to know roughly how many algorithm implementations are
> > expected over the summer. It is too early to estimate, but from what I can
> > see, two thoroughly tested and well-documented algorithms would take
> > around 6-7 weeks, plus 1-2 weeks of buffer, plus 3-4 weeks for designing
> > the system, getting it verified, iterating on corrections from the mentor
> > and the community, and implementing basic features like ratings and
> > input/output formats. Maybe one more algorithm could go in an "if time
> > permits" section. (I am asking because I want my proposal to be neither
> > over-ambitious nor too light on work for the summer.)
>
> I think that is reasonable.  The students I've worked with in the past
> have worked on about that timeframe.
>
> Ryan
>
> --
> Ryan Curtin       | "I am the luckiest man alive!"
> ryan at igglybob.com |   - General Borzov
>
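To make the quoted suggestion of a template-based, pluggable ALS-WL implementation more concrete, here is a minimal C++ sketch. It is not mlpack code and does not reflect any mlpack API: `Rating`, `Matrix`, `WeightedLambda`, `PlainLambda`, `Solve`, `ALS`, and `TrainingRMSE` are hypothetical names, and plain `std::vector` stands in for a real linear algebra library so the block stays self-contained. The pluggable component here is the regularizer. `WeightedLambda` assumes the usual reading of the weighted-lambda scheme [4], in which lambda is scaled by the number of ratings a user or item has; `PlainLambda` uses a constant lambda. Each half-step solves (F^T F + w(n) I) x = F^T y for one row, where F holds the fixed factors of the n entities that row is paired with and y holds the corresponding ratings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for this sketch only (not mlpack's API).
struct Rating { std::size_t user, item; double value; };
using Matrix = std::vector<std::vector<double>>;  // one row of latent factors per user/item

// Plug-in regularization policies: each returns the ridge weight added to the
// diagonal when solving for a row that has nRatings observed ratings.
struct WeightedLambda {  // assumed ALS-WL style: lambda * n
  double lambda;
  double Weight(std::size_t nRatings) const { return lambda * static_cast<double>(nRatings); }
};
struct PlainLambda {  // ordinary L2 regularization: constant lambda
  double lambda;
  double Weight(std::size_t) const { return lambda; }
};

// Solves the k x k system A x = b by Gaussian elimination with partial pivoting.
std::vector<double> Solve(Matrix A, std::vector<double> b) {
  const std::size_t k = b.size();
  assert(A.size() == k);
  for (std::size_t col = 0; col < k; ++col) {
    std::size_t pivot = col;
    for (std::size_t r = col + 1; r < k; ++r)
      if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
    if (pivot != col) { std::swap(A[col], A[pivot]); std::swap(b[col], b[pivot]); }
    for (std::size_t r = col + 1; r < k; ++r) {
      const double f = A[r][col] / A[col][col];
      for (std::size_t c = col; c < k; ++c) A[r][c] -= f * A[col][c];
      b[r] -= f * b[col];
    }
  }
  std::vector<double> x(k, 0.0);
  for (std::size_t i = k; i-- > 0;) {  // back substitution
    double s = b[i];
    for (std::size_t c = i + 1; c < k; ++c) s -= A[i][c] * x[c];
    x[i] = s / A[i][i];
  }
  return x;
}

template <typename RegularizationPolicy>
class ALS {
 public:
  ALS(std::size_t rank, RegularizationPolicy reg) : rank_(rank), reg_(reg) {}

  // One alternating sweep: update user rows with items fixed, then item rows with users fixed.
  void Iterate(const std::vector<Rating>& ratings, Matrix& U, Matrix& V) const {
    Update(ratings, /*updateUsers=*/true, U, V);
    Update(ratings, /*updateUsers=*/false, V, U);
  }

 private:
  // For each row r of `target`, solve (F^T F + w(n) I) x = F^T y using the
  // fixed factors F of the n entities paired with r and their ratings y.
  void Update(const std::vector<Rating>& ratings, bool updateUsers,
              Matrix& target, const Matrix& fixed) const {
    for (std::size_t r = 0; r < target.size(); ++r) {
      Matrix A(rank_, std::vector<double>(rank_, 0.0));
      std::vector<double> b(rank_, 0.0);
      std::size_t n = 0;
      for (const Rating& x : ratings) {
        if ((updateUsers ? x.user : x.item) != r) continue;
        const std::vector<double>& f = fixed[updateUsers ? x.item : x.user];
        ++n;
        for (std::size_t i = 0; i < rank_; ++i) {
          b[i] += x.value * f[i];
          for (std::size_t j = 0; j < rank_; ++j) A[i][j] += f[i] * f[j];
        }
      }
      if (n == 0) continue;  // no observations: keep the current factors
      for (std::size_t i = 0; i < rank_; ++i) A[i][i] += reg_.Weight(n);
      target[r] = Solve(A, b);
    }
  }

  std::size_t rank_;
  RegularizationPolicy reg_;
};

// Root-mean-square error of U * V^T on the observed ratings.
double TrainingRMSE(const std::vector<Rating>& ratings, const Matrix& U, const Matrix& V) {
  double sum = 0.0;
  for (const Rating& x : ratings) {
    double pred = 0.0;
    for (std::size_t i = 0; i < U[x.user].size(); ++i) pred += U[x.user][i] * V[x.item][i];
    sum += (x.value - pred) * (x.value - pred);
  }
  return std::sqrt(sum / static_cast<double>(ratings.size()));
}
```

For example, `ALS<WeightedLambda> als(1, WeightedLambda{0.01});` followed by a few calls to `als.Iterate(ratings, U, V)` fits rank-1 factors. Swapping in `PlainLambda` changes only the regularizer and leaves the solver untouched, which is the same kind of flexibility the NMF update rules mentioned above give a developer.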