[mlpack] Fwd: Apply for the implementation of the QUIC-SVD collaborative filtering

Mon Mar 24 11:47:01 EDT 2014

Hi Ryan,

There is something I forgot to say in my last e-mail.

> If you have any ideas that can balance API cleanliness and simplicity
> with scalability, I'm all ears.  Making trees work in a distributed
> setting is not an easy task, in general.

You mentioned that it is difficult to make trees work in a distributed
setting. However, I think we don't need the trees themselves to work in a
distributed setting. We can first implement QUIC-SVD, then divide the
matrix into smaller matrices and map-reduce over them!
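
A minimal sketch of the divide-then-merge idea, under stated assumptions:
it is not QUIC-SVD and not the algorithm from the paper cited below. It
relies only on the identity A^T A = sum_i A_i^T A_i, where the A_i are row
blocks of A. Each block's Gram matrix can be computed independently (the
"map" step) and the partial results summed (the "reduce" step); the right
singular vectors and squared singular values of A are then the eigenpairs
of the merged matrix. All function names are hypothetical, and plain
Python is used for brevity rather than mlpack's C++/Armadillo.

```python
import functools


def split_rows(matrix, block_size):
    """Divide a matrix (a list of rows) into consecutive row blocks."""
    return [matrix[i:i + block_size]
            for i in range(0, len(matrix), block_size)]


def gram(block):
    """Map step: compute B^T B for a single row block B."""
    n_cols = len(block[0])
    return [[sum(row[i] * row[j] for row in block) for j in range(n_cols)]
            for i in range(n_cols)]


def merge(g1, g2):
    """Reduce step: element-wise sum of two partial Gram matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]


def distributed_gram(matrix, block_size):
    """Compute A^T A by mapping over row blocks and reducing the results."""
    partials = map(gram, split_rows(matrix, block_size))
    return functools.reduce(merge, partials)


# Hypothetical 5 x 2 user-item matrix, split into blocks of 2 rows.
A = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
merged = distributed_gram(A, block_size=2)
```

In a real cluster each call to gram() would run on a different worker; only
the small n_cols x n_cols partial results need to be sent back and merged.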

More details can be found in this paper -- An Iterative
Divide-and-Merge-Based Approach for Solving Large-Scale Least Squares
Problems.

Here is the link:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6205747

Yours,

Wilson Cao

On Wed, Mar 19, 2014 at 1:40 PM, Wilson Cao <wilsoncao01 at gmail.com> wrote:

> Hi Ryan,
>
> Thanks for your reply, and I really appreciate that.
>
>> No.  We have arma::mat (and other numerical matrix data types) as data
>> types, and if we wanted to start supporting other types of features, it
>> takes a lot of overhead and will be slow.  If anything, a transition
>> layer to convert non-numeric categorical features into numerical
>> features is the way to go.
>
>
> Thanks, I have checked the Armadillo library, and now I understand that we
> should add a layer instead of designing a new API. Thanks for your advice.
> I also found that svd() is already implemented in the Armadillo library.
> Is the goal, then, to use QUIC-SVD to improve on the performance of the
> original SVD?
>
> I once took part in a CF competition where the data was not pure "rate"
> data but a record of user behavior, such as purchasing or adding to the
> cart. So it seems necessary to design the API so that other programmers
> can define what their "rate" is in the CF.
>
>> I'm sorry, but we can't accept late proposals.  If you upload your
>> proposal to Melange (which I think you already have) I will look at it
>> and comment.
>
>
> Thanks for your detailed explanation. I will certainly upload my proposal
> to Melange. I just can't get my hands dirty with the project until next
> week because of my exam this Sunday. I worry that writing the proposal
> without having tried to implement the algorithm will leave it weak. I
> will get started on the project as soon as I finish the exam.
>
> Again, thanks for your valuable advice!
>
> Yours,
>
> Wilson Cao
>
>
>
> On Wed, Mar 19, 2014 at 3:37 AM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
>
>> On Mon, Mar 17, 2014 at 12:55:50PM +0800, Wilson Cao wrote:
>> > Hello,
>> >
>> > My name is Wilson Cao, and I am a Chinese student at South China
>> > University of Technology. I am really interested in the implementation
>> > of the QUIC-SVD collaborative filtering.
>>
>> Hi Wilson,
>>
>> I'm sorry for the slow response.
>>
>> > The most important part of this SVD-based collaborative filtering is to
>> > implement the SVD method in the mlpack API. The QUIC-SVD method uses a
>> > new data structure, the cosine tree. It is more efficient than the
>> > previous Monte Carlo linear algebra methods.
>>
>> Efficient in what way?
>>
>> > What API can we use to implement the QUIC-SVD algorithm? I think maybe
>> > we should create an abstract class or a template class whose
>> > constructor takes the user-item matrix as input. The collaborative
>> > filtering algorithm should also be included in this class.
>> >
>> > Sometimes the users' rates for the items are not numbers, so I think we
>> > need to implement a kind of API so that the programmer can define the
>> > type of "rate".
>>
>> No.  We have arma::mat (and other numerical matrix data types) as data
>> types, and if we wanted to start supporting other types of features, it
>> takes a lot of overhead and will be slow.  If anything, a transition
>> layer to convert non-numeric categorical features into numerical
>> features is the way to go.
>>
>> > I really believe that performance is the key to this algorithm, so I am
>> > wondering whether we can use a distributed cluster to implement this
>> > algorithm. I haven't found out whether this is feasible.
>>
>> If you have any ideas that can balance API cleanliness and simplicity
>> with scalability, I'm all ears.  Making trees work in a distributed
>> setting is not an easy task, in general.
>>
>> > I am really interested in the project! However, my TOEFL exam is on
>> > Mar 23 (UTC + 8:00), which means that I can't fully prepare for the
>> > proposal. I apologize for my lack of preparation for this project. I am
>> > wondering whether I can send a draft proposal first? I promise I will
>> > fully prepare for the project and show my deep passion for it right
>> > after my TOEFL exam.
>>
>> I'm sorry, but we can't accept late proposals.  If you upload your
>> proposal to Melange (which I think you already have) I will look at it
>> and comment.
>>
>> Thanks,
>>
>> Ryan
>>
>> --
>> Ryan Curtin    | "Sometimes, I doubt your commitment to Sparkle
>> ryan at ratml.org | Motion!"  - Kitty Farmer
>>
>
>