[mlpack] GSOC 2014: collaborative filtering package improvements

Pulkit Yadav yadavpulkit at gmail.com
Wed Mar 12 10:00:06 EDT 2014


Thanks Mudit and Ryan for your quick responses.

I looked at the QUIC-SVD paper - their algorithm uses cosine-tree-based
sampling and Monte Carlo error estimation.
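
To make sure I am reading the sampling part correctly, here is a minimal
sketch of the two quantities I believe a cosine tree split relies on. It
assumes (my reading of the paper, not confirmed against the mlpack code) that
a pivot point is drawn with probability proportional to its squared norm and
that the remaining points are then compared to the pivot by cosine
similarity. The names Point, Dot, LengthSquaredProbabilities and Cosine are
hypothetical and do not come from mlpack.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type for illustration only: one vector per point.
typedef std::vector<double> Point;

// Plain dot product of two points of equal dimension.
double Dot(const Point& a, const Point& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// Assumed length-squared sampling distribution:
// p_i = ||x_i||^2 / sum_j ||x_j||^2.
std::vector<double> LengthSquaredProbabilities(const std::vector<Point>& points)
{
  std::vector<double> probs(points.size(), 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    probs[i] = Dot(points[i], points[i]);
    total += probs[i];
  }
  assert(total > 0.0); // At least one point must be nonzero.
  for (std::size_t i = 0; i < probs.size(); ++i)
    probs[i] /= total;
  return probs;
}

// Cosine of the angle between a point and the pivot; both must be nonzero.
double Cosine(const Point& x, const Point& pivot)
{
  const double denom = std::sqrt(Dot(x, x)) * std::sqrt(Dot(pivot, pivot));
  assert(denom > 0.0);
  return Dot(x, pivot) / denom;
}
```

A real implementation would work on the columns of the data matrix rather
than on a vector of vectors; this only illustrates the arithmetic.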

@Mudit: I looked at your cosine tree implementation. Is this the only part
of the QUIC-SVD algorithm that has been implemented? That is, do the error
estimation and the main QUIC-SVD algorithm still need to be implemented?

@Ryan: Besides changing the data matrix to column-major format, could you
please tell me what kind of refactoring (or other changes) the cosine tree
code needs?

Based on your inputs, this is my understanding of a potential task plan:
1. Refactor the cosine tree code (including changing row-major to
column-major matrices in both the main class and the tests).
2. Finish the QUIC-SVD implementation (if it is not already complete).
3. Look at good alternative algorithms and implement them.
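
For step 1, this is roughly how I picture the layout change. It is a minimal
sketch, assuming that "row-major" here means one point per row and that the
refactored code should hold one point per column in column-major storage.
ColMajorMatrix and PointsAsColumns are hypothetical names for illustration,
not mlpack or Armadillo API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal dense matrix stored column-major: element (r, c)
// lives at mem[r + c * rows].
struct ColMajorMatrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> mem;

  ColMajorMatrix(std::size_t r, std::size_t c)
      : rows(r), cols(c), mem(r * c, 0.0) {}

  double& operator()(std::size_t r, std::size_t c)
  {
    assert(r < rows && c < cols);
    return mem[r + c * rows];
  }

  double operator()(std::size_t r, std::size_t c) const
  {
    assert(r < rows && c < cols);
    return mem[r + c * rows];
  }
};

// Convert a "points as rows" matrix (n points x d dimensions) into a
// "points as columns" matrix (d dimensions x n points) by transposing it.
ColMajorMatrix PointsAsColumns(const ColMajorMatrix& pointsAsRows)
{
  ColMajorMatrix out(pointsAsRows.cols, pointsAsRows.rows);
  for (std::size_t c = 0; c < pointsAsRows.cols; ++c)
    for (std::size_t r = 0; r < pointsAsRows.rows; ++r)
      out(c, r) = pointsAsRows(r, c);
  return out;
}
```

After the conversion each point occupies a contiguous block of memory, so
code that loops over the dimensions of one point walks memory sequentially.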

I am currently looking at other good candidate algorithms for decomposition
besides ALS and QUIC-SVD. Should I also look into neighbourhood algorithms?
Currently, allKNN is the only algorithm for this job and, as pointed out by
Ryan in ticket 306 <http://www.mlpack.org/trac/ticket/306>, it has issues
such as a potentially large memory footprint.

Please feel free to suggest ideas for improving the project.

Thanks,
Pulkit



On Wed, Mar 12, 2014 at 5:53 PM, Mudit Gupta <mudit.raaj.gupta at gmail.com> wrote:

>
>
>
> On Wed, Mar 12, 2014 at 5:44 PM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
>
>> On Wed, Mar 12, 2014 at 01:29:48PM +0530, Mudit Gupta wrote:
>> > Hi,
>> >
>> > Thank you for your mail. I have modified the project wiki accordingly.
>> >
>> > A part of QUIC-SVD has been implemented. You can check out the cosine
>> > tree implementation here:
>> >
>> > https://trac.research.cc.gatech.edu/fastlab/browser/tags/mlpack-1.0.8/src/mlpack/core/tree/cosine_tree?rev=16133
>> >
>> > In QUIC-SVD you have to construct the cosine tree first before SVD.
>> > Please go through the paper for details.
>> >
>> > You can discuss the algorithms which you plan to implement and how you
>> > would architect the solution. You can look into extending the executable
>> > for cf but I think Ryan and Ajinkya would be able to guide you on this.
>>
>> The cosine tree implementation there is a good start, but it isn't
>> tested and needs some refactoring; currently, I think it assumes a
>> row-major data matrix, but data in mlpack is column-major.  It would be
>> a helpful start to the project, though.
>>
>
> As Ryan mentioned, the present code is row-major. Also, the cosine trees
> were tested assuming a row-major data matrix.
> The tests for cosine trees were added here:
>
>
> https://trac.research.cc.gatech.edu/fastlab/browser/tags/mlpack-1.0.8/src/mlpack/tests/tree_test.cpp?rev=16133
>
> The code needs refactoring and changes from row-major to column-major, both
> in the main class and in the tests.
> @Ryan will be able to decide whether this is required before starting the
> QUIC-SVD implementation.
>
>
>>
>> --
>> Ryan Curtin    | "I was misinformed."
>> ryan at ratml.org |   - Rick Blaine
>>
>
>