[mlpack] [GSoC2013] Collaborative Filtering Package

Sarthak Kukreti sarthakkukreti at gmail.com
Sat Apr 20 04:14:42 EDT 2013


Hi all

I am a final-year undergraduate pursuing a Bachelor of Engineering at NSIT,
Delhi, majoring in Computer Engineering. My main research area is social
discovery in large-scale graphs; I have worked on link prediction in social
network graphs, and I mostly use Python or C++ for my implementations.

As part of my undergraduate thesis on recommendation systems, I have been
implementing matrix factorization models. I have already implemented, in
Python, the base matrix factorization model trained with stochastic gradient
descent, as proposed in Yehuda Koren's paper [1]. It is quite slow, but it
achieves an RMSE of 0.98 on the MovieLens dataset [5]. Beyond the approaches
in [1], my thesis will also cover Probabilistic Matrix Factorization [2],
Bayesian Probabilistic Tensor Factorization [3], and distributed stochastic
gradient descent for matrix factorization [4] (the last of these is almost
implemented).
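For reference, a minimal sketch of the SGD training loop, assuming the "base model" is the biased factor model reviewed in [1]: a rating is predicted as the global mean plus a user bias, an item bias, and the dot product of user and item factor vectors, r_hat(u, i) = mu + b_u + b_i + q_i . p_u. Each step moves all four terms against the error on a single rating, with L2 regularization. This is not the attached mf-sgd.py; the names `train_biased_mf` and `predict` and the default values of `k`, `lr`, `reg` and `epochs` are hypothetical, chosen only for illustration.

```python
import random


def train_biased_mf(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05,
                    epochs=20, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + q_i . p_u with plain SGD.

    `ratings` is a list of (user, item, rating) triples with 0-based ids.
    The default hyperparameters are illustrative guesses, not tuned values.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    # Small random initial factors break the symmetry between latent dims.
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]

    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the ratings in a new random order each epoch
        for u, i, r in order:
            pu, qi = P[u], Q[i]  # row references, updated in place below
            pred = mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(pu, qi))
            err = r - pred
            # Gradient steps on the regularized squared error of this rating.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(k):
                p_old = pu[f]  # keep the old value so both updates use it
                pu[f] += lr * (err * qi[f] - reg * p_old)
                qi[f] += lr * (err * p_old - reg * qi[f])
    return mu, b_u, b_i, P, Q


def predict(model, u, i):
    """Predicted rating of item i by user u under a trained model."""
    mu, b_u, b_i, P, Q = model
    return mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(P[u], Q[i]))
```

On MovieLens, the triples would come from a file such as ua.base (with ids shifted to start at 0). The plain Python inner loop over factors is the main reason a pure Python implementation is slow, which is part of the motivation for a C++ version in mlpack.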

I am interested in developing the collaborative filtering package for mlpack,
and I think much of my thesis work could later be incorporated into it. As I
currently see it, the package would contain a group of such models, sample
data for testing, and supporting functions such as parameter selection, RMSE
computation, convergence-rate plots, and model comparison. I would like to
discuss how you would ideally want me to proceed and how you see the package
as a whole.
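As an illustration of how the supporting functions could fit together, here is a minimal sketch. It assumes every model is exposed as a `fit(train, **params)` callable that returns a `predict(u, i)` function. With that interface, RMSE and grid-search parameter selection can be written once and shared across models. The names `rmse`, `grid_search` and `fit_item_bias` are hypothetical, and `fit_item_bias` is only a toy baseline (global mean plus a shrunken per-item offset) standing in for a real model.

```python
import itertools
import math


def rmse(predict, triples):
    """Root-mean-square error of predict(u, i) over (user, item, rating) triples."""
    se = sum((r - predict(u, i)) ** 2 for u, i, r in triples)
    return math.sqrt(se / len(triples))


def grid_search(fit, train, valid, grid):
    """Try every combination of values in `grid` (a dict of name -> list) and
    return (best validation RMSE, best params). `fit(train, **params)` must
    return a predict(u, i) callable."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = rmse(fit(train, **params), valid)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def fit_item_bias(train, reg):
    """Toy baseline: global mean plus a per-item offset shrunk by `reg`."""
    mu = sum(r for _, _, r in train) / len(train)
    sums, counts = {}, {}
    for _, i, r in train:
        sums[i] = sums.get(i, 0.0) + (r - mu)
        counts[i] = counts.get(i, 0) + 1
    bias = {i: sums[i] / (counts[i] + reg) for i in sums}
    return lambda u, i: mu + bias.get(i, 0.0)  # unseen items fall back to mu
```

For example, `grid_search(fit_item_bias, train, valid, {"reg": [0.0, 1.0, 10.0]})` fits the baseline once per value of `reg` and keeps the one with the lowest RMSE on the held-out triples. Model comparison then amounts to calling `rmse` on the same test set with each model's best predictor.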

I am also attaching the baseline code. Although it is in Python, the final
work I am planning will have a similar structure. I would like your views on
the structure and quality of the code.

Thanks,
Sarthak Kukreti

[1] Yehuda Koren - Collaborative Filtering with Temporal Dynamics :
http://sydney.edu.au/engineering/it/~josiah/lemma/kdd-fp074-koren.pdf
[2] Ruslan Salakhutdinov - Probabilistic Matrix Factorization :
http://www.cs.utoronto.ca/~amnih/papers/pmf.pdf
[3] Liang Xiong - Temporal Collaborative Filtering with Bayesian
Probabilistic Tensor Factorization :
http://www.cs.cmu.edu/~xichen/images/Xi%20Chen%20SDM%202010.pdf
[4] Rainer Gemulla - Large-Scale Matrix Factorization with Distributed
Stochastic Gradient Descent :
http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf
[5] MovieLens 100k Dataset : http://www.grouplens.org/node/73
Attachments (scrubbed by the list archive):
- mf-sgd.py (2027 bytes): <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20130420/37a43b1b/attachment-0008.obj>
- ua.base (1792501 bytes): <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20130420/37a43b1b/attachment-0009.obj>
- ua.result (251789 bytes): <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20130420/37a43b1b/attachment-0010.obj>
- ua.test (186672 bytes): <http://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20130420/37a43b1b/attachment-0011.obj>