[mlpack] GSOC 2016 "Alternatives to neighbour-based collaborative filtering"

Wed Mar 9 14:22:52 EST 2016

Hello Ryan,
I'm Devang Kulshreshtha, a sophomore in the Computer Science Department
at IIT (BHU) Varanasi, India.

This semester I'm working on my undergraduate project, which focuses on
detecting breast cancer in mammograms using the GLCM matrix and an RBM
neural network, and on building an image retrieval system that returns
similar mammogram images.
Last semester I implemented a research paper (on fuzzy c-means clustering
for streaming data) in R.
I have been programming in C, C++, and Python for the last 3 years.

I am very excited about the project "Alternatives to neighbour-based
collaborative filtering". For the past 7-8 days I have been studying
the code in src/mlpack/methods/cf/, src/mlpack/methods/amf/, and
src/mlpack/methods/quick_svd/. I have also read several research papers
on matrix factorization. Some of the contributions and changes I think
we could implement are:
The learning algorithms for amf are variants of gradient-descent-based
updates. However, since the training matrix in this problem is generally
huge, it would be nice to have an SGD update implemented as well. It should
also run faster, because the matrix is generally very sparse and we would
only have to loop over the observed ratings.
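
To make the SGD idea concrete, here is a minimal sketch in plain Python. It
is not mlpack code: the function names (sgd_factorize, rmse), the
(user, item, rating) triple format, and the default hyperparameters are all
assumptions chosen for illustration. The point it shows is that each epoch
touches only the observed ratings, never the full matrix.

```python
import random

def sgd_factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.02,
                  epochs=300, seed=0):
    """Factorize a sparse rating matrix as W * H^T with plain SGD.

    ratings: list of (user, item, value) triples for observed entries only.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    H = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit observed ratings in random order
        for u, i, r in order:
            pred = sum(W[u][k] * H[i][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                wu, hi = W[u][k], H[i][k]
                # Gradient step on the regularized squared error of one rating.
                W[u][k] += lr * (err * hi - reg * wu)
                H[i][k] += lr * (err * wu - reg * hi)
    return W, H

def rmse(ratings, W, H):
    """Root mean squared error over the given observed ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(wk * hk for wk, hk in zip(W[u], H[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5

# Toy data: 3 users, 3 items, 6 observed ratings out of 9 cells.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
W, H = sgd_factorize(toy, n_users=3, n_items=3)
print("training RMSE:", rmse(toy, W, H))
```

The cost of one epoch is proportional to the number of observed ratings
times the rank, which is where the speed-up on sparse data comes from.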

Similarly, regularized_svd is implemented with SGD. We could add the
alternating least squares (ALS) approach to this algorithm as well.
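
As a rough illustration of the ALS alternative, the sketch below fixes one
factor matrix and solves a small ridge-regression problem for each row of
the other, then swaps roles. Again this is plain Python, not mlpack code,
and the names (solve, ridge_update, als_factorize, objective) and defaults
are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge_update(rows, fixed, rank, reg):
    """Solve (sum_j h_j h_j^T + reg * I) w = sum_j r_j h_j for one row."""
    A = [[reg if a == b else 0.0 for b in range(rank)] for a in range(rank)]
    y = [0.0] * rank
    for j, r in rows:
        h = fixed[j]
        for a in range(rank):
            y[a] += r * h[a]
            for b in range(rank):
                A[a][b] += h[a] * h[b]
    return solve(A, y)

def als_factorize(ratings, n_users, n_items, rank=2, reg=0.1, sweeps=20, seed=0):
    """Alternate exact least-squares updates of user and item factors."""
    rng = random.Random(seed)
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(sweeps):
        for u in range(n_users):      # item factors held fixed
            if by_user[u]:
                W[u] = ridge_update(by_user[u], H, rank, reg)
        for i in range(n_items):      # user factors held fixed
            if by_item[i]:
                H[i] = ridge_update(by_item[i], W, rank, reg)
    return W, H

def objective(ratings, W, H, reg):
    """Squared error on observed ratings plus L2 penalty on both factors."""
    err = sum((r - sum(a * b for a, b in zip(W[u], H[i]))) ** 2
              for u, i, r in ratings)
    pen = sum(v * v for row in W + H for v in row)
    return err + reg * pen
```

Unlike SGD there is no learning rate to tune, and the per-user and per-item
solves within one half-sweep are independent of each other, so they could be
run in parallel.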

If we add bias terms to the user-item interaction, it could improve the
RMSE. The biases can be estimated as the deviations of each user's average
rating (user bias) and each item's average rating (item bias) from the
overall average rating.
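
A small sketch of the bias idea, assuming the simplest possible estimate:
each bias is the difference between a user's (or item's) mean rating and the
global mean. The function names are hypothetical, and in practice the biases
are often learned jointly with the factors under regularization instead of
being computed from plain averages like this.

```python
from collections import defaultdict

def baseline_biases(ratings):
    """Return (global mean, user biases, item biases) from rating triples.

    ratings: list of (user, item, value). A bias is the deviation of the
    user's or item's mean rating from the global mean (an assumed, simple
    estimator chosen for illustration).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r
        user_cnt[u] += 1
        item_sum[i] += r
        item_cnt[i] += 1
    user_bias = {u: user_sum[u] / user_cnt[u] - mu for u in user_sum}
    item_bias = {i: item_sum[i] / item_cnt[i] - mu for i in item_sum}
    return mu, user_bias, item_bias

def baseline_predict(mu, user_bias, item_bias, u, i):
    """Baseline estimate mu + b_u + b_i; unseen users or items get bias 0."""
    return mu + user_bias.get(u, 0.0) + item_bias.get(i, 0.0)
```

The factor model would then be fit to the residual r - (mu + b_u + b_i)
rather than to the raw rating.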

Not all ratings necessarily deserve the same weight. If we can learn a
confidence level for each rating, we could modify the cost function
accordingly by multiplying each rating's error term by its confidence.
These weights could be derived from implicit feedback, as described in
this paper <http://yifanhu.net/PUB/cf.pdf>.
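
To sketch what the weighted cost could look like, the snippet below follows
the formulation in the linked paper as I understand it: a binary preference
p_ui (1 if any feedback was observed, else 0) and a confidence
c_ui = 1 + alpha * r_ui that scales each squared error. The function names
and the dense list-of-lists input are assumptions for illustration, and
alpha is a tunable parameter.

```python
def confidence(r, alpha):
    """Confidence weight c = 1 + alpha * r for a raw feedback count r."""
    return 1.0 + alpha * r

def preference(r):
    """Binary preference: 1 if any feedback was observed, else 0."""
    return 1.0 if r > 0 else 0.0

def weighted_cost(R, X, Y, alpha, reg):
    """Confidence-weighted squared error plus L2 regularization.

    R: dense list of lists of raw feedback values (0 means unobserved).
    X: user factors, Y: item factors (lists of equal-length lists).
    """
    cost = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            pred = sum(a * b for a, b in zip(X[u], Y[i]))
            cost += confidence(r, alpha) * (preference(r) - pred) ** 2
    cost += reg * sum(v * v for vec in X + Y for v in vec)
    return cost
```

Note that this cost sums over every user-item cell, not only the observed
ones, so a naive loop like this would not scale to a huge matrix; it is
meant only to show the shape of the objective.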

I am also studying the weighted nearest-neighbour approach to
collaborative filtering through this paper from your ideas page:
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.4662&rep=rep1&type=pdf>.

Is there any work you think I should do to demonstrate these ideas? I
could write pseudocode describing the equations needed to implement
these changes, if you think that would help. I would also be very happy to
fix a bug related to this project in the existing codebase.

Thanks!