[mlpack] GSOC 2016 "Alternatives to neighbour-based collaborative filtering"

Ryan Curtin ryan at ratml.org
Thu Mar 10 08:32:56 EST 2016


On Thu, Mar 10, 2016 at 12:52:52AM +0530, Devang Kulshreshtha wrote:
> Hello Ryan,
> I'm Devang Kulshreshtha, a sophomore in the Computer Science Department
> at IIT (BHU) Varanasi, India.
> 
> This semester, I'm doing my undergraduate project, which is focused on
> detecting breast cancer in mammograms using a GLCM matrix and an RBM
> neural network, and on building an image retrieval system that returns
> similar mammogram images.
> In the past semester I implemented one research paper (on streaming-data
> fuzzy c-means clustering) and coded it in R.
> I have been programming for the last 3 years in C, C++, and Python.
> 
> I am very excited about the project "Alternatives to neighbour-based
> collaborative filtering". For the past 7-8 days I have been studying
> the code in src/mlpack/methods/cf/, src/mlpack/methods/amf/, and
> src/mlpack/methods/quic_svd/. I have also studied various research
> papers related to matrix factorization. Some of the contributions and
> changes I think we can implement are:
> The learning algorithms for amf are modifications of gradient-descent-based
> updates. However, as the training matrix in this problem is generally huge,
> it would be nice to have an SGD update implemented as well. This would be
> even faster, since the matrix is generally very sparse and we would only
> have to loop over the observed ratings.
> 
> Similarly, regularized_svd is implemented with SGD. We could add the
> alternating least squares approach to this algorithm as well.
> 
> If we add bias terms to the user-item interaction, it could improve the
> RMSE. Biases can be added by calculating, for each user (user bias) and
> each item (item bias), the deviation of its average rating from the
> global average.
> 
> Probably not all ratings deserve the same weight. Hence, if we can learn a
> confidence level for each rating, we could modify the cost function
> accordingly by weighting the error of each observed rating. These
> confidence weights could be learned from implicit feedback, as described
> in this paper <http://yifanhu.net/PUB/cf.pdf>.
> 
> I am also studying the weighted nearest neighbour approach to
> collaborative filtering through this paper on your ideas page:
> <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.4662&rep=rep1&type=pdf>.
> 
> Is there any work you think I should do in order to demonstrate these
> ideas? I could write pseudo-code describing the equations needed to
> implement these changes if you think that would help. I would also be
> very happy to fix a bug related to this project in the existing codebase.

Hi Devang,

Unfortunately I don't think that there are any open bugs relating to the
CF code right now, but one contribution that's always useful is finding
a way to accelerate the current code.  You might try running it on one
of the larger GroupLens datasets, profiling to see where it is slow, and
then seeing if you can speed it up.
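
If it helps, the basic shape of that experiment is quite short.  Something
like the snippet below (written from memory and untested, so double-check
it against cf.hpp; I'm assuming you've converted one of the MovieLens
rating files to a plain user,item,rating CSV called "ratings.csv"), built
with -pg or run under perf, should show where the time goes:

#include <mlpack/core.hpp>
#include <mlpack/methods/cf/cf.hpp>

using namespace mlpack::cf;

int main()
{
  // Coordinate-list ratings: after loading, each column of 'dataset' holds
  // one (user id, item id, rating) triple.
  arma::mat dataset;
  mlpack::data::Load("ratings.csv", dataset, true /* fatal if load fails */);

  // Building the CF model (the factorization plus the neighbor-search
  // structures) is where most of the time goes, so profile this part.
  CF cf(dataset);

  // Generate 10 recommendations for every user.
  arma::Mat<size_t> recommendations;
  cf.GetRecommendations(10, recommendations);

  return 0;
}

I would expect most of the time to end up in the factorization and in the
neighbor search over the latent factors, so those are the natural places
to look.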

The ideas you've proposed for bias approximation and weighted ratings
would be interesting, but probably the most important thing is figuring
out how they would change the API and how they would be implemented.
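
To make that concrete, the update itself is easy to write down; the sketch
below (untested, and none of these names exist in mlpack, they are just
placeholders) shows one SGD pass over the observed ratings with a global
mean, per-user and per-item bias terms, and a per-rating confidence weight:

#include <mlpack/core.hpp>

// One SGD pass over the observed ratings with bias terms and per-rating
// confidence weights.  Sketch only; a real version would have to fit the
// existing CF/AMF interfaces.
void SGDEpoch(const arma::mat& ratings, // 3 x n: (user, item, rating) columns.
              const arma::vec& conf,    // n confidence weights (all 1 = unweighted).
              arma::mat& W,             // rank x numUsers user factors.
              arma::mat& H,             // rank x numItems item factors.
              arma::vec& userBias,      // numUsers user bias terms.
              arma::vec& itemBias,      // numItems item bias terms.
              const double mu,          // global mean rating.
              const double alpha,       // learning rate.
              const double lambda)      // regularization parameter.
{
  for (size_t i = 0; i < ratings.n_cols; ++i)
  {
    const size_t u = (size_t) ratings(0, i);
    const size_t v = (size_t) ratings(1, i);
    const double r = ratings(2, i);

    // Predicted rating: global mean + biases + latent-factor interaction.
    const double pred = mu + userBias(u) + itemBias(v) +
        arma::dot(W.col(u), H.col(v));

    // Confidence-weighted error for this rating.
    const double e = conf(i) * (r - pred);

    // Standard regularized SGD steps for the biases and the factors.
    userBias(u) += alpha * (e - lambda * userBias(u));
    itemBias(v) += alpha * (e - lambda * itemBias(v));
    const arma::vec oldWu = W.col(u);
    W.col(u) += alpha * (e * H.col(v) - lambda * W.col(u));
    H.col(v) += alpha * (e * oldWu - lambda * H.col(v));
  }
}

The harder question is where the biases and weights should live (in the CF
class itself, in the factorizer policy, or in a new AMF update rule) and
how a user would pass the confidence values in.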

Also, you might want to take a look at the alternating least squares
rules for NMF in the AMF code; those might be generalizable to the case
of SVD (instead of NMF).
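
Concretely, I would expect that to take the form of another update policy
in src/mlpack/methods/amf/update_rules/, shaped like NMFALSUpdate but
solving the unconstrained least-squares problems instead of projecting
back to nonnegativity.  A rough, untested sketch (SVDALSUpdate is a
made-up name, and the signatures are from memory):

#include <mlpack/core.hpp>

// Hypothetical ALS update policy in the style of the existing AMF update
// rules, but without the nonnegativity projection, so the result is an
// unconstrained low-rank approximation V ~ W * H.
class SVDALSUpdate
{
 public:
  template<typename MatType>
  void Initialize(const MatType& /* dataset */, const size_t /* rank */) { }

  // Fix H and solve the least-squares problem for W:
  //   W = V H^T (H H^T)^{-1}.
  template<typename MatType>
  inline static void WUpdate(const MatType& V,
                             arma::mat& W,
                             const arma::mat& H)
  {
    W = arma::trans(arma::solve(H * H.t(), H * V.t()));
  }

  // Fix W and solve the least-squares problem for H:
  //   H = (W^T W)^{-1} W^T V.
  template<typename MatType>
  inline static void HUpdate(const MatType& V,
                             const arma::mat& W,
                             arma::mat& H)
  {
    H = arma::solve(W.t() * W, W.t() * V);
  }
};

Something like that could then be plugged into AMF<> alongside the
existing termination and initialization policies and compared against the
SGD-based regularized SVD.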

I hope this is helpful; let me know if I can clarify anything.

Thanks,

Ryan

-- 
Ryan Curtin    | "He's a peculiar man.  You could even say that he
ryan at ratml.org | has principles."  - Carson Wells


