[mlpack] GSOC-2013 : Working on Collaborative Filtering

Sun Apr 21 11:59:03 EDT 2013

Hi,

I am immensely interested in working on the mlpack project on
'collaborative filtering'.
My research interests lie in the field of Machine Learning and I will be
pursuing the same field in my Ph.D.
I have significant research experience, including reading, understanding
and implementing research papers, and I have published a couple of research
papers in IEEE conferences.
I have relevant coding experience in C, C++, Python and Java.

Let me first introduce myself.
I am a final-year undergraduate student at the Indian Institute of Technology,
Kharagpur (IIT Kharagpur), and I am joining the University of Maryland this fall
for my Ph.D.
I have completed a couple of machine-learning-based summer internships: one
involved extracting human silhouettes from videos (at SUNY Buffalo) and
the other was a location predictor for tweets (at IBM Research,
India). Both were successfully implemented.
I have two research papers in reputed IEEE conferences. They are not in the
field of Machine Learning, but they do show my involvement in research.
My full profile can be found at
https://sites.google.com/site/srijankedia/home.

I did a little bit of background research on CF and found some interesting
papers that are implementable and also very fast.
However, the final decision on the algorithm can only be made after a
thorough search, and it also depends on the type and size of the dataset
that we want to handle.
1. www.seas.harvard.edu/courses/cs281/papers/goldberg-roeder-gupta-perkins-2001.pdf -
A constant-time algorithm, tested on Jester, an online joke
recommendation system.
2. www.stat.osu.edu/~dmsl/Sarwar_2001.pdf - Item-based collaborative
filtering recommendation algorithms, which perform better than user-based
algorithms in terms of speed, accuracy and quality.
3. www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf - Item-to-item
collaborative filtering algorithm used at Amazon, which is scalable,
real-time and high-quality.
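
To make the item-based idea in papers 2 and 3 concrete, here is a minimal
sketch in Python. It is only an illustration under my own assumptions, not
the exact method of either paper and not mlpack code: the toy ratings, the
function names, and the choice of plain cosine similarity over co-rating
users are all made up for the example.

```python
from math import sqrt

# Toy user -> {item: rating} data, invented purely for illustration.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> user ratings."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(u, v):
    """Cosine similarity restricted to users who rated both items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(u[k] ** 2 for k in common))
    norm_v = sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings on other items.

    Returns None when no rated item has any similarity to the target.
    """
    items = item_vectors(ratings)
    if target not in items:
        return None
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = cosine(items[target], items[item])
        num += s * r
        den += abs(s)
    return num / den if den else None

if __name__ == "__main__":
    # Estimate how alice would rate item "D", which she has not rated.
    print(predict(ratings, "alice", "D"))
```

The item-item similarities depend only on the rating matrix, so they can
be precomputed offline; that is the property that makes this family of
methods attractive for large datasets.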

After deciding on the algorithm, we can work our way towards building a
modular, high-level recommendation system.
For the testing part, we can check how well our implementation performs
on the evaluation described in the chosen paper.
As far as the documentation is concerned, it is the same for all the
projects :-) (extensive and with easy-to-understand examples)

I do have a few questions that would help decide which CF algorithm to
finally implement.
It would be great if the mentors could please answer the following
questions:
1. What kind and size of data would we need to handle? Do
you have anything in mind or is it general at the moment?
2. What other factors would we need to consider while choosing
the algorithm?

It would be really helpful if the mentors could give more details on the
project and specific requirements, if any.

Thank you and regards,
Srijan Kumar
Final Year Undergraduate Student
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
India