During the third week, I first added two more classes for neighbor search, which find neighbors based on cosine similarity and Pearson correlation (the latter is PearsonSearch). Instead of using NeighborSearch directly with a cosine/Pearson distance metric, the reference and query sets are first normalized, so that we can use neighbor::KNN (i.e. NeighborSearch with Euclidean distance and a KDTree): for unit-norm vectors, squared Euclidean distance is a monotone function of cosine similarity, so Euclidean KNN returns the same neighbor ranking. The resulting neighbor similarities are then used to calculate interpolation weights.

Besides, I was working on the implementation of RegressionInterpolation from this paper, but it is not complete yet. The paper discusses RegressionInterpolation on a sparse rating matrix; since the current CF algorithm can generate a dense rating matrix, the exact implementation of RegressionInterpolation might differ from that in the paper.
As for CF data normalization: while adding accuracy tests for it, I noticed that the bound on prediction accuracy used in the tests is too loose. However, when I set the bound to a reasonably smaller value, some tests fail. I will spend some time solving this issue next week.
For the fourth week, I plan to complete RegressionInterpolation and the tests for data normalization, and to keep working on templatizing some CF methods to take InterpolationPolicy as a template parameter.