These past two weeks were spent on two things: running HOGWILD! on larger datasets to find and fix performance issues, and working on an implementation of SCD.
Running parallel SGD on the Netflix dataset revealed a few issues with the code. The generalized form of the parallel SGD Optimize function turned out to be very slow on a dataset of this size (numFunctions = 100198806). Shuffling the indices was also taking more resources than it should.
The first issue was fixed by introducing an overload for ParallelSGD in the regularized SVD implementation, as is already done for StandardSGD. To resolve the second issue, arma::shuffle was switched for
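Why a type-specific overload can be so much faster than the generic path is easier to see in a toy. The sketch below assumes the overload behaves like an explicit specialization of a templated optimizer step: the generic template works for any function type but materializes a gradient as large as the whole parameter vector for every sample, while the specialization for one "rating-like" function touches only the two entries that sample depends on. All names here (SparseToyFunction, GenericToyFunction, ToyOptimizer, Step, CompareOneStep) are hypothetical illustrations, not mlpack's API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical objective f(x) = sum_i f_i(x), where
// f_i(x) = 0.5 * (x[a_i] * x[b_i] - r_i)^2 depends on only two entries,
// loosely mimicking one rating that involves one user and one movie.
struct SparseToyFunction
{
  std::vector<std::size_t> a, b;
  std::vector<double> r;

  // Adds the gradient of f_i into a dense vector g of the same size as x.
  void Gradient(const std::vector<double>& x, std::size_t i,
                std::vector<double>& g) const
  {
    const double err = x[a[i]] * x[b[i]] - r[i];
    g[a[i]] += err * x[b[i]];
    g[b[i]] += err * x[a[i]];
  }
};

// Same objective under a different type name, so it takes the generic path.
struct GenericToyFunction : SparseToyFunction {};

// Hypothetical toy optimizer (not mlpack's ParallelSGD). It counts how many
// parameter entries each step writes.
struct ToyOptimizer
{
  std::size_t entriesWritten = 0;

  // Generic path: works for any type with a dense Gradient(), but allocates
  // and applies a full-size gradient on every step.
  template<typename FunctionType>
  void Step(FunctionType& f, std::vector<double>& x, std::size_t i,
            double stepSize)
  {
    std::vector<double> g(x.size(), 0.0);
    f.Gradient(x, i, g);
    for (std::size_t k = 0; k < x.size(); ++k)
    {
      x[k] -= stepSize * g[k];
      ++entriesWritten;
    }
  }
};

// Explicit specialization for SparseToyFunction: updates only the two
// entries that f_i depends on.
template<>
inline void ToyOptimizer::Step<SparseToyFunction>(SparseToyFunction& f,
    std::vector<double>& x, std::size_t i, double stepSize)
{
  const std::size_t u = f.a[i], v = f.b[i];
  const double err = x[u] * x[v] - f.r[i];
  const double gu = err * x[v];  // both partials use the pre-update values
  const double gv = err * x[u];
  x[u] -= stepSize * gu;
  x[v] -= stepSize * gv;
  entriesWritten += 2;
}

struct Comparison
{
  std::size_t genericWrites;
  std::size_t specializedWrites;
  bool sameResult;
};

// Runs one step through each path on identical data (numParams >= 2).
inline Comparison CompareOneStep(std::size_t numParams)
{
  SparseToyFunction sparse;
  sparse.a = {0};
  sparse.b = {1};
  sparse.r = {1.0};
  GenericToyFunction generic;
  generic.a = sparse.a;
  generic.b = sparse.b;
  generic.r = sparse.r;

  std::vector<double> x1(numParams, 0.5), x2(numParams, 0.5);
  ToyOptimizer o1, o2;
  o1.Step(generic, x1, 0, 0.1);  // deduces GenericToyFunction: generic path
  o2.Step(sparse, x2, 0, 0.1);   // deduces SparseToyFunction: specialization

  bool same = true;
  for (std::size_t k = 0; k < numParams; ++k)
    same = same && (std::fabs(x1[k] - x2[k]) < 1e-12);
  return {o1.entriesWritten, o2.entriesWritten, same};
}
```

With `CompareOneStep(1000)`, both paths produce the same parameters, but the generic step writes 1000 entries while the specialized one writes 2. With roughly 10^8 functions, that per-step difference dominates the run time.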
The results of the runs on both datasets after the latest commit are here.
Comparing these to the runs reported in the paper, the time taken on the RCV1 dataset is nearly the same, with lower error. Performance on Netflix is worse, which probably comes down to a better initialization strategy or the use of a "delta" model (where a rating is modeled as a deviation from the movie's mean rating, instead of directly as the product of the user and movie matrices).
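To make the "delta" model concrete, here is a minimal sketch under one possible reading: compute each movie's mean rating, train the factor model only on the residual (rating minus movie mean), and add the mean back at prediction time. The names (Rating, MovieMeans, ResidualRatings, PredictDelta) and the fallback value for unrated movies are hypothetical; whether this closes the gap on Netflix is only the conjecture stated above.

```cpp
#include <cstddef>
#include <vector>

// One observed rating: (user index, movie index, rating value).
struct Rating
{
  std::size_t user;
  std::size_t movie;
  double value;
};

// Mean rating of each movie; movies with no ratings get `fallback`.
inline std::vector<double> MovieMeans(const std::vector<Rating>& ratings,
                                      std::size_t numMovies, double fallback)
{
  std::vector<double> sum(numMovies, 0.0);
  std::vector<std::size_t> count(numMovies, 0);
  for (const Rating& r : ratings)
  {
    sum[r.movie] += r.value;
    ++count[r.movie];
  }
  std::vector<double> mean(numMovies, fallback);
  for (std::size_t m = 0; m < numMovies; ++m)
    if (count[m] > 0)
      mean[m] = sum[m] / static_cast<double>(count[m]);
  return mean;
}

// Training targets for the factor model: each rating minus its movie's mean.
inline std::vector<Rating> ResidualRatings(const std::vector<Rating>& ratings,
                                           const std::vector<double>& means)
{
  std::vector<Rating> residuals = ratings;
  for (Rating& r : residuals)
    r.value -= means[r.movie];
  return residuals;
}

// Delta-model prediction: movie mean plus the user/movie factor product.
inline double PredictDelta(const std::vector<double>& userFactors,
                           const std::vector<double>& movieFactors,
                           double movieMean)
{
  double dot = 0.0;
  for (std::size_t k = 0; k < userFactors.size(); ++k)
    dot += userFactors[k] * movieFactors[k];
  return movieMean + dot;
}
```

The factorization then only has to explain how a user deviates from a movie's typical rating, so factors initialized near zero already predict the movie mean instead of zero.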
One interesting thing to note is that the runs in the paper were done on 10-core machines, whereas the mlpack runs were on a dual-core, quad-thread CPU.
Regarding the implementation of SCD, I am thinking about adding a on the function interface. SCD requires the exact computation of the gradient along a single dimension, and none of the existing optimizers seem to use this kind of information.
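As an illustration of what single-dimension gradient information enables, here is a minimal, sequential sketch of stochastic coordinate descent on a least-squares objective f(x) = 0.5 * ||Ax - b||^2. It assumes a hypothetical function interface with NumCoordinates() and PartialGradient(x, j); the actual method name proposed for the interface is not given above, so these names, the fixed step size, and the seed are all illustrative assumptions.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical objective f(x) = 0.5 * ||A x - b||^2 with dense row-major A.
struct LeastSquaresFunction
{
  std::vector<std::vector<double>> A;  // rows x cols
  std::vector<double> b;

  std::size_t NumCoordinates() const { return A.empty() ? 0 : A[0].size(); }

  // Hypothetical single-coordinate gradient:
  // df/dx_j = sum_i A[i][j] * (A[i] . x - b[i]).
  double PartialGradient(const std::vector<double>& x, std::size_t j) const
  {
    double g = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i)
    {
      double residual = -b[i];
      for (std::size_t k = 0; k < x.size(); ++k)
        residual += A[i][k] * x[k];
      g += A[i][j] * residual;
    }
    return g;
  }
};

// Stochastic coordinate descent: each iteration picks one coordinate
// uniformly at random and steps along its exact partial derivative.
// Assumes f has at least one coordinate.
template<typename FunctionType>
void StochasticCoordinateDescent(const FunctionType& f, std::vector<double>& x,
                                 double stepSize, std::size_t maxIterations,
                                 unsigned seed = 42)
{
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, f.NumCoordinates() - 1);
  for (std::size_t it = 0; it < maxIterations; ++it)
  {
    const std::size_t j = pick(rng);
    x[j] -= stepSize * f.PartialGradient(x, j);
  }
}
```

For example, with A = [[1, 1], [0, 1]], b = [3, 2], a step size of 0.5 and a few thousand iterations, x converges to [1, 2]. This PartialGradient recomputes Ax on every call for clarity; a practical version would keep the residual Ax - b up to date incrementally so each coordinate update costs only one pass over column j.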