Profiling for parallelization and parallel stochastic optimization methods - Week 7 & 8

These past two weeks were spent on two things: running HOGWILD! on larger datasets to find and fix performance issues, and working on an implementation of SCD.

Running parallel SGD on the Netflix dataset revealed a few issues with the code. Using the generalized form of the parallel SGD Optimize function proved to be very slow with a dataset of this size (with numFunctions = 100198806). Also, shuffling indices was taking more resources than it should.

The first issue was fixed by introducing an overload for ParallelSGD in the regularized SVD implementation, as is done for StandardSGD. To resolve the second issue, arma::shuffle was replaced with std::shuffle.

After the latest commit, the runs on the datasets are linked here.

Comparing these to the runs mentioned in the paper, we see that the time taken on the RCV1 dataset is nearly the same, with lower error. Netflix performance is worse, which probably has to do with needing a better initialization strategy or a "delta" model (where ratings are calculated as a difference from the movie's mean rating, instead of directly multiplying the user and movie matrices).

One interesting thing to note is that the runs in the paper were done on 10-core machines, whereas the runs for mlpack are on a dual-core, quad-thread CPU.

Regarding the implementation of SCD, I am thinking about adding a ResolvableFunctionType requirement on the function interface. SCD requires the exact computation of the gradient in a single dimension, and none of the existing optimizers seem to utilize this type of information.


Atom domain class - Week 7

This week, I worked on giving the atom domain code better structure. While trying to add the full correction update method for the constrained problem in the atom domain, I found that I needed to rewrite a lot of the UpdateSpan class. So I decided to make a new class for atom domain operations. This structure is more general for atom domain optimization, although it might make the OMP code less efficient. I will run some large scale tests to see how that affects performance.

Since there have been some API changes between my old PR and my current code, I guess I will merge everything first. Also, I will add more tests for the vector case of the atom domain before I do the matrix completion comparison (which was my plan last week).

Even for the vector case, I am still excited to see in experiments how the atom norm constraint and support pruning step affect the sparsity of the solution, as well as the convergence speed, compared with the naive OMP solution. (People familiar with LASSO will know that an L1 norm constraint gives sparsity automatically. In the algorithm I implemented, the atom norm constrained problem uses a projected gradient solver, where the projection step is simply a soft-thresholding on the atom coefficients. So I guess it will give a better solution and faster convergence than OMP.)


FrankWolfe in atom domain for constrained problems - Week 6

As I mentioned in the blog last week, the problem I focused on this week is the FrankWolfe type solver in the atom domain for the constrained problem. The update step now involves a projected gradient solver that projects onto the l1 ball spanned by previous atoms. According to Duchi et al. (2008), this projection can be performed efficiently in O(n log n) time. The implementation of this step is basically done.

I will merge my current code with the previous PR in the coming week. Also, I think I can now perform some serious tests on large scale problems; I will try matrix completion tests in the following week. Hopefully, I can get some comparison plots to show.


Profiling for parallelization and parallel stochastic optimization methods - Week 6

This week was spent on further research into testing methods for the implementations in the PR (#1037). Several runs and experiments are in the Gists linked below:

Also, I have started on the initial implementation of another optimizer, Stochastic Coordinate Descent, and will open a PR soon with the additions.


Matrix Completion Solver - Week 5

The work I did this week is mainly about matrix completion using a FrankWolfe type solver.

In src/mlpack/methods/matrix_completion/, Stephen has already implemented a matrix completion solver with an SDP optimizer. However, Stephen suggested that it would be better to use a FrankWolfe type optimizer or an SGD optimizer for matrix completion. So I tried to add a new template class interface, which lets users choose between different optimizers. In fact, I think it could be an interesting project to compare different solvers on the same problem.

Currently, the FrankWolfe type solver for the atom domain problem follows the OMP approach. That is, the update step searches for the solution in the space spanned by all current atoms. This update step only cares about the sparsity of the solution. RaoShahWright uses FrankWolfe type optimizers to solve the atom domain problem with the solution constrained by a bounded atom norm. This requires a more complicated update step, where we need to solve a constrained problem, but it might give better results. I will implement this and compare.


Profiling for parallelization and parallel stochastic optimization methods - Week 5

As mentioned in the earlier blog post, this week was primarily focused on adding more tests for the implementation of Parallel SGD, along with progress on the matrix completion example from the paper. I have made some improvements to the code based on initial reviews, while adding new parts.

Both of the implemented examples from the paper can be added as separate methods to mlpack. Once enough tests are added, I'll run the methods on the datasets from the paper. While waiting for an in-depth review on the PR, I'll work on adding more tests for SparseSVM and SparseMC.


Profiling for parallelization and parallel stochastic optimization methods - Week 4

This week I completed the initial implementation of parallel SGD, along with a couple of tests for the new code. The PR (#1037) is nearly ready for an initial review. With further discussion, I will add more sophisticated and detailed methods of testing. Some SFINAE code needs to be added to handle the current interface of DecomposableFunctionType.

I would like to test this implementation against the original runs mentioned in the paper, to see the difference in results obtained from an OpenMP based implementation. I have already implemented the SparseSVM example.

The next few days will be focused primarily on implementing more tests, running ParallelSGD on the datasets mentioned in the paper, and possibly profiling mlpack's implementation of GMM for potential parallelization with OpenMP.


Sparse Optimization with regularization and support truncation - Weeks 3 and 4

Sorry that I didn't update the blog last week. During these two weeks, I was working on techniques to make the FrankWolfe type sparse optimization algorithm converge faster.

The first technique is to add regularization on the atom domain. The regularization parameter can optionally be provided by the user; ideally, it should equal the l2 norm of the atoms, or the l2 norm of the columns of the sensing matrix. The second technique is support truncation, which is mentioned in RaoShahWright and BoydSchiebingerRecht.

Now the code is ready to be tested more seriously on larger problems such as matrix completion. I will implement this in the coming week. The RaoShahWright paper also talks about the constrained problem in this setting, which would also be interesting to implement.


Frank-Wolfe Algorithm for various atom domains - Week 2

My work this week is mainly about adding more solvers for various atom domain constraints in the FW algorithm.

For the vector case, the atom domain defined by a structured group norm was added. This solver is implemented as a template class, so that users can define their own GroupType class specifying how to project onto each group, as well as the norm used within each group. As a special case, I implemented a GroupType class where the projection onto each group is just the restriction to some vector support, and the norm within each group is the lp norm. This is the setting used in Jaggi's paper. (Other structured groups, where for example the projection onto each group extracts information from a frequency band, could be added and tested later if anyone is interested.)

For the matrix case, the atom domain defined by different Schatten matrix norms is also implemented.

Also, for the update step of the Frank-Wolfe algorithm, a line search solver using the secant method is implemented.

The next step could be implementing some tricks to make the algorithm converge faster, such as regularization or support pruning. Tests at sufficiently large scale should also be performed in the future.


Frank-Wolfe algorithm structure and OMP - Week 1

My project this summer is initially about implementing the Frank-Wolfe algorithm and the Orthogonal Matching Pursuit (OMP) algorithm, and putting them in the same code structure. This week, I built the structure of the FW type algorithm and tried to make it flexible enough for all the variants in Jaggi's paper. I finished the special case that makes it the same as OMP, and a simple hand-made test seems to indicate that the basic version of OMP is working now.

The other variants of the FW type algorithm need more solvers, such as an LP solver, a line search solver, and optimizers for various atom domains. I will try to add more of them in the following week.

Also, I am so glad to be coding for mlpack this summer. It is quite pleasant to meet this cool community ^_^