[mlpack] sparse coding test examples in mlpack

Mon Jun 8 22:32:04 EDT 2015

On Fri, Jun 05, 2015 at 05:42:12PM -0700, Jianyu Huang wrote:
> Hi Ryan,
> 
> Thanks so much for the reply! It really helps!
> 
> 1.
> The output I am getting is like the following:
> [DEBUG] RA Search  [0x7ffe1d9e5740]
> [DEBUG]   Reference Set: 40x40
> [DEBUG]   Metric:
> [DEBUG]     LMetric [0x7ffe1d9e58a4]
> [DEBUG]       Power: 2
> [DEBUG]       TakeRoot: false
> [DEBUG] Sparse Coding  [0x7ffe1d9e5780]
> [DEBUG]   Data: 40x40
> [DEBUG]   Atoms: 3
> [DEBUG]   Lambda 1: 0.1
> [DEBUG]   Lambda 2: 0
>
> But just curious, what is the "40x40" input data shown in the summary
> part?

Ah, this is output from src/mlpack/tests/to_string_test.cpp, which just
makes sure that the ToString() method works for every mlpack class.  You
can ignore the output, and the 40x40 dataset is just a random dataset.

> 3.
> Thanks! But I'm just curious, if I set the data as some random matrix like
> 1,0,0,0
> 0,3,0,0
> 3,0,1,0
> 0,4,0,0
> 0,0,5,0
> 0,0,3,7
> 
> and I run "./sparse_coding -i data_bak2.csv -k 6 -l 1 -d dict.csv -c
> codes.csv -n 10 -v" multiple times.
> 
> Sometimes I can get output smoothly, but sometimes I get the following
> error:
> 
> -------------------------------------------------------------------------------------------------------
> [DEBUG] Newton Method iteration 49:
> [DEBUG]   Gradient norm: 1.94598.
> [DEBUG]   Improvement: 0.
> [INFO ]   Objective value: 27.9256.
> [INFO ] Performing coding step...
> [DEBUG] Optimization at point 0.
> [INFO ]   Sparsity level: 22.2222%.
> [INFO ]   Objective value: 20.6886 (improvement 1.79769e+308).
> [INFO ] Iteration 2 of 10.
> [INFO ] Performing dictionary step...
> [WARN ] There are 1 inactive atoms. They will be re-initialized randomly.
> [DEBUG] Solving Dual via Newton's Method.
> 
> error: solve(): solution not found
> 
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  solve(): solution not found
> Aborted (core dumped)
> 
>  ------------------------------------------------------------------------------------------------------
> 
> Do you have any insights about what is wrong here?

I didn't write the sparse coding module, so I've CC'ed Nishant (the
author) here to see if he has any insights.  To me, it looks like one of
these two systems is failing to be solved:

sparse_coding_impl.hpp:216 -- arma::mat matAInvZXT = solve(A, codesXT);
sparse_coding_impl.hpp:223 -- arma::vec searchDirection = -solve(hessian, gradient);

You can make the behavior deterministic by setting the random seed using
the --seed option.
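
In general, a linear solve like the two above fails when the matrix is
singular or nearly so.  I don't know that this is what is happening here,
so take the following only as a small standalone sketch of that failure
mode.  It is not mlpack or Armadillo code, and TrySolve2x2, Mat2, and Vec2
are made-up names for illustration:

```cpp
#include <array>
#include <cmath>
#include <utility>

// Hypothetical helper types, not part of mlpack or Armadillo.
using Mat2 = std::array<std::array<double, 2>, 2>;
using Vec2 = std::array<double, 2>;

// Solve the 2x2 system A x = b by Gaussian elimination with partial
// pivoting.  Returns false (instead of throwing) when a pivot is
// numerically zero, i.e. when A is singular to within tol.
bool TrySolve2x2(Mat2 a, Vec2 b, Vec2& x, double tol = 1e-12)
{
  // Pivot on the larger entry of the first column.
  if (std::fabs(a[1][0]) > std::fabs(a[0][0]))
  {
    std::swap(a[0], a[1]);
    std::swap(b[0], b[1]);
  }
  if (std::fabs(a[0][0]) < tol)
    return false;

  // Eliminate the first unknown from the second row.
  const double factor = a[1][0] / a[0][0];
  a[1][1] -= factor * a[0][1];
  b[1] -= factor * b[0];
  if (std::fabs(a[1][1]) < tol)
    return false;

  // Back-substitute.
  x[1] = b[1] / a[1][1];
  x[0] = (b[0] - a[0][1] * x[1]) / a[0][0];
  return true;
}
```

A matrix whose rows are multiples of each other (for example [1 2; 2 4])
makes the second pivot vanish, so the function returns false.  Checking a
result like that before using the solution is one way to see which of the
two systems is the problem.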

> 4.
> It looks like mlpack only implements a naive way to solve sparse coding,
> i.e. using Cholesky-based implementation of the LARS-Lasso algorithm to
> solve sparse coding step, and using Newton's iterative method to solve
> Lagrange's Dual. So mlpack doesn't actually implement the feature-sign
> search algorithm of Honglak Lee's "Efficient sparse coding algorithms"
> (NIPS 2006) paper. Am I wrong here? Also, it looks like for online sparse
> coding algorithm, the implementation in Julien Mairal's "Online Dictionary
> Learning for Sparse Coding" (ICML 2009) paper is more efficient, which is
> adopted in Scikit. Do you have plans to add those sparse coding approaches?

My understanding was that the mlpack method does implement the
feature-sign search algorithm -- that is what the code references.
Perhaps Nishant can elaborate?

> 5.
> I also notice the parallel performance of Sparse Coding in mlpack. When I
> run command line interface "./sparse_coding ...", it looks like only one
> core is utilized. But when I run the API code, it looks like the quad cores
> in my CPU are all utilized. But searching the whole package, I didn't see
> any "openmp" or "pthread" key words. My guess is that the performance
> benefit comes from parallel MKL/BLAS. Am I wrong here? Do you have any idea
> about why I get different parallel performance for CLI and API?

This would have to do with your BLAS implementation, yes.  It looks like
you are linking against a parallel BLAS when you use the mlpack API, but
mlpack itself (and so the command-line program) was not configured and
built against a parallel BLAS.

-- 
Ryan Curtin    | "Weeee!"
ryan at ratml.org |   - Bobby