mlpack.local_coordinate_coding

local_coordinate_coding(...)

Local Coordinate Coding

>>> from mlpack import local_coordinate_coding

An implementation of Local Coordinate Coding (LCC), which codes data that approximately lives on a manifold using a variation of l1-norm regularized sparse coding. Given a dense data matrix X with n points and d dimensions, LCC seeks to find a dense dictionary matrix D with k atoms in d dimensions, and a coding matrix Z with n points in k dimensions. Because of the regularization method used, the atoms in D should lie close to the manifold on which the data points lie.

The original data matrix X can then be approximately reconstructed as D * Z. In other words, this program represents each point in X as a sparse linear combination of the atoms in the dictionary D.
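
To make the idea of a sparse linear combination concrete, the following minimal sketch rebuilds a single point from a hand-written dictionary and code vector. The `reconstruct` helper, the numeric values, and the atoms-as-lists layout are illustrative assumptions; they are not mlpack output and do not reflect how mlpack stores its matrices.

```python
def reconstruct(atoms, code):
    """Return the linear combination sum_j code[j] * atoms[j]."""
    dims = len(atoms[0])
    point = [0.0] * dims
    for atom, weight in zip(atoms, code):
        if weight == 0.0:
            continue  # atoms with a zero coefficient contribute nothing
        for i in range(dims):
            point[i] += weight * atom[i]
    return point


# Hypothetical dictionary with k = 3 atoms in d = 2 dimensions.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Sparse code: only two of the three atoms are used.
code = [0.5, 0.0, 0.25]

approx = reconstruct(atoms, code)
print(approx)  # [0.75, 0.25]
```

The more zeros a code vector contains, the fewer atoms are needed to describe the point.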

The coding is found with an algorithm which alternates between a dictionary step, which updates the dictionary D, and a coding step, which updates the coding matrix Z.
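
Both steps can be read as attempts to lower a single objective. The sketch below assumes the form in which the LCC objective is usually written: the squared reconstruction error plus lambda times the sum of each coefficient's absolute value, weighted by the squared distance between the point and the corresponding atom. This form, the helper names, and the toy data are assumptions made for illustration; the code is not taken from mlpack.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lcc_objective(points, atoms, codes, lam):
    """Assumed LCC objective: reconstruction error plus a locality-weighted l1 penalty."""
    total = 0.0
    for point, code in zip(points, codes):
        # Reconstruct the point as a weighted sum of atoms.
        approx = [sum(w * atom[i] for w, atom in zip(code, atoms))
                  for i in range(len(point))]
        total += squared_distance(point, approx)
        # Each coefficient is penalized in proportion to its atom's distance from the point.
        total += lam * sum(abs(w) * squared_distance(point, atom)
                           for w, atom in zip(code, atoms))
    return total


points = [[1.0, 0.0]]
atoms = [[1.0, 0.0], [0.0, 1.0]]

near = lcc_objective(points, atoms, [[1.0, 0.0]], lam=0.5)  # uses the atom at the point
far = lcc_objective(points, atoms, [[0.0, 1.0]], lam=0.5)   # uses the distant atom
print(near, far)  # 0.0 3.0
```

Under this assumed objective, coding a point with a nearby atom is cheaper than coding it with a distant one, which is consistent with the atoms ending up close to the data manifold.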

To run this program, the input matrix X must be specified with the 'training' parameter, along with the number of atoms in the dictionary (the 'atoms' parameter). An initial dictionary may also be specified with the 'initial_dictionary' parameter. The l1-norm regularization parameter is specified with the 'lambda_' parameter. For example, to run LCC on the dataset 'data' using 200 atoms and an l1-regularization parameter of 0.1, saving the dictionary into 'dictionary' and the codes into 'codes', use

>>> output = local_coordinate_coding(training=data, atoms=200, lambda_=0.1)

>>> dictionary = output['dictionary']

>>> codes = output['codes']

The maximum number of iterations may be specified with the 'max_iterations' parameter. Optionally, the input data matrix X can be normalized before coding with the 'normalize' parameter.

A trained LCC model is returned in the 'output_model' element of the output. To encode new points from the dataset 'points' with a previously trained model 'lcc_model', saving the new codes to 'new_codes', the following command can be used:

>>> output = local_coordinate_coding(input_model=lcc_model, test=points)

>>> new_codes = output['codes']
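
The training call and the encoding call can be chained into one workflow. The sketch below uses only parameters and output keys documented on this page; the wrapper `train_then_encode` and its argument names are hypothetical, and the import sits inside the function so that defining it does not require mlpack to be installed.

```python
def train_then_encode(data, points, atoms=200, lambda_=0.1):
    """Train an LCC model on `data`, then encode `points` with it.

    `train_then_encode` is a hypothetical helper, not part of mlpack.
    """
    # Imported here so this module can be loaded without mlpack installed.
    from mlpack import local_coordinate_coding

    # Training call: the result holds 'dictionary', 'codes', and 'output_model'.
    trained = local_coordinate_coding(training=data, atoms=atoms, lambda_=lambda_)
    lcc_model = trained['output_model']

    # Encoding call: reuse the trained model on the new points.
    encoded = local_coordinate_coding(input_model=lcc_model, test=points)
    return trained['dictionary'], encoded['codes']
```

A call such as `dictionary, new_codes = train_then_encode(data, points)` would then return the learned dictionary together with the codes for the new points.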

## input options

- atoms (int): Number of atoms in the dictionary. Default value 0.
- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- initial_dictionary (numpy matrix or arraylike, float dtype): Optional initial dictionary.
- input_model (mlpack.LocalCoordinateCodingType): Input LCC model.
- lambda_ (float): Weighted l1-norm regularization parameter. Default value 0.
- max_iterations (int): Maximum number of iterations for LCC (0 indicates no limit). Default value 0.
- normalize (bool): If set, the input data matrix will be normalized before coding.
- seed (int): Random seed. If 0, 'std::time(NULL)' is used. Default value 0.
- test (numpy matrix or arraylike, float dtype): Test points to encode.
- tolerance (float): Tolerance for objective function. Default value 0.01.
- training (numpy matrix or arraylike, float dtype): Matrix of training data (X).
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.
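
The 'max_iterations' and 'tolerance' options suggest a stopping rule for the alternating optimization. The sketch below shows one plausible reading: stop when the iteration limit is reached (0 meaning no limit) or when the objective improves by less than the tolerance. The function `alternate_until_converged`, the `step` callback, and the exact convergence test are assumptions for illustration and may differ from what mlpack does internally.

```python
def alternate_until_converged(step, max_iterations=0, tolerance=0.01):
    """Call `step()` until the objective stalls or the iteration limit is hit.

    `step` is a hypothetical callback that runs one coding step and one
    dictionary step, then returns the current objective value.
    """
    previous = float("inf")
    current = previous
    iteration = 0
    # max_iterations == 0 means "no limit", as in the option list above.
    while max_iterations == 0 or iteration < max_iterations:
        current = step()
        iteration += 1
        # Assumed rule: stop once the objective improves by less than `tolerance`.
        if previous - current < tolerance:
            break
        previous = current
    return iteration, current


# Stand-in objective values; a real run would compute these from D and Z.
objectives = iter([10.0, 4.0, 3.5, 3.495, 3.494])
result = alternate_until_converged(objectives.__next__)
print(result)  # (4, 3.495)
```

With `max_iterations=2` the same sequence would stop after the second value, regardless of the tolerance.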

## output options

The return value from the binding is a dict containing the following elements:

- codes (numpy matrix, float dtype): Output codes matrix.
- dictionary (numpy matrix, float dtype): Output dictionary matrix.
- output_model (mlpack.LocalCoordinateCodingType): Output for trained LCC model.