mlpack_pca

NAME

mlpack_pca - principal components analysis

SYNOPSIS

mlpack_pca [-h] [-v]

DESCRIPTION

This program performs principal components analysis on the given dataset using the exact, randomized, randomized block Krylov, or QUIC SVD method. It will transform the data onto its principal components, optionally performing dimensionality reduction by ignoring the principal components with the smallest eigenvalues.

To specify the dataset to perform PCA on, the ’--input_file (-i)’ parameter may be used. A desired new dimensionality may be specified with the ’--new_dimensionality (-d)’ parameter, or the desired variance to retain may be specified with the ’--var_to_retain (-r)’ parameter. If desired, the dataset may be scaled before running PCA with the ’--scale (-s)’ parameter.

Several decomposition techniques are available. The method to use may be specified with the ’--decomposition_method (-c)’ parameter, which may take the values ’exact’, ’randomized’, ’randomized-block-krylov’, or ’quic’.

For example, to reduce the dimensionality of the matrix ’data.csv’ to 5 dimensions using randomized SVD for the decomposition, storing the output matrix to ’data_mod.csv’, the following command may be used:

$ mlpack_pca --input_file data.csv --new_dimensionality 5 --decomposition_method randomized --output_file data_mod.csv
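As a further illustration, the following sketch combines the ’--scale (-s)’ and ’--var_to_retain (-r)’ parameters described above. The file names ’pca_example.csv’ and ’pca_example_mod.csv’ and the data values are hypothetical, and the command runs only if mlpack_pca is found on the PATH:

```shell
# Write a tiny illustrative dataset: 5 points (rows), 3 features (columns).
# The values are made up for this sketch.
cat > pca_example.csv <<'EOF'
1.0,2.0,0.5
2.0,4.1,0.4
3.0,5.9,0.7
4.0,8.2,0.3
5.0,9.8,0.6
EOF

# Run PCA only when the mlpack_pca binary is available.
if command -v mlpack_pca >/dev/null 2>&1; then
  # Scale each feature to unit variance, then keep enough principal
  # components to retain 90% of the variance.
  mlpack_pca --input_file pca_example.csv \
             --scale \
             --var_to_retain 0.9 \
             --decomposition_method exact \
             --output_file pca_example_mod.csv \
             --verbose
else
  echo "mlpack_pca not found; skipping the PCA run" >&2
fi
```

Because ’--var_to_retain’ overrides ’--new_dimensionality’, only one of the two needs to be given.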

REQUIRED INPUT OPTIONS

--input_file (-i) [string]

Input dataset to perform PCA on.

OPTIONAL INPUT OPTIONS

--decomposition_method (-c) [string]

Method used for the principal components analysis: ’exact’, ’randomized’, ’randomized-block-krylov’, ’quic’. Default value ’exact’.

--help (-h) [bool]

Default help info.

--info [string]

Get help on a specific module or option. Default value ’’.

--new_dimensionality (-d) [int]

Desired dimensionality of output dataset. If 0, no dimensionality reduction is performed. Default value 0.

--scale (-s) [bool]

If set, the data will be scaled before running PCA, such that the variance of each feature is 1.

--var_to_retain (-r) [double]

Amount of variance to retain; should be between 0 and 1. If 1, all variance is retained. Overrides -d. Default value 0.

--verbose (-v) [bool]

Display informational messages and the full list of parameters and timers at the end of execution.

--version (-V) [bool]

Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

--output_file (-o) [string]

Matrix to save modified dataset to. Default value ’’.
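The short option names listed above may be used in place of the long ones. As a sketch, the example from the DESCRIPTION section could be written as follows; the file names ’pca_short.csv’ and ’pca_short_mod.csv’ and the data values are made up for illustration, and the command is skipped when mlpack_pca is not installed:

```shell
# Write an illustrative dataset: 8 points (rows), 6 features (columns).
# The values are hypothetical.
cat > pca_short.csv <<'EOF'
0.9,2.1,3.3,0.2,5.1,1.0
1.8,1.9,2.7,0.9,4.2,2.2
3.1,0.7,3.9,1.4,6.0,0.4
4.2,2.8,1.1,2.3,3.5,1.7
5.0,3.6,2.4,0.1,2.9,3.0
6.3,1.2,4.8,3.2,5.7,0.8
7.1,4.4,0.6,1.9,4.6,2.5
8.4,0.3,3.0,2.8,1.3,1.1
EOF

# Reduce to 5 dimensions with randomized SVD, using short options:
#   -i input file, -d new dimensionality, -c decomposition method,
#   -o output file.
if command -v mlpack_pca >/dev/null 2>&1; then
  mlpack_pca -i pca_short.csv -d 5 -c randomized -o pca_short_mod.csv
else
  echo "mlpack_pca not found; skipping the PCA run" >&2
fi
```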

ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.