mlpack.nbc

nbc(...) Parametric Naive Bayes Classifier

>>> from mlpack import nbc

This program trains the Naive Bayes classifier on the given labeled training set, or loads a model from the given model file, and then may use that trained model to classify the points in a given test set.

The training set is specified with the 'training' parameter. Labels may be either the last row of the training set, or alternately the 'labels' parameter may be specified to pass a separate matrix of labels.

If training is not desired, a pre-existing model may be loaded with the 'input_model' parameter.

The 'incremental_variance' parameter can be used to force the training to use an incremental algorithm for calculating variance. This is slower, but can help avoid loss of precision in some cases.
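The trade-off described above can be illustrated with a minimal pure-Python sketch. It compares a textbook sum-of-squares variance with a single-pass, Welford-style running update. Both helper functions and the sample values are hypothetical and written only for this illustration. Whether mlpack's incremental path follows exactly this update, or whether its default path resembles the shortcut formula, is an assumption and not something this page states. The sketch uses the sample variance (division by n - 1) purely as a convention.

```python
def naive_variance(values):
    """Sample variance via the textbook sum-of-squares shortcut."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # Subtracting two huge, nearly equal numbers cancels most significant digits.
    return (total_sq - total * total / n) / (n - 1)


def incremental_variance(values):
    """Sample variance via a single-pass, Welford-style running update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return m2 / (count - 1)


# Hypothetical values with a large common offset and a small spread.
# Their exact sample variance is 30.
values = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
naive_result = naive_variance(values)
incremental_result = incremental_variance(values)
print("naive:      ", naive_result)
print("incremental:", incremental_result)
```

On data like this, the running update keeps every intermediate quantity small, while the shortcut formula works with values near 1e18, where double precision can no longer resolve a difference of 90.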

If classifying a test set is desired, the test set may be specified with the 'test' parameter, and the classifications are then returned in the 'output' output parameter. The trained model is returned in the 'output_model' output parameter.

For example, to train a Naive Bayes classifier on the dataset 'data' with labels 'labels' and store the model in 'nbc_model', the following commands may be used:

>>> output = nbc(training=data, labels=labels)

>>> nbc_model = output['output_model']

Then, to use 'nbc_model' to predict the classes of the dataset 'test_set' and store the predicted classes in 'predictions', the following commands may be used:

>>> output = nbc(input_model=nbc_model, test=test_set)

>>> predictions = output['output']

## input options

- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- incremental_variance (bool): The variance of each class will be calculated incrementally.
- input_model (mlpack.NBCModelType): Input Naive Bayes model.
- labels (numpy vector or array, int/long dtype): A vector containing labels for the training set.
- test (numpy matrix or arraylike, float dtype): A matrix containing the test set.
- training (numpy matrix or arraylike, float dtype): A matrix containing the training set.
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.

## output options

The return value from the binding is a dict containing the following elements:

- output (numpy vector, int dtype): The predicted labels for the test set.
- output_model (mlpack.NBCModelType): The trained Naive Bayes model.
- output_probs (numpy matrix, float dtype): The predicted class probabilities for the test set.
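As a rough illustration of how 'output' and 'output_probs' might relate, the sketch below derives a label for each point by picking its most probable class. The probability values are hypothetical. The layout (one row per test point, one column per class) and the idea that the predicted label is the highest-probability class are both assumptions made for this example, not statements from this page. Plain Python lists stand in for the numpy matrix so the snippet runs without mlpack installed.

```python
# Hypothetical probabilities for three test points and two classes
# (assumed layout: one row per point, one column per class).
output_probs = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]


def most_probable_class(row):
    """Index of the largest probability; ties resolve to the lowest index."""
    return max(range(len(row)), key=lambda k: row[k])


labels_from_probs = [most_probable_class(row) for row in output_probs]
print(labels_from_probs)
```

With a real result dict, the same comparison could be made between `output['output']` and the row-wise argmax of `output['output_probs']`, provided the assumptions above hold.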