mlpack

🔗 NaiveBayesClassifier

The NaiveBayesClassifier class implements a simple Naive Bayes classifier for numerical data and offers standard training and classification functionality. Naive Bayes is useful for multi-class classification (i.e. class labels are 0, 1, 2, etc.), and due to its simplicity it scales well to large datasets.

Simple usage example:

// Train a Naive Bayes classifier on random data and predict labels:

// All data and labels are uniform random; 5 dimensional data, 4 classes.
// Replace with a data::Load() call or similar for a real application.
arma::mat dataset(5, 1000, arma::fill::randu); // 1000 points.
arma::Row<size_t> labels =
    arma::randi<arma::Row<size_t>>(1000, arma::distr_param(0, 3));
arma::mat testDataset(5, 500, arma::fill::randu); // 500 test points.

mlpack::NaiveBayesClassifier nbc;       // Step 1: create model.
nbc.Train(dataset, labels, 4);          // Step 2: train model.
arma::Row<size_t> predictions;
nbc.Classify(testDataset, predictions); // Step 3: classify points.

// Print some information about the test predictions.
std::cout << arma::accu(predictions == 2) << " test points classified as class "
    << "2." << std::endl;

More examples are given in the Simple Examples section below.

🔗 Constructors




Constructor Parameters:

| name | type | description | default |
|------|------|-------------|---------|
| data | arma::mat | Column-major training matrix. | (N/A) |
| labels | arma::Row<size_t> | Training labels, between 0 and numClasses - 1 (inclusive). Should have length data.n_cols. | (N/A) |
| numClasses | size_t | Number of classes in the dataset. | (N/A) |
| incremental | bool | If true, then the model will not be reset before training, and will use a robust incremental algorithm for variance computation. | true |
| epsilon | double | Initial small value for sample variances, to prevent underflow (via log(0)). | 1e-10 |

As an alternative to passing the epsilon parameter to the constructor, it can be set with the Epsilon() method: nbc.Epsilon() = eps; sets epsilon to eps, and the new value takes effect the next time non-incremental Train() or Reset() is called.
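The role of epsilon can be illustrated outside of mlpack. The following is a minimal standalone sketch, assuming a Gaussian likelihood per feature; GaussianLogDensity is a hypothetical helper, not an mlpack function, and simply adding epsilon to the variance is an assumption about how the small value is applied, so mlpack's exact handling may differ.

```cpp
#include <cassert>
#include <cmath>

// Log-density of a univariate Gaussian with the given mean and variance.
// `epsilon` is added to the variance before the logarithm and the division,
// so a feature with zero sample variance does not produce log(0) or 0/0.
// (Hypothetical helper for illustration; not part of mlpack's API.)
double GaussianLogDensity(const double x,
                          const double mean,
                          const double variance,
                          const double epsilon = 1e-10)
{
  assert(variance >= 0.0 && epsilon >= 0.0);

  const double pi = 3.14159265358979323846;
  const double v = variance + epsilon;
  const double diff = x - mean;
  return -0.5 * std::log(2.0 * pi * v) - (diff * diff) / (2.0 * v);
}
```

With a zero variance and epsilon set to 0, GaussianLogDensity(1.0, 1.0, 0.0, 0.0) is not a finite number, while the same call with the default epsilon returns a finite value.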

🔗 Training

If training is not done as part of the constructor call, it can be done with the Train() function:


| name | type | description | default |
|------|------|-------------|---------|
| point | arma::vec | Column-major training point (i.e. one column). | (N/A) |
| label | size_t | Training label, between 0 and numClasses - 1 (inclusive). | (N/A) |

Note: when performing incremental training, if data has a different dimensionality than the model, or if numClasses is different, the model will be reset. For single-point Train(), if point has different dimensionality, an exception will be thrown.
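This section does not spell out the incremental variance algorithm. One well-known algorithm of that kind is Welford's method, sketched below as a standalone illustration for a single feature. Whether mlpack uses exactly this update is an assumption, and RunningStats is a hypothetical name that does not appear in mlpack.

```cpp
#include <cassert>
#include <cstddef>

// Running mean and sample variance of a stream of values (Welford's method).
// Hypothetical helper for illustration; not mlpack's internal code.
struct RunningStats
{
  std::size_t count = 0;
  double mean = 0.0;
  double m2 = 0.0; // Sum of squared deviations from the current mean.

  // Fold one new value into the statistics without revisiting old values.
  void Update(const double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean); // Uses both the old and the updated mean.
  }

  // Unbiased sample variance; requires at least two values.
  double Variance() const
  {
    assert(count >= 2);
    return m2 / (count - 1);
  }
};
```

Because each call to Update() only touches the running totals, points can arrive one at a time, which is the same access pattern as calling single-point Train() in a loop.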

🔗 Classification

Once a NaiveBayesClassifier model is trained, the Classify() member function can be used to make class predictions for new data.





Classification Parameters:

| usage | name | type | description |
|-------|------|------|-------------|
| single-point | point | arma::vec | Single point for classification. |
| single-point | prediction | size_t& | size_t to store class prediction into. |
| single-point | probabilitiesVec | arma::vec& | arma::vec& to store class probabilities into; will have length numClasses. |
| multi-point | data | arma::mat | Set of column-major points for classification. |
| multi-point | predictions | arma::Row<size_t>& | Vector of size_ts to store class predictions into; will be set to length data.n_cols. |
| multi-point | probabilities | arma::mat& | Matrix to store class probabilities into (number of rows will be equal to numClasses; number of columns will be equal to data.n_cols). |
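To make the relationship between the prediction and the class probabilities concrete, the following standalone sketch computes Gaussian Naive Bayes class probabilities for a single point and picks the most probable class. It uses only the C++ standard library. The function names, the nested-vector layout, and the use of log-sum-exp normalization are assumptions made for illustration and are not taken from mlpack.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Posterior class probabilities for one point under a Gaussian Naive Bayes
// model.  means[c][d] and variances[c][d] describe feature d of class c, and
// priors[c] is the prior probability of class c.  (Hypothetical standalone
// helper; the names and layout are assumptions, not mlpack's API.)
std::vector<double> ClassProbabilities(
    const std::vector<double>& point,
    const std::vector<std::vector<double>>& means,
    const std::vector<std::vector<double>>& variances,
    const std::vector<double>& priors)
{
  const std::size_t numClasses = priors.size();
  assert(numClasses > 0);
  assert(means.size() == numClasses && variances.size() == numClasses);

  const double pi = 3.14159265358979323846;
  std::vector<double> logJoint(numClasses);
  for (std::size_t c = 0; c < numClasses; ++c)
  {
    assert(means[c].size() == point.size());
    assert(variances[c].size() == point.size());

    // log P(class) + sum over features of log N(x_d | mean, variance).
    logJoint[c] = std::log(priors[c]);
    for (std::size_t d = 0; d < point.size(); ++d)
    {
      const double diff = point[d] - means[c][d];
      logJoint[c] += -0.5 * std::log(2.0 * pi * variances[c][d])
          - (diff * diff) / (2.0 * variances[c][d]);
    }
  }

  // Normalize with the log-sum-exp trick to avoid underflow.
  double maxLog = logJoint[0];
  for (const double l : logJoint)
    maxLog = (l > maxLog) ? l : maxLog;

  double sum = 0.0;
  std::vector<double> probabilities(numClasses);
  for (std::size_t c = 0; c < numClasses; ++c)
  {
    probabilities[c] = std::exp(logJoint[c] - maxLog);
    sum += probabilities[c];
  }
  for (double& p : probabilities)
    p /= sum;

  return probabilities; // One entry per class.
}

// Index of the most probable class.
std::size_t Predict(const std::vector<double>& probabilities)
{
  std::size_t best = 0;
  for (std::size_t c = 1; c < probabilities.size(); ++c)
    if (probabilities[c] > probabilities[best])
      best = c;
  return best;
}
```

The returned vector has one entry per class and sums to 1, so a three-class model yields three probabilities for each point.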

🔗 Other Functionality

🔗 Simple Examples

See also the simple usage example for a trivial usage of the NaiveBayesClassifier class.


Train a Naive Bayes classifier incrementally, one point at a time, then compute accuracy on a test set and save the model to disk.

// See https://datasets.mlpack.org/mnist.train.csv.
arma::mat dataset;
mlpack::data::Load("mnist.train.csv", dataset, true);
// See https://datasets.mlpack.org/mnist.train.labels.csv.
arma::Row<size_t> labels;
mlpack::data::Load("mnist.train.labels.csv", labels, true);

mlpack::NaiveBayesClassifier nbc(dataset.n_rows /* dimensionality */,
                                 10 /* numClasses */);

// Iterate over all points in the dataset and call Train() on each point.
for (size_t i = 0; i < dataset.n_cols; ++i)
  nbc.Train(dataset.col(i), labels[i]);

// Now compute the accuracy of the fully trained model on the training set
// and on a test set.

// See https://datasets.mlpack.org/mnist.test.csv.
arma::mat testDataset;
mlpack::data::Load("mnist.test.csv", testDataset, true);
// See https://datasets.mlpack.org/mnist.test.labels.csv.
arma::Row<size_t> testLabels;
mlpack::data::Load("mnist.test.labels.csv", testLabels, true);

arma::Row<size_t> predictions;
nbc.Classify(dataset, predictions);
const double trainAccuracy = 100.0 *
    ((double) arma::accu(predictions == labels)) / labels.n_elem;
std::cout << "Accuracy of model on training data: " << trainAccuracy << "%."
    << std::endl;

nbc.Classify(testDataset, predictions);

const double testAccuracy = 100.0 *
    ((double) arma::accu(predictions == testLabels)) / testLabels.n_elem;
std::cout << "Accuracy of model on test data:     " << testAccuracy << "%."
    << std::endl;

// Save the model to disk with the name "nbc".
mlpack::data::Save("nbc_model.bin", "nbc", nbc, true);

Load a saved Naive Bayes classifier and print some information about it.

mlpack::NaiveBayesClassifier nbc;

// Load the model named "nbc" from "nbc_model.bin".
mlpack::data::Load("nbc_model.bin", "nbc", nbc, true);

// Print information about the model.
std::cout << "The dimensionality of the model in nbc_model.bin is "
    << nbc.Means().n_rows << "." << std::endl;
std::cout << "The number of classes in the model is "
    << nbc.Probabilities().n_elem << "." << std::endl;
std::cout << "The model was trained on " << nbc.TrainingPoints() << " points."
    << std::endl;
std::cout << "The prior probabilities of each class are: "
    << nbc.Probabilities().t();

// Compute the class probabilities of a random point.
// For our random point, we'll use one of the means plus some noise.
arma::vec randomPoint = nbc.Means().col(2) +
    10.0 * arma::randu<arma::vec>(nbc.Means().n_rows);

size_t prediction;
arma::vec probabilities;
nbc.Classify(randomPoint, prediction, probabilities);

std::cout << "Random point class prediction: " << prediction << "."
    << std::endl;
std::cout << "Random point class probabilities: " << probabilities.t();

🔗 Advanced Functionality: Different Element Types

The NaiveBayesClassifier class has one template parameter that can be used to control the element type of the model. The full signature of the class is:

NaiveBayesClassifier<ModelMatType>

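ModelMatType specifies the type of matrix used for the internal representation of model parameters. The training data may use a different matrix type, as in the example below, where sparse data is used with a dense model.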

The example below trains a Naive Bayes model on sparse 32-bit floating point data, but uses dense 32-bit floating point matrices to store the model itself.

// Create random, sparse 100-dimensional data, with 3 classes.
arma::sp_fmat dataset;
dataset.sprandu(100, 5000, 0.3);
arma::Row<size_t> labels =
    arma::randi<arma::Row<size_t>>(5000, arma::distr_param(0, 2));

mlpack::NaiveBayesClassifier<arma::fmat> nbc(dataset, labels, 3);

// Now classify a test point.
arma::sp_fvec point;
point.sprandu(100, 1, 0.3);

size_t prediction;
arma::fvec probabilitiesVec;
nbc.Classify(point, prediction, probabilitiesVec);

std::cout << "Prediction for random test point: " << prediction << "."
    << std::endl;
std::cout << "Class probabilities for random test point: "
    << probabilitiesVec.t();

Note: dense objects should be used for ModelMatType, since in general the mean and sample variance of sparse data are dense.
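The reason the mean of sparse data is generally dense can be seen in a small standalone sketch. It uses a hypothetical map-based sparse representation rather than Armadillo types, and all names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// A sparse point stored as (dimension index -> nonzero value).
// Hypothetical representation for illustration only.
using SparsePoint = std::map<std::size_t, double>;

// Per-dimension mean of a set of sparse points, returned as a dense vector.
std::vector<double> Mean(const std::vector<SparsePoint>& points,
                         const std::size_t dimensionality)
{
  assert(!points.empty());

  std::vector<double> mean(dimensionality, 0.0);
  for (const SparsePoint& p : points)
  {
    for (const auto& entry : p)
    {
      assert(entry.first < dimensionality);
      mean[entry.first] += entry.second / points.size();
    }
  }
  return mean;
}

// Number of nonzero entries in a dense vector.
std::size_t CountNonzeros(const std::vector<double>& v)
{
  std::size_t count = 0;
  for (const double x : v)
    if (x != 0.0)
      ++count;
  return count;
}
```

For three 4-dimensional points that each have a single nonzero in a different dimension, every point is 25% dense, but the mean has three nonzeros and is 75% dense. A dimension of the mean is nonzero whenever any point is nonzero there, so the density grows with the number of points.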