mlpack.random_forest

random_forest(...)
Random forests

>>> from mlpack import random_forest

This program is an implementation of the standard random forest classification algorithm by Leo Breiman. A random forest can be trained and saved for later use, or a previously trained random forest can be loaded and used to generate predictions or class probabilities for points.

The training set and associated labels are specified with the 'training' and 'labels' parameters, respectively. The labels should be in the range [0, num_classes - 1]. If 'labels' is not specified, the labels are assumed to be the last dimension of the training dataset.
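Since the labels must lie in [0, num_classes - 1], raw class names or arbitrary integers need to be re-encoded before training. The following is a minimal sketch in plain Python; the helper names encode_labels and append_labels are hypothetical and not part of mlpack, and append_labels assumes one point per row, so that "the last dimension" corresponds to the last column.

```python
def encode_labels(raw_labels):
    """Map arbitrary hashable labels to integers 0..num_classes-1.

    Hypothetical helper, not part of mlpack. Returns the encoded labels
    and the sorted list of classes, so classes[i] is the original label
    that was mapped to integer i.
    """
    classes = sorted(set(raw_labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in raw_labels], classes


def append_labels(rows, labels):
    """Return copies of the rows with each label appended as the last column.

    Assumes one point per row, so the label becomes the last dimension.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    return [list(row) + [label] for row, label in zip(rows, labels)]


raw = ["spam", "ham", "spam", "eggs"]
encoded, classes = encode_labels(raw)
combined = append_labels([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]], encoded)
```

Keeping the returned class list makes it possible to translate integer predictions back to the original labels afterwards.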

When a model is trained, the 'output_model' output parameter may be used to save the trained model. A model may be loaded for predictions with the 'input_model' parameter. The 'input_model' parameter may not be specified when the 'training' parameter is specified. The 'minimum_leaf_size' parameter specifies the minimum number of training points that must fall into each leaf for it to be split. The 'num_trees' parameter controls the number of trees in the random forest. If 'print_training_accuracy' is specified, the calculated accuracy on the training set will be printed.

Test data may be specified with the 'test' parameter, and if performance measures are desired for that test set, labels for the test points may be specified with the 'test_labels' parameter. Predictions for each test point may be saved via the 'predictions' output parameter. Class probabilities for each prediction may be saved with the 'probabilities' output parameter.
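To illustrate how class probabilities relate to class predictions, the sketch below picks the most probable class for each point. It assumes that the probabilities are available as one row per test point with one entry per class, and that the predicted class is the one with the highest probability; neither assumption is stated in this documentation. The function name most_probable_class is hypothetical.

```python
def most_probable_class(probabilities):
    """Return, for each row, the index of the largest probability.

    Hypothetical helper, not part of mlpack. Assumes one row per point
    and one column per class. Ties resolve to the lowest class index.
    """
    return [
        max(range(len(row)), key=lambda i: row[i])
        for row in probabilities
    ]


example_probabilities = [
    [0.1, 0.7, 0.2],  # class 1 is most likely
    [0.6, 0.3, 0.1],  # class 0 is most likely
]
example_classes = most_probable_class(example_probabilities)
```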

For example, to train a random forest with a minimum leaf size of 20 using 10 trees on the dataset contained in 'data' with labels 'labels', saving the output random forest to 'rf_model' and printing the training accuracy, one could call:

>>> output = random_forest(training=data, labels=labels, minimum_leaf_size=20,
       num_trees=10, print_training_accuracy=True)
>>> rf_model = output['output_model']

Then, to use that model to classify the points in 'test_set', report performance against the labels in 'test_labels', and save the prediction for each point to 'predictions', one could call:

>>> output = random_forest(input_model=rf_model, test=test_set,
       test_labels=test_labels)
>>> predictions = output['predictions']
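Once predictions are available, test accuracy can also be computed by hand, which is useful when the result is needed as a number rather than as printed output. This is a minimal sketch that assumes 'predictions' and 'test_labels' can be treated as flat sequences of integer class labels of equal length; the function name accuracy is hypothetical and not part of mlpack.

```python
def accuracy(predictions, true_labels):
    """Fraction of points whose predicted label equals the true label.

    Hypothetical helper, not part of mlpack. Both arguments are assumed
    to be flat sequences of integer labels with the same length.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if len(predictions) == 0:
        raise ValueError("cannot compute accuracy of an empty prediction set")
    correct = sum(1 for p, t in zip(predictions, true_labels) if int(p) == int(t))
    return correct / len(predictions)


# Three of the four example predictions match the true labels.
example_accuracy = accuracy([0, 1, 1, 2], [0, 1, 2, 2])
```

The corresponding test error is 1 minus this value.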

input options

output options

The return value from the binding is a dict containing the following elements: