Documentation for mlpack
🔗 A fast, flexible machine learning library
mlpack is an intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages. It aims to provide fast, lightweight implementations of both common and cutting-edge machine learning algorithms.
mlpack’s lightweight C++ implementation makes it ideal for deployment, and it can also be used for interactive prototyping via C++ notebooks (these can be seen in action on mlpack’s homepage).
In addition to its powerful C++ interface, mlpack also provides command-line programs and bindings to the Python, R, Julia, and Go languages.
If you use mlpack, please cite the software.
🔗 mlpack basics
To install mlpack, follow the instructions in the README or the Windows build guide. The following basic guides are highly recommended before using mlpack.
- First steps:
  - mlpack C++ quickstart: create a couple of simple C++ programs that use mlpack
  - Sample Windows mlpack C++ application: create a working mlpack Windows program using Visual Studio
- Basics of matrices and data in mlpack:
- Reference for mlpack core classes:
- Using mlpack natively with our extensions in Python, R, CLI, Julia, and Go:
🔗 mlpack algorithm documentation
Documentation for each machine learning algorithm that mlpack implements is detailed in the sections below.
- Classification algorithms: classify points as discrete labels (0, 1, 2, …).
- Regression algorithms: predict continuous values.
- Clustering algorithms: group points into clusters.
- Geometric algorithms: computations based on distance metrics (nearest neighbors, kernel density estimation, etc.).
- Preprocessing utilities: prepare data for machine learning algorithms.
- Transformations: transform data from one space to another (principal components analysis, etc.).
- Modeling utilities: cross-validation, hyperparameter tuning, etc.
🔗 Classification algorithms
Classify points as discrete labels (0, 1, 2, …).
- AdaBoost: Adaptive Boosting
- DecisionTree: ID3-style decision tree classifier
- HoeffdingTree: streaming/incremental decision tree classifier
- LinearSVM: simple linear support vector machine classifier
- LogisticRegression: L2-regularized logistic regression (two-class only)
- NaiveBayesClassifier: simple multi-class naive Bayes classifier
- Perceptron: simple Perceptron classifier
- RandomForest: parallelized random forest classifier
- SoftmaxRegression: L2-regularized softmax regression (i.e. multi-class logistic regression)
🔗 Regression algorithms
Predict continuous values.
- BayesianLinearRegression: Bayesian L2-penalized linear regression
- DecisionTreeRegressor: ID3-style decision tree regressor
- LARS: Least Angle Regression (LARS), L1-regularized and L2-regularized
- LinearRegression: L2-regularized linear regression (ridge regression)
🔗 Clustering algorithms
NOTE: this documentation is still under construction and so some algorithms that mlpack implements are not yet listed here. For now, see the mlpack/methods directory for a full list of algorithms.
Group points into clusters.
- MeanShift: clustering with the density-based mean shift algorithm
🔗 Geometric algorithms
NOTE: this documentation is still under construction and so no geometric algorithms in mlpack are documented yet. For now, see the mlpack/methods directory for a full list of algorithms.
Computations based on distance metrics.
🔗 Preprocessing utilities
Prepare data for machine learning algorithms.
- Normalizing labels: map labels to and from the range [0, numClasses - 1].
- Dataset splitting: split a dataset into a training set and a test set.
NOTE: this documentation is still under construction and so not all preprocessing utilities in mlpack are documented yet. See also the mlpack/methods/preprocess directory for a full list of algorithms.
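To make the idea of label normalization concrete, here is a minimal sketch in standard C++. It is not mlpack's implementation or API: the function names `NormalizeLabelsSketch` and `RevertLabelsSketch` are hypothetical, and the sketch uses `std::vector` rather than mlpack's matrix types. It assigns class indices in order of first appearance, which is an assumption made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not mlpack's API): maps arbitrary integer labels to
// the range [0, numClasses - 1], in order of first appearance.
// After the call, mapping[i] holds the original label assigned to class i.
std::vector<std::size_t> NormalizeLabelsSketch(
    const std::vector<int>& rawLabels,
    std::vector<int>& mapping)
{
  std::map<int, std::size_t> seen; // original label -> normalized class index
  std::vector<std::size_t> normalized;
  normalized.reserve(rawLabels.size());
  mapping.clear();

  for (const int label : rawLabels)
  {
    auto it = seen.find(label);
    if (it == seen.end())
    {
      // First time this label appears: give it the next unused class index.
      it = seen.emplace(label, mapping.size()).first;
      mapping.push_back(label);
    }
    normalized.push_back(it->second);
  }
  return normalized;
}

// Hypothetical helper: maps normalized labels back to the original labels.
std::vector<int> RevertLabelsSketch(
    const std::vector<std::size_t>& normalized,
    const std::vector<int>& mapping)
{
  std::vector<int> original;
  original.reserve(normalized.size());
  for (const std::size_t cls : normalized)
    original.push_back(mapping.at(cls));
  return original;
}
```

With this sketch, the number of classes is simply `mapping.size()` after normalization, and reverting the normalized labels reproduces the input labels. For the functions mlpack actually provides, see the linked preprocessing documentation.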
🔗 Transformations
NOTE: this documentation is still under construction and so some algorithms that mlpack implements are not yet listed here. For now, see the mlpack/methods directory for a full list of algorithms.
Transform data from one space to another.
- AMF: alternating matrix factorization
- LocalCoordinateCoding: local coordinate coding with dictionary learning
- LMNN: large margin nearest neighbor (distance metric learning)
- NCA: neighborhood components analysis (distance metric learning)
- NMF: non-negative matrix factorization
- PCA: principal components analysis
- RADICAL: robust, accurate, direct independent components analysis (ICA) algorithm
- SparseCoding: sparse coding with dictionary learning
🔗 Modeling utilities
Tools for assembling a full data science pipeline.
- Cross-validation: k-fold cross-validation tools for any mlpack algorithm
- Hyperparameter tuning: generic hyperparameter tuner to find good hyperparameters for any mlpack algorithm
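As a rough illustration of what k-fold cross-validation does, the sketch below partitions point indices into k folds and averages a per-fold score. It is written in standard C++ and is not mlpack's cross-validation API: `KFoldRangesSketch` and `CrossValidateSketch` are hypothetical names, and the use of contiguous, unshuffled folds is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not mlpack's API): splits the indices 0..n-1 into k
// contiguous folds. Each pair is a half-open range [begin, end) of the points
// held out for validation in that fold; all other points are for training.
// Assumes 0 < k <= n.
std::vector<std::pair<std::size_t, std::size_t>> KFoldRangesSketch(
    const std::size_t n, const std::size_t k)
{
  std::vector<std::pair<std::size_t, std::size_t>> folds;
  folds.reserve(k);
  const std::size_t base = n / k;  // minimum fold size
  const std::size_t extra = n % k; // the first `extra` folds get one more point
  std::size_t begin = 0;
  for (std::size_t f = 0; f < k; ++f)
  {
    const std::size_t size = base + (f < extra ? 1 : 0);
    folds.emplace_back(begin, begin + size);
    begin += size;
  }
  return folds;
}

// Averages a per-fold score. scoreFold(begin, end) is a caller-supplied
// function that should train on the points outside [begin, end) and return
// the score measured on the points inside it.
template<typename ScoreFunction>
double CrossValidateSketch(const std::size_t n,
                           const std::size_t k,
                           ScoreFunction scoreFold)
{
  double total = 0.0;
  for (const auto& fold : KFoldRangesSketch(n, k))
    total += scoreFold(fold.first, fold.second);
  return total / static_cast<double>(k);
}
```

In practice, the caller's `scoreFold` would train a model on the training portion and evaluate a metric such as accuracy on the held-out range. A hyperparameter tuner can then be thought of as repeating this procedure for each candidate setting and keeping the best-scoring one.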
🔗 Bindings to other languages
mlpack’s bindings to other languages have less complete functionality than mlpack in C++, but almost all the same algorithms are available.
- Python: quickstart, reference
- Julia: quickstart, reference
- R: quickstart, reference
- Command-line programs: quickstart, reference
- Go: quickstart, reference
🔗 mlpack on embedded systems
mlpack is well suited to embedded systems because it is written in C++ and is header-only with minimal dependencies. We are adding a set of tutorials to help you experiment with mlpack on various types of these systems.
🔗 Examples and further documentation
- mlpack examples repository: numerous fully-working example applications of mlpack, in C++ and other languages.
- mlpack models repository: complex models in C++ built with mlpack
For documentation beyond the resources above, consult the source code, where each method is fully documented.
🔗 Developer documentation
The following general documentation can be useful if you are interested in contributing to mlpack:
Throughout the codebase, mlpack uses some common template parameter policies. These are documented below.
- The ElemType policy: element types for data
- The DistanceType policy: distance metrics
- The KernelType policy: kernel functions
- The TreeType policy: space trees (ball trees, KD-trees, etc.)
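For readers unfamiliar with template parameter policies, the following sketch shows the general pattern using a distance metric as the policy. It is illustrative only: the class and function names are hypothetical, it uses `std::vector` rather than mlpack's matrix types, and the assumption that a policy exposes a static `Evaluate()` method is made for this example. The exact requirements of each mlpack policy are given in the policy documentation listed above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical policy class: L1 (Manhattan) distance between two points.
struct ManhattanDistanceSketch
{
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += std::abs(a[i] - b[i]);
    return sum;
  }
};

// Hypothetical policy class: L2 (Euclidean) distance between two points.
struct EuclideanDistanceSketch
{
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
  }
};

// Returns the index of the point closest to `query` under DistanceType.
// The algorithm is written once; the metric is chosen at compile time.
// Assumes `points` is non-empty and all points share the query's dimension.
template<typename DistanceType>
std::size_t NearestPointSketch(
    const std::vector<std::vector<double>>& points,
    const std::vector<double>& query)
{
  std::size_t best = 0;
  double bestDistance = DistanceType::Evaluate(points[0], query);
  for (std::size_t i = 1; i < points.size(); ++i)
  {
    const double d = DistanceType::Evaluate(points[i], query);
    if (d < bestDistance)
    {
      bestDistance = d;
      best = i;
    }
  }
  return best;
}
```

Calling `NearestPointSketch<ManhattanDistanceSketch>(points, query)` and `NearestPointSketch<EuclideanDistanceSketch>(points, query)` runs the same search with different metrics, and the two can return different points. Because the policy is resolved at compile time, swapping metrics adds no virtual-call overhead.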
In addition, the following documentation may be useful when developing bindings for other languages:
- Timers: timing parts of bindings
- Writing an mlpack binding: simple examples of mlpack bindings
- Automatic bindings: details on mlpack’s automatic binding generator system.
🔗 Changelog
For a list of changes in each version of mlpack, see the changelog.