Data loading and I/O

mlpack provides the data::Load() and data::Save() functions to load and save Armadillo matrices (e.g. numeric and categorical datasets) and any mlpack object via the cereal serialization toolkit. A number of other utilities related to loading and saving data and objects are also available.

Numeric data
Mixed categorical data
- data::DatasetInfo
- Loading categorical data
Image data
- data::ImageInfo
- Loading images
mlpack objects: load or save any mlpack object
Formats: supported formats for each load/save variant

🔗 Numeric data

Numeric data or general numeric matrices can be loaded or saved with the following functions.

data::Load(filename, matrix, fatal=false, transpose=true, format=FileType::AutoDetect)
data::Save(filename, matrix, fatal=false, transpose=true, format=FileType::AutoDetect)
- filename is a std::string with a path to the file to be loaded or saved.
- By default the format is auto-detected based on the file extension, but can be explicitly specified with format; see Formats.
- matrix is an arma::mat&, arma::Mat<size_t>&, arma::sp_mat&, or similar (e.g., a reference to an Armadillo object that data will be loaded into or saved from).
- If fatal is true, a std::runtime_error will be thrown on failure.
- If transpose is true, then for plaintext formats (CSV/TSV/ASCII), the matrix will be transposed on load or save. (Keep this true if you want a column-major matrix to be loaded or saved with points as rows and dimensions as columns; that is generally what is desired.)
- A bool is returned indicating whether the operation was successful.

Example usage:

// See https://datasets.mlpack.org/satellite.train.csv.
arma::mat dataset;
mlpack::data::Load("satellite.train.csv", dataset, true);

// See https://datasets.mlpack.org/satellite.train.labels.csv.
arma::Row<size_t> labels;
mlpack::data::Load("satellite.train.labels.csv", labels, true);

// Print information about the data.
std::cout << "The data in 'satellite.train.csv' has: " << std::endl;
std::cout << " - " << dataset.n_cols << " points." << std::endl;
std::cout << " - " << dataset.n_rows << " dimensions." << std::endl;

std::cout << "The labels in 'satellite.train.labels.csv' have: " << std::endl;
std::cout << " - " << labels.n_elem << " labels." << std::endl;
std::cout << " - A maximum label of " << labels.max() << "." << std::endl;
std::cout << " - A minimum label of " << labels.min() << "." << std::endl;

// Modify and save the data.  Add 2 to the data and drop the last column.
dataset += 2;
dataset.shed_col(dataset.n_cols - 1);
labels.shed_col(labels.n_cols - 1);

mlpack::data::Save("satellite.train.mod.csv", dataset);
mlpack::data::Save("satellite.train.labels.mod.csv", labels);
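
To make the effect of transpose=true concrete without depending on mlpack, the following minimal sketch parses CSV text with one point per line into a column-major buffer, so that each point ends up as a column. ColMajorMatrix and ParseCsvTransposed() are hypothetical names used only for illustration; this is not how mlpack implements loading, and mlpack's own loader handles many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical column-major matrix (not an Armadillo or mlpack type).
struct ColMajorMatrix
{
  std::size_t nRows = 0;      // Number of dimensions.
  std::size_t nCols = 0;      // Number of points.
  std::vector<double> values; // Element (r, c) is values[c * nRows + r].

  double At(const std::size_t r, const std::size_t c) const
  {
    assert(r < nRows && c < nCols);
    return values[c * nRows + r];
  }
};

// Parse CSV text in which each line is one point.  Because the fields of a
// line are appended contiguously, each line becomes one column of the result.
inline ColMajorMatrix ParseCsvTransposed(const std::string& text)
{
  ColMajorMatrix m;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line))
  {
    if (line.empty())
      continue;

    std::istringstream fields(line);
    std::string field;
    std::size_t count = 0;
    while (std::getline(fields, field, ','))
    {
      m.values.push_back(std::stod(field));
      ++count;
    }

    if (m.nCols == 0)
      m.nRows = count; // The first line fixes the dimensionality.
    assert(count == m.nRows); // Every point must have the same dimensionality.
    ++m.nCols;
  }
  return m;
}
```

With the two-line input "1,2,3\n4,5,6\n", the result has 3 rows (dimensions) and 2 columns (points), which matches how dataset.n_rows and dataset.n_cols are interpreted in the example above.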

🔗 Mixed categorical data

Some mlpack techniques support mixed categorical data, i.e., data where some dimensions take only categorical values (e.g. 0, 1, 2, etc.). When using mlpack, string data and other non-numerical data must be mapped to categorical values and represented as part of an arma::mat. Category information is stored in an auxiliary data::DatasetInfo object.

🔗 `data::DatasetInfo`

mlpack represents categorical data via the use of the auxiliary data::DatasetInfo object, which stores information about which dimensions are numeric or categorical and allows conversion from the original category values to the numeric values used to represent those categories.

Constructors

info = data::DatasetInfo()
- Create an empty data::DatasetInfo object.
- Use this constructor if you intend to populate the data::DatasetInfo via a data::Load() call.
info = data::DatasetInfo(dimensionality)
- Create a data::DatasetInfo object with the given dimensionality.
- All dimensions are assumed to be numeric (not categorical).

Accessing and setting properties

info.Type(d)
- Get the type (categorical or numeric) of dimension d.
- Returns a data::Datatype, either data::Datatype::numeric or data::Datatype::categorical.
- Calling info.Type(d) = t will set a dimension to type t, but this should only be done before info is used with data::Load() or data::Save().
info.NumMappings(d)
- Get the number of categories in dimension d as a size_t.
- Returns 0 if dimension d is numeric.
info.Dimensionality()
- Return the dimensionality of the object as a size_t.

Map to and from numeric values

info.MapString<double>(value, d)
- Given value (a std::string), return the double representing the categorical mapping (an integer value) of value in dimension d.
- If a mapping for value does not exist in dimension d, a new mapping is created, and info.NumMappings(d) is increased by one.
- If dimension d is numeric and value cannot be parsed as a numeric value, then dimension d is changed to categorical and a new mapping is returned.
info.UnmapString(mappedValue, d)
- Given mappedValue (a size_t), return the std::string containing the original category that mapped to the value mappedValue in dimension d.
- If dimension d is not categorical, a std::invalid_argument is thrown.
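
The mapping behavior described above (new categories are created on first sight and numbered in the order they are seen) can be sketched for a single dimension with standard containers. CategoryMapper is a hypothetical class written for illustration only; it is not mlpack's data::DatasetInfo and omits the numeric/categorical type tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the category mapping of one dimension.
class CategoryMapper
{
 public:
  // Return the index for `value`, creating a new mapping if it is unseen.
  // Indices are assigned in the order values are first seen.
  std::size_t Map(const std::string& value)
  {
    const auto it = forward.find(value);
    if (it != forward.end())
      return it->second;

    const std::size_t index = reverse.size();
    forward.emplace(value, index);
    reverse.push_back(value);
    assert(forward.size() == reverse.size()); // Both directions stay in sync.
    return index;
  }

  // Return the original string for a previously mapped index.
  const std::string& Unmap(const std::size_t index) const
  {
    if (index >= reverse.size())
      throw std::invalid_argument("CategoryMapper::Unmap(): unknown index");
    return reverse[index];
  }

  // Number of distinct categories seen so far.
  std::size_t NumMappings() const { return reverse.size(); }

 private:
  std::unordered_map<std::string, std::size_t> forward;
  std::vector<std::string> reverse;
};
```

Mapping "4", then "3", then "4" again yields indices 0, 1, and 0: the index reflects the order of first appearance, not the numeric value of the string.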

🔗 Loading categorical data

With a data::DatasetInfo object, categorical data can be loaded:

data::Load(filename, matrix, info, fatal=false, transpose=true)
- filename is a std::string with a path to the file to be loaded.
- The format is auto-detected based on the extension of the filename and the contents of the file:
  - .csv, .tsv, or .txt for CSV/TSV (tab-separated)/ASCII (space-separated)
  - .arff for ARFF
- matrix is an arma::mat&, arma::Mat<size_t>&, or similar (e.g., a reference to an Armadillo object that data will be loaded into).
- info is a data::DatasetInfo& object. This will be populated with the category information of the file when loading, and used to unmap values when saving.
- If fatal is true, a std::runtime_error will be thrown on failure.
- If transpose is true, then for plaintext formats (CSV/TSV/ASCII), the matrix will be transposed on load. (Keep this true if you want a column-major matrix to be loaded from a file with points as rows and dimensions as columns; that is generally what is desired.)
- A bool is returned indicating whether the operation was successful.

Saving should be performed with the numeric data::Save() variant.

Example usage to load and manipulate an ARFF file.

// Load a categorical dataset.
arma::mat dataset;
mlpack::data::DatasetInfo info;
// See https://datasets.mlpack.org/covertype.train.arff.
mlpack::data::Load("covertype.train.arff", dataset, info, true);

arma::Row<size_t> labels;
// See https://datasets.mlpack.org/covertype.train.labels.csv.
mlpack::data::Load("covertype.train.labels.csv", labels, true);

// Print information about the data.
std::cout << "The data in 'covertype.train.arff' has: " << std::endl;
std::cout << " - " << dataset.n_cols << " points." << std::endl;
std::cout << " - " << info.Dimensionality() << " dimensions." << std::endl;

// Print information about each dimension.
for (size_t d = 0; d < info.Dimensionality(); ++d)
{
  if (info.Type(d) == mlpack::data::Datatype::categorical)
  {
    std::cout << " - Dimension " << d << " is categorical with "
        << info.NumMappings(d) << " categories." << std::endl;
  }
  else
  {
    std::cout << " - Dimension " << d << " is numeric." << std::endl;
  }
}

// Modify the 5th point.  Increment any numeric values, and set any categorical
// values to the string "hooray!".
for (size_t d = 0; d < info.Dimensionality(); ++d)
{
  if (info.Type(d) == mlpack::data::Datatype::categorical)
  {
    // This will create a new mapping if the string "hooray!" does not already
    // exist as a category for dimension d.
    dataset(d, 4) = info.MapString<double>("hooray!", d);
  }
  else
  {
    dataset(d, 4) += 1.0;
  }
}

Example usage to manually create a data::DatasetInfo object.

// This will manually create the following data matrix (shown as it would appear
// in a CSV):
//
// 1, TRUE, "good", 7.0, 4
// 2, FALSE, "good", 5.6, 3
// 3, FALSE, "bad", 6.1, 4
// 4, TRUE, "bad", 6.1, 1
// 5, TRUE, "unknown", 6.3, 0
// 6, FALSE, "unknown", 5.1, 2
//
// Although the last dimension is numeric, we will take it as a categorical
// dimension.

arma::mat dataset(5, 6); // 6 data points in 5 dimensions.
mlpack::data::DatasetInfo info(5);

// Set types of dimensions.  By default they are numeric so we only set
// categorical dimensions.
info.Type(1) = mlpack::data::Datatype::categorical;
info.Type(2) = mlpack::data::Datatype::categorical;
info.Type(4) = mlpack::data::Datatype::categorical;

// The first dimension is numeric.
dataset(0, 0) = 1;
dataset(0, 1) = 2;
dataset(0, 2) = 3;
dataset(0, 3) = 4;
dataset(0, 4) = 5;
dataset(0, 5) = 6;

// The second dimension is categorical.
dataset(1, 0) = info.MapString<double>("TRUE", 1);
dataset(1, 1) = info.MapString<double>("FALSE", 1);
dataset(1, 2) = info.MapString<double>("FALSE", 1);
dataset(1, 3) = info.MapString<double>("TRUE", 1);
dataset(1, 4) = info.MapString<double>("TRUE", 1);
dataset(1, 5) = info.MapString<double>("FALSE", 1);

// The third dimension is categorical.
dataset(2, 0) = info.MapString<double>("good", 2);
dataset(2, 1) = info.MapString<double>("good", 2);
dataset(2, 2) = info.MapString<double>("bad", 2);
dataset(2, 3) = info.MapString<double>("bad", 2);
dataset(2, 4) = info.MapString<double>("unknown", 2);
dataset(2, 5) = info.MapString<double>("unknown", 2);

// The fourth dimension is numeric.
dataset(3, 0) = 7.0;
dataset(3, 1) = 5.6;
dataset(3, 2) = 6.1;
dataset(3, 3) = 6.1;
dataset(3, 4) = 6.3;
dataset(3, 5) = 5.1;

// The fifth dimension is categorical.  Note that `info` will choose to assign
// category values in the order they are seen, even if the category can be
// parsed as a number.  So, here, the value '4' will be assigned category '0',
// since it is seen first.
dataset(4, 0) = info.MapString<double>("4", 4);
dataset(4, 1) = info.MapString<double>("3", 4);
dataset(4, 2) = info.MapString<double>("4", 4);
dataset(4, 3) = info.MapString<double>("1", 4);
dataset(4, 4) = info.MapString<double>("0", 4);
dataset(4, 5) = info.MapString<double>("2", 4);

// Print the dataset with mapped categories.
dataset.print("Dataset with mapped categories");

// Print the mappings for the third dimension.
std::cout << "Mappings for dimension 3: " << std::endl;
for (size_t i = 0; i < info.NumMappings(2); ++i)
{
  std::cout << " - \"" << info.UnmapString(i, 2) << "\" maps to " << i << "."
      << std::endl;
}

// Now `dataset` is ready for use with an mlpack algorithm that supports
// categorical data.

🔗 Image data

If the STB image library is available on the system (stb_image.h and stb_image_write.h must be available on the compiler’s include search path), then mlpack will define the MLPACK_HAS_STB macro, and support for loading individual images or sets of images will be available.

Supported formats for loading are jpg, png, tga, bmp, psd, gif, hdr, pic, and pnm.

Supported formats for saving are jpg, png, tga, bmp, and hdr.

When loading images, each image is represented as a flattened single column vector in a data matrix; each row of the resulting vector will correspond to a single pixel value in a single channel. An auxiliary data::ImageInfo class is used to store information about the images.

🔗 `data::ImageInfo`

The data::ImageInfo class contains the metadata of the images.

Constructors

info = data::ImageInfo()
- Create a data::ImageInfo object with no data.
- Use this constructor if you intend to populate the data::ImageInfo via a data::Load() call.
info = data::ImageInfo(width, height, channels)
- Create a data::ImageInfo object with the given image specifications.
- width and height are specified as pixels.

Accessing and modifying image metadata

info.Quality() = q will set the compression quality (e.g. for saving JPEGs) to q.
- q should take values between 0 and 100.
- The quality value is ignored unless calling data::Save() with info.
Calling info.Channels() = 1 before loading will cause images to be loaded in grayscale.
Metadata stored in the data::ImageInfo can be accessed with the following members:
- info.Width() returns the image width in pixels.
- info.Height() returns the image height in pixels.
- info.Channels() returns the number of color channels in the image.
- info.Quality() returns the compression quality that will be used to save images (between 0 and 100).

🔗 Loading images

With a data::ImageInfo object, image data can be loaded or saved, handling either one or multiple images at a time:

data::Load(filename, matrix, info, fatal=false)
- Load a single image from filename into matrix.
  - Format is chosen by extension (e.g. image.png will load as PNG).
- matrix will have one column representing the image as a flattened vector.
- info will be populated with information from the image in filename.
- If fatal is true, a std::runtime_error will be thrown upon load failure.
- Returns a bool indicating the success of the operation.

data::Load(files, matrix, info, fatal=false)
- Load multiple images from files into matrix.
  - files is of type std::vector<std::string> and should contain the list of images to be loaded.
  - matrix will have files.size() columns, each representing the corresponding image as a flattened vector.
- info will be populated with information from the images in files.
- If fatal is true, a std::runtime_error will be thrown if any files fail to load.
- Returns a bool indicating the success of the operation.

data::Save(filename, matrix, info, fatal=false)
- Save a single image from matrix into the file filename.
  - Format is chosen by extension (e.g. image.png will save as PNG).
- matrix is expected to have only one column representing the image as a flattened vector.
- If fatal is true, a std::runtime_error will be thrown in the event of save failure.
- Returns a bool indicating the success of the operation.

data::Save(files, matrix, info, fatal=false)
- Save multiple images from matrix into files.
  - files is of type std::vector<std::string> and should contain the list of files to save to.
  - The format of each file is chosen by extension (e.g. image.png will save as PNG); it is allowed for filenames in files to have different extensions.
- matrix is expected to have files.size() columns representing images as flattened vectors.
- If fatal is true, a std::runtime_error will be thrown if any images fail to save.
- Returns a bool indicating the success of the operation.

Images are flattened along rows, with channel values interleaved, starting from the top left. Thus, the value of the pixel at position (x, y) in channel c will be contained in element/row y * (width * channels) + x * (channels) + c of the flattened vector.

Pixels take values between 0 and 255.
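
The flattening rule above translates directly into a small index helper. FlatIndex() is a hypothetical function introduced here for illustration; it is not part of mlpack.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: row of the flattened image vector that holds channel
// `c` of the pixel at column `x`, row `y`, for an image that is `width`
// pixels wide with `channels` interleaved channels.
inline std::size_t FlatIndex(const std::size_t x,
                             const std::size_t y,
                             const std::size_t c,
                             const std::size_t width,
                             const std::size_t channels)
{
  assert(x < width);
  assert(c < channels);
  return y * (width * channels) + x * channels + c;
}
```

For a 10-pixel-wide image with 3 channels, the first channel of the pixel at (x=3, y=4) is at row 4 * 30 + 3 * 3 + 0 = 129. With a loaded image, the same helper could be called as FlatIndex(3, 4, 0, info.Width(), info.Channels()).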

Example of loading and saving a single image:

// See https://www.mlpack.org/static/img/numfocus-logo.png.
mlpack::data::ImageInfo info;
arma::mat matrix;
mlpack::data::Load("numfocus-logo.png", matrix, info, true);

// `matrix` should now contain one column.

// Print information about the image.
std::cout << "Information about the image in 'numfocus-logo.png': "
    << std::endl;
std::cout << " - " << info.Width() << " pixels in width." << std::endl;
std::cout << " - " << info.Height() << " pixels in height." << std::endl;
std::cout << " - " << info.Channels() << " color channels." << std::endl;

std::cout << "Value at pixel (x=3, y=4) in the first channel: ";
const size_t index = (4 * info.Width() * info.Channels()) +
    (3 * info.Channels());
std::cout << matrix[index] << "." << std::endl;

// Increment each pixel value, but make sure they are still within the bounds.
matrix += 1;
matrix = arma::clamp(matrix, 0, 255);

mlpack::data::Save("numfocus-logo-mod.png", matrix, info);

Example of loading and saving multiple images:

// Load some favicons from websites associated with mlpack.
std::vector<std::string> images;
// See the following files:
// - https://datasets.mlpack.org/images/mlpack-favicon.png
// - https://datasets.mlpack.org/images/ensmallen-favicon.png
// - https://datasets.mlpack.org/images/armadillo-favicon.png
// - https://datasets.mlpack.org/images/bandicoot-favicon.png
images.push_back("mlpack-favicon.png");
images.push_back("ensmallen-favicon.png");
images.push_back("armadillo-favicon.png");
images.push_back("bandicoot-favicon.png");

mlpack::data::ImageInfo info;
info.Channels() = 1; // Force loading in grayscale.

arma::mat matrix;
mlpack::data::Load(images, matrix, info, true);

// Print information about what we loaded.
std::cout << "Loaded " << matrix.n_cols << " images.  Images are of size "
    << info.Width() << " x " << info.Height() << " with " << info.Channels()
    << " color channel." << std::endl;

// Invert images.
matrix = (255.0 - matrix);

// Save as compressed JPEGs with a quality of 75.
info.Quality() = 75;
std::vector<std::string> outImages;
outImages.push_back("mlpack-favicon-inv.jpeg");
outImages.push_back("ensmallen-favicon-inv.jpeg");
outImages.push_back("armadillo-favicon-inv.jpeg");
outImages.push_back("bandicoot-favicon-inv.jpeg");

mlpack::data::Save(outImages, matrix, info);

🔗 Resize images

It is possible to resize images in mlpack with the following function:

ResizeImages(images, info, newWidth, newHeight)
- images is a column-major matrix containing a set of images; each image is represented as a flattened vector in one column.
- info is a data::ImageInfo& containing details about the images in images, and will be modified to contain the new size of the images.
- newWidth and newHeight (of type size_t) are the desired new dimensions of the resized images.
- This function returns void and modifies info and images.
- NOTE: if the element type of images is not unsigned char or float (e.g. if images is not an arma::Mat<unsigned char> or arma::fmat), the matrix will be temporarily converted during resizing; therefore, using unsigned char or float as the element type is the most efficient.
- This function expects all the images to have identical dimensions. If this is not the case, iteratively call ResizeImages() with a single image/column in images.

Example usage of the ResizeImages() function on a set of images with different dimensions:

// See https://datasets.mlpack.org/sheep.tar.bz2
arma::Mat<unsigned char> image;
mlpack::data::ImageInfo info;

// The images are located in our test/data directory. However, any image could
// be used instead.
std::vector<std::string> files =
    {"sheep_1.jpg", "sheep_2.jpg", "sheep_3.jpg", "sheep_4.jpg",
     "sheep_5.jpg", "sheep_6.jpg", "sheep_7.jpg", "sheep_8.jpg",
     "sheep_9.jpg"};

// The resized images will be saved locally. We are declaring the vector that
// contains the names of the resized images.
std::vector<std::string> reSheeps =
    {"re_sheep_1.jpg", "re_sheep_2.jpg", "re_sheep_3.jpg", "re_sheep_4.jpg",
     "re_sheep_5.jpg", "re_sheep_6.jpg", "re_sheep_7.jpg", "re_sheep_8.jpg",
     "re_sheep_9.jpg"};

// Load and resize each one of them individually, because they do not have
// the same dimensions. The `info` will contain the dimension for each one.
for (size_t i = 0; i < files.size(); i++)
{
  mlpack::data::Load(files.at(i), image, info, false);
  mlpack::data::ResizeImages(image, info, 320, 320);
  mlpack::data::Save(reSheeps.at(i), image, info, false);
}

Example usage of the ResizeImages() function on a set of images that have the same dimensions:

// All images have the same dimensions, so it is possible to load all of
// them into one matrix.

// See https://datasets.mlpack.org/sheep.tar.bz2
arma::Mat<unsigned char> images;
mlpack::data::ImageInfo info;

std::vector<std::string> reSheeps =
    {"re_sheep_1.jpg", "re_sheep_2.jpg", "re_sheep_3.jpg", "re_sheep_4.jpg",
     "re_sheep_5.jpg", "re_sheep_6.jpg", "re_sheep_7.jpg", "re_sheep_8.jpg",
     "re_sheep_9.jpg"};

mlpack::data::Load(reSheeps, images, info, false);

// Now let us resize all these images at once, to specific dimensions.
mlpack::data::ResizeImages(images, info, 160, 160);

// The resized images will be saved locally. We are declaring the vector that
// contains the names of the resized images.
std::vector<std::string> smSheeps =
    {"sm_sheep_1.jpg", "sm_sheep_2.jpg", "sm_sheep_3.jpg", "sm_sheep_4.jpg",
     "sm_sheep_5.jpg", "sm_sheep_6.jpg", "sm_sheep_7.jpg", "sm_sheep_8.jpg",
     "sm_sheep_9.jpg"};

mlpack::data::Save(smSheeps, images, info, false);

🔗 Resize and crop images

In addition to resizing images, mlpack also provides resize-and-crop functionality. This is useful when the desired aspect ratio of an image differs significantly from that of the original image.

The resize-and-crop operation, given a target size outputWidth x outputHeight, first resizes the image while preserving the aspect ratio such that the width and height of the image are no smaller than outputWidth and outputHeight, respectively. Then, the image is cropped to have size outputWidth x outputHeight, keeping only the center pixels. This process is shown below.

Original image:

cat

Original image with target size of 220x220 pixels:

cat with rectangle overlaid

First step: resize while preserving aspect ratio:

scaled cat with rectangle overlaid

Second step: crop to desired final size:

cropped cat
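
The two steps above reduce to a little integer arithmetic. The following is a minimal sketch of that geometry only, not mlpack's implementation: ResizeCropPlan and PlanResizeCrop() are hypothetical names, and rounding the scaled size up (so the scaled image is never smaller than the target) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a resize-and-crop operation.
struct ResizeCropPlan
{
  std::size_t scaledWidth;  // Width after the aspect-preserving resize.
  std::size_t scaledHeight; // Height after the aspect-preserving resize.
  std::size_t cropX;        // Left offset of the centered crop window.
  std::size_t cropY;        // Top offset of the centered crop window.
};

inline ResizeCropPlan PlanResizeCrop(const std::size_t width,
                                     const std::size_t height,
                                     const std::size_t outputWidth,
                                     const std::size_t outputHeight)
{
  assert(width > 0 && height > 0);

  ResizeCropPlan plan;
  // Compare outputWidth / width with outputHeight / height by
  // cross-multiplying, so no floating-point arithmetic is needed.
  if (outputWidth * height >= outputHeight * width)
  {
    // Width is the binding dimension: scale by outputWidth / width.
    plan.scaledWidth = outputWidth;
    plan.scaledHeight = (height * outputWidth + width - 1) / width; // Ceiling.
  }
  else
  {
    // Height is the binding dimension: scale by outputHeight / height.
    plan.scaledHeight = outputHeight;
    plan.scaledWidth = (width * outputHeight + height - 1) / height; // Ceiling.
  }

  // Keep only the center outputWidth x outputHeight pixels.
  plan.cropX = (plan.scaledWidth - outputWidth) / 2;
  plan.cropY = (plan.scaledHeight - outputHeight) / 2;
  return plan;
}
```

Under these assumptions, a 640x480 image with a 220x220 target is first scaled to 294x220 and then cropped starting 37 pixels from the left edge. When the input and target aspect ratios match, both offsets are 0 and no cropping takes place.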

ResizeCropImages(images, info, newWidth, newHeight)
- images is a column-major matrix containing a set of images; each image is represented as a flattened vector in one column.
- info is a data::ImageInfo& containing details about the images in images.
- images and info are modified in-place.
- newWidth and newHeight (of type size_t) are the desired new dimensions of the resized images.
  - If the output size is larger than the input image size, the images will be upscaled the minimum amount necessary before cropping.
  - If the aspect ratio is not changed from the input aspect ratio, no cropping is performed.
- NOTE: if the element type of images is not unsigned char or float (e.g. if images is not an arma::Mat<unsigned char> or arma::fmat), the matrix will be temporarily converted during resizing; therefore, using unsigned char or float as the element type is the most efficient.
- This function expects all the images to have identical dimensions. If this is not the case, iteratively call ResizeCropImages() with a single image/column in images.

Example usage of the ResizeCropImages() function on a set of images with different dimensions:

// See https://datasets.mlpack.org/sheep.tar.bz2.
arma::Mat<unsigned char> image;
mlpack::data::ImageInfo info;

// The images are located in our test/data directory. However, any image could
// be used instead.
std::vector<std::string> files =
    {"sheep_1.jpg", "sheep_2.jpg", "sheep_3.jpg", "sheep_4.jpg",
     "sheep_5.jpg", "sheep_6.jpg", "sheep_7.jpg", "sheep_8.jpg",
     "sheep_9.jpg"};

// The resized images will be saved locally. We are declaring the vector that
// contains the names of the resized and cropped images.
std::vector<std::string> cropSheeps =
    {"crop_sheep_1.jpg", "crop_sheep_2.jpg", "crop_sheep_3.jpg",
     "crop_sheep_4.jpg", "crop_sheep_5.jpg", "crop_sheep_6.jpg",
     "crop_sheep_7.jpg", "crop_sheep_8.jpg", "crop_sheep_9.jpg"};

// Load and resize-and-crop each image individually, because they do not have
// the same dimensions. The `info` will contain the dimension for each one.
for (size_t i = 0; i < files.size(); i++)
{
  mlpack::data::Load(files.at(i), image, info, false);
  mlpack::data::ResizeCropImages(image, info, 320, 320);
  mlpack::data::Save(cropSheeps.at(i), image, info, false);
  std::cout << "Resized and cropped " << files.at(i) << " to "
      << cropSheeps.at(i) << " with output size 320x320." << std::endl;
}

🔗 mlpack objects

All mlpack objects can be saved with data::Save() and loaded with data::Load(). Serialization is performed using the cereal serialization toolkit. Each object must be given a logical name.

data::Load(filename, name, object, fatal=false, format=data::format::autodetect)
data::Save(filename, name, object, fatal=false, format=data::format::autodetect)
- Load object from filename, or save object to filename, with the logical name name.
- If fatal is true, a std::runtime_error will be thrown in the event of load or save failure.
- The format is autodetected based on extension (.bin, .json, or .xml), but can be manually specified:
  - data::format::binary: binary blob (smallest and fastest). No checks; assumes all data is correct.
  - data::format::json: JSON.
  - data::format::xml: XML (largest and slowest).
- For JSON and XML types, when loading, name must match the name used to save the object.
- Returns a bool indicating the success of the operation.

Note: when loading an object that was saved as a binary blob, the C++ type of the object must be exactly the same (including template parameters) as the type used to save the object. If not, undefined behavior will occur—most likely a crash.

Simple example: create a math::Range object, then save and load it.

mlpack::math::Range r(3.0, 6.0);

// Save the Range to 'range.bin', using the name "range".
mlpack::data::Save("range.bin", "range", r, true);

// Load the range into a new object.
mlpack::math::Range r2;
mlpack::data::Load("range.bin", "range", r2, true);

std::cout << "Loaded range: [" << r2.Lo() << ", " << r2.Hi() << "]."
    << std::endl;

// Modify and save the range as JSON.
r2.Lo() = 4.0;
mlpack::data::Save("range.json", "range", r2, true);

// Now 'range.json' will contain the following:
//
// {
//     "range": {
//         "cereal_class_version": 0,
//         "hi": 6.0,
//         "lo": 4.0
//     }
// }

🔗 Formats

mlpack’s data::Load() and data::Save() functions support a variety of different formats in different contexts.

Numeric data

By default, load/save format is autodetected, but can be manually specified with the format parameter using one of the options below:

FileType::AutoDetect (default): auto-detects the format as one of the formats below using the extension of the filename and inspecting the file contents.
FileType::CSVASCII (autodetect extensions .csv, .tsv): CSV format with no header. If loading a sparse matrix and the CSV has three columns, the data is interpreted as a coordinate list.
FileType::RawASCII (autodetect extensions .csv, .txt): space-separated values or tab-separated values (TSV) with no header.
FileType::ArmaASCII (autodetect extension .txt): space-separated values as saved by Armadillo with the arma_ascii format.
FileType::CoordASCII (autodetect extensions .txt, .tsv; must be loading a sparse matrix type): coordinate list format for sparse data (see coord_ascii).
FileType::ArmaBinary (autodetect extension .bin): Armadillo’s efficient binary matrix format (arma_binary).
FileType::HDF5Binary (autodetect extensions .h5, .hdf5, .hdf, .he5): HDF5 binary format; only available if Armadillo is configured with HDF5 support.
FileType::RawBinary (autodetect extension .bin): packed binary data with no header and no size information; data will be loaded as a single column vector (not recommended).
FileType::PGMBinary (autodetect extension .pgm): PGM image format.

Notes:

ASCII formats (CSVASCII, RawASCII, ArmaASCII) are human-readable but large; to reduce dataset size, consider a binary format such as ArmaBinary or HDF5Binary.
Sparse data (arma::sp_mat, arma::sp_fmat, etc.) should be saved in a binary format (ArmaBinary or HDF5Binary) or as a coordinate list (CoordASCII).
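
To illustrate what a coordinate list contains, the sketch below expands whitespace-separated "row column value" triples into a dense column-major buffer. It assumes zero-based indices and infers the matrix size from the largest index seen; DenseMatrix, Entry, and ExpandCoordList() are hypothetical names, and the sketch does not reproduce mlpack's or Armadillo's actual parsing (comma-separated input and transposition are not handled).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dense matrix, stored column-major.
struct DenseMatrix
{
  std::size_t nRows = 0;
  std::size_t nCols = 0;
  std::vector<double> values; // Element (r, c) is values[c * nRows + r].

  double At(const std::size_t r, const std::size_t c) const
  {
    assert(r < nRows && c < nCols);
    return values[c * nRows + r];
  }
};

// One nonzero element of a coordinate list.
struct Entry
{
  std::size_t row;
  std::size_t col;
  double value;
};

// Expand "row col value" triples into a dense matrix; unlisted elements are 0.
inline DenseMatrix ExpandCoordList(const std::string& text)
{
  std::vector<Entry> entries;
  DenseMatrix m;

  std::istringstream in(text);
  Entry e;
  while (in >> e.row >> e.col >> e.value)
  {
    entries.push_back(e);
    m.nRows = std::max(m.nRows, e.row + 1);
    m.nCols = std::max(m.nCols, e.col + 1);
  }

  m.values.assign(m.nRows * m.nCols, 0.0);
  for (const Entry& entry : entries)
    m.values[entry.col * m.nRows + entry.row] = entry.value;
  return m;
}
```

Only the nonzero elements are listed in the file, which is why the coordinate list is a compact plaintext representation for sparse data.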

Mixed categorical data

The format of mixed categorical data is detected automatically based on the file extension and inspecting the file contents:

.csv, .txt, or .tsv indicates CSV/TSV/ASCII format
.arff indicates ARFF

Image data

The format of images is detected automatically based on the file extension.

The following formats are supported for loading: .jpg, .jpeg, .png, .tga, .bmp, .psd, .gif, .hdr, .pic, .pnm
The following formats are supported for saving: .jpg, .png, .tga, .bmp, .hdr

mlpack objects

By default, load/save format for mlpack objects is autodetected, but can be manually specified with the format parameter using one of the options below:

format::autodetect (default): auto-detects the format as one of the formats below using the extension of the filename
format::json (autodetect extension .json)
format::xml (autodetect extension .xml)
format::binary (autodetect extension .bin)

Notes:

format::json (.json) and format::xml (.xml) produce human-readable files, but they may be quite large.
format::binary (.bin) is recommended for the sake of size; objects in binary format may be an order of magnitude or more smaller than JSON!
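
Extension-based autodetection for object files amounts to a lookup on the text after the last dot. The sketch below shows the idea with a hypothetical ObjectFormat enum and DetectObjectFormat() function; it is not mlpack's detection code, and treating extensions as case-sensitive is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mirror of the formats listed above.
enum class ObjectFormat { Binary, Json, Xml, Unknown };

// Pick a format from the filename extension (.bin, .json, or .xml).
inline ObjectFormat DetectObjectFormat(const std::string& filename)
{
  assert(!filename.empty());

  const std::size_t dot = filename.rfind('.');
  if (dot == std::string::npos)
    return ObjectFormat::Unknown;

  const std::string extension = filename.substr(dot + 1);
  if (extension == "bin")
    return ObjectFormat::Binary;
  if (extension == "json")
    return ObjectFormat::Json;
  if (extension == "xml")
    return ObjectFormat::Xml;
  return ObjectFormat::Unknown;
}
```

A filename whose extension is not recognized is reported as Unknown here; in that situation the format would have to be specified explicitly through the format parameter.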
