mlpack.mean_shift

mean_shift(...)

Mean Shift Clustering

>>> from mlpack import mean_shift

This program performs mean shift clustering on the given dataset, storing the learned cluster assignments either as a column of labels in the input dataset or separately.

The input dataset should be specified with the 'input' parameter, and the radius used for search can be specified with the 'radius' parameter. The maximum number of iterations before algorithm termination is controlled with the 'max_iterations' parameter.

The output labels may be saved with the 'output' output parameter and the centroids of each cluster may be saved with the 'centroid' output parameter.

For example, to run mean shift clustering on the dataset 'data' and store the centroids to 'centroids', the following command may be used:

>>> output = mean_shift(input=data)

>>> centroids = output['centroid']
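The call above leaves 'radius' and 'max_iterations' at their defaults. As a minimal sketch, the same call can be wrapped so that both parameters are passed explicitly and both documented outputs are returned. The helper name run_mean_shift is hypothetical and not part of mlpack; only the parameter and output names listed in this documentation are used.

```python
def run_mean_shift(data, radius=0.0, max_iterations=1000):
    """Hypothetical convenience wrapper around mlpack's mean_shift binding."""
    # Imported lazily so this helper can be defined even where mlpack is absent.
    from mlpack import mean_shift

    # radius <= 0 asks the binding to estimate the radius from the data.
    result = mean_shift(input=data, radius=radius,
                        max_iterations=max_iterations)

    # The binding returns a dict with the 'output' and 'centroid' entries.
    return result['output'], result['centroid']
```

With mlpack installed, `labeled, centroids = run_mean_shift(data, radius=1.5)` would run the clustering with a fixed radius instead of an estimated one.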

## input options

- input (numpy matrix or arraylike, float dtype): [required] Input dataset to perform clustering on.
- copy_all_inputs (bool): If specified, all input parameters will be deep copied before the method is run. This is useful for debugging problems where the input parameters are being modified by the algorithm, but can slow down the code.
- force_convergence (bool): If specified, the mean shift algorithm will continue running regardless of max_iterations until the clusters converge.
- in_place (bool): If specified, a column containing the learned cluster assignments will be added to the input dataset file. In this case, the 'output' parameter is overridden. (Do not use with Python.)
- labels_only (bool): If specified, only the output labels will be written to the 'output' output parameter.
- max_iterations (int): Maximum number of iterations before mean shift terminates. Default value 1000.
- radius (float): If the distance between two centroids is less than the given radius, one will be removed. A radius of 0 or less means an estimate will be calculated and used for the radius. Default value 0.
- verbose (bool): Display informational messages and the full list of parameters and timers at the end of execution.
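
How 'radius' and 'max_iterations' interact can be illustrated with a small NumPy sketch of flat-kernel mean shift. This is not mlpack's implementation: the function name mean_shift_sketch, the tol parameter, the flat kernel, and the nearest-centroid labeling are all assumptions made for illustration, and the sketch expects a positive radius rather than estimating one.

```python
import numpy as np

def mean_shift_sketch(data, radius, max_iterations=1000, tol=1e-7):
    """Illustrative flat-kernel mean shift; rows of `data` are points."""
    data = np.asarray(data, dtype=float)
    centroids = []
    for seed in data:
        point = seed.copy()
        for _ in range(max_iterations):
            # Move to the mean of all points within `radius` of the position.
            dist = np.linalg.norm(data - point, axis=1)
            mask = dist <= radius
            if not mask.any():
                break
            new_point = data[mask].mean(axis=0)
            shift = np.linalg.norm(new_point - point)
            point = new_point
            if shift < tol:
                break
        # Drop this result if a kept centroid is closer than `radius`.
        if all(np.linalg.norm(point - c) >= radius for c in centroids):
            centroids.append(point)
    centroids = np.array(centroids)
    # Label each point with the index of its nearest kept centroid.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return labels, centroids

# Two well-separated synthetic blobs as a quick check.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
points = np.vstack([blob_a, blob_b])
labels, centroids = mean_shift_sketch(points, radius=1.0)
```

In this sketch a larger radius merges more candidate centroids and so yields fewer clusters, while max_iterations only caps how long each seed is shifted.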

## output options

The return value from the binding is a dict containing the following elements:

- centroid (numpy matrix, float dtype): Matrix containing the centroid of each cluster.
- output (numpy matrix, float dtype): Matrix containing the output labels (if 'labels_only' is specified) or the labeled data.
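
When 'output' holds labeled data, the labels usually need to be separated from the features before further processing. The sketch below assumes that each row is a point and that the cluster label is stored in the last column; that layout is an assumption and should be checked against the actual shape of the returned matrix. The helper name split_labeled_output and the small example array are hypothetical.

```python
import numpy as np

def split_labeled_output(labeled):
    """Split a labeled matrix into features and integer labels.

    Assumes one point per row with the label in the last column.
    """
    labeled = np.asarray(labeled, dtype=float)
    features = labeled[:, :-1]
    labels = labeled[:, -1].astype(int)
    return features, labels

# Hypothetical stand-in for output['output']: four 2-D points, two clusters.
example = np.array([
    [0.0, 0.1, 0.0],
    [0.2, 0.0, 0.0],
    [5.0, 5.1, 1.0],
    [5.2, 4.9, 1.0],
])
features, labels = split_labeled_output(example)

# Number of points assigned to each cluster label.
sizes = np.bincount(labels)
```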