mlpack git-master

AdaGrad Class Reference
AdaGrad is a modified version of stochastic gradient descent that performs larger updates for sparse (infrequently updated) parameters and smaller updates for dense (frequently updated) parameters.
Public Member Functions

  AdaGrad(const double stepSize = 0.01, const size_t batchSize = 32, const double epsilon = 1e-8, const size_t maxIterations = 100000, const double tolerance = 1e-5, const bool shuffle = true)
      Construct the AdaGrad optimizer with the given parameters.
  size_t BatchSize() const
      Get the batch size.
  size_t& BatchSize()
      Modify the batch size.
  double Epsilon() const
      Get the value used to initialise the squared gradient parameter.
  double& Epsilon()
      Modify the value used to initialise the squared gradient parameter.
  size_t MaxIterations() const
      Get the maximum number of iterations (0 indicates no limit).
  size_t& MaxIterations()
      Modify the maximum number of iterations (0 indicates no limit).
  template<typename DecomposableFunctionType>
  double Optimize(DecomposableFunctionType& function, arma::mat& iterate)
      Optimize the given function using AdaGrad.
  bool Shuffle() const
      Get whether or not the individual functions are shuffled.
  bool& Shuffle()
      Modify whether or not the individual functions are shuffled.
  double StepSize() const
      Get the step size.
  double& StepSize()
      Modify the step size.
  double Tolerance() const
      Get the tolerance for termination.
  double& Tolerance()
      Modify the tolerance for termination.
Detailed Description
AdaGrad is a modified version of stochastic gradient descent that performs larger updates for sparse (infrequently updated) parameters and smaller updates for dense (frequently updated) parameters.
For more information, see the following paper:

  Duchi, J., Hazan, E., and Singer, Y., "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization", Journal of Machine Learning Research, vol. 12, pp. 2121-2159, 2011.
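For reference, the standard AdaGrad update from that paper keeps a running sum of squared gradients and scales each coordinate's step accordingly; here eta corresponds to stepSize and epsilon to this class's epsilon parameter (the exact placement of epsilon relative to the square root is an implementation detail and may differ):

  G_t = G_{t-1} + g_t \odot g_t, \qquad
  \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon} \odot g_t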
For AdaGrad to work, a DecomposableFunctionType template parameter is required. This class must implement the following functions:

  size_t NumFunctions();
  double Evaluate(const arma::mat& coordinates, const size_t i, const size_t batchSize);
  void Gradient(const arma::mat& coordinates, const size_t i, arma::mat& gradient, const size_t batchSize);
NumFunctions() should return the number of functions (n), and in the other two functions, the parameter i refers to which individual function (or gradient) is being evaluated. So, for the case of a data-dependent function, such as NCA (see mlpack::nca::NCA), NumFunctions() should return the number of points in the dataset, and Evaluate(coordinates, 0, 1) will evaluate the objective function on the first point in the dataset (presumably, the dataset is held internally in the DecomposableFunctionType).
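As a minimal sketch of such a class, consider the following. The class name SquaredErrorFunction and the objective (the summed squared distance of each dataset point from the coordinates, which is minimized at the mean of the points) are illustrative assumptions, not part of mlpack:

  #include <mlpack/core.hpp>

  // Illustrative decomposable function: f_i(c) = || x_i - c ||^2, where x_i
  // is the i-th column of the dataset.
  class SquaredErrorFunction
  {
   public:
    SquaredErrorFunction(const arma::mat& data) : data(data) { }

    // Number of separable functions: one per point in the dataset.
    size_t NumFunctions() const { return data.n_cols; }

    // Evaluate the objective on the batch of points [i, i + batchSize).
    double Evaluate(const arma::mat& coordinates,
                    const size_t i,
                    const size_t batchSize) const
    {
      double objective = 0.0;
      for (size_t j = i; j < i + batchSize; ++j)
        objective += std::pow(arma::norm(data.col(j) - coordinates), 2.0);
      return objective;
    }

    // Gradient of the same batch: the sum over j of 2 * (c - x_j).
    void Gradient(const arma::mat& coordinates,
                  const size_t i,
                  arma::mat& gradient,
                  const size_t batchSize) const
    {
      gradient.zeros(coordinates.n_rows, coordinates.n_cols);
      for (size_t j = i; j < i + batchSize; ++j)
        gradient += 2.0 * (coordinates - data.col(j));
    }

   private:
    const arma::mat& data;
  };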
Definition at line 64 of file ada_grad.hpp.
Constructor & Destructor Documentation
◆ AdaGrad()
AdaGrad(const double stepSize = 0.01,
        const size_t batchSize = 32,
        const double epsilon = 1e-8,
        const size_t maxIterations = 100000,
        const double tolerance = 1e-5,
        const bool shuffle = true)
Construct the AdaGrad optimizer with the given parameters.
The defaults here are not necessarily good for the given problem, so it is suggested that the values used be tailored to the task at hand. The maximum number of iterations refers to the maximum number of points that are processed (i.e., one iteration equals one point; one iteration does not equal one pass over the dataset). For example, with a dataset of 1,000 points and maxIterations = 100000, the optimizer will make at most 100 passes over the dataset.
- Parameters
    stepSize       Step size for each iteration.
    batchSize      Number of points to process in one step.
    epsilon        Value used to initialise the squared gradient parameter.
    maxIterations  Maximum number of iterations allowed (0 means no limit).
    tolerance      Maximum absolute tolerance to terminate algorithm.
    shuffle        If true, the function order is shuffled; otherwise, each function is visited in linear order.
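A brief usage sketch of this constructor follows; the namespace is assumed to be mlpack::optimization, and the parameter values are arbitrary choices for illustration:

  #include <mlpack/core/optimizers/ada_grad/ada_grad.hpp>

  using namespace mlpack::optimization;

  int main()
  {
    // Smaller step size and larger batch size than the defaults; the
    // remaining parameters keep their default values.
    AdaGrad optimizer(0.005, 64);
  }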
Member Function Documentation
◆ BatchSize() [1/2]
size_t BatchSize() const  [inline]
Get the batch size.
Definition at line 113 of file ada_grad.hpp.
◆ BatchSize() [2/2]
size_t& BatchSize()  [inline]
Modify the batch size.
Definition at line 115 of file ada_grad.hpp.
◆ Epsilon() [1/2]
double Epsilon() const  [inline]
Get the value used to initialise the squared gradient parameter.
Definition at line 118 of file ada_grad.hpp.
◆ Epsilon() [2/2]
double& Epsilon()  [inline]
Modify the value used to initialise the squared gradient parameter.
Definition at line 120 of file ada_grad.hpp.
◆ MaxIterations() [1/2]
size_t MaxIterations() const  [inline]
Get the maximum number of iterations (0 indicates no limit).
Definition at line 123 of file ada_grad.hpp.
◆ MaxIterations() [2/2]
size_t& MaxIterations()  [inline]
Modify the maximum number of iterations (0 indicates no limit).
Definition at line 125 of file ada_grad.hpp.
◆ Optimize()
template<typename DecomposableFunctionType>
double Optimize(DecomposableFunctionType& function, arma::mat& iterate)  [inline]
Optimize the given function using AdaGrad.
The given starting point will be modified to store the finishing point of the algorithm, and the final objective value is returned.
- Template Parameters
    DecomposableFunctionType  Type of the function to optimize.
- Parameters
    function  Function to optimize.
    iterate   Starting point (will be modified).
- Returns
    Objective value of the final point.
Definition at line 102 of file ada_grad.hpp.
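An end-to-end sketch of Optimize(), reusing the hypothetical SquaredErrorFunction class from the detailed description above (again assuming the mlpack::optimization namespace):

  #include <iostream>
  #include <mlpack/core.hpp>
  #include <mlpack/core/optimizers/ada_grad/ada_grad.hpp>

  using namespace mlpack::optimization;

  int main()
  {
    // Three 2-dimensional points, one per column; the objective is
    // minimized at their mean, (2, 2).
    arma::mat data("1.0 2.0 3.0; 1.0 2.0 3.0");
    SquaredErrorFunction function(data);

    // The starting point is overwritten with the solution found.
    arma::mat coordinates(2, 1, arma::fill::zeros);

    AdaGrad optimizer(0.1 /* stepSize */, 1 /* batchSize */);
    const double objective = optimizer.Optimize(function, coordinates);

    std::cout << "Final objective: " << objective << std::endl;
    std::cout << "Coordinates:\n" << coordinates << std::endl;
  }

Note that because tolerance defaults to 1e-5, the run may terminate well before maxIterations points have been processed.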
◆ Shuffle() [1/2]
bool Shuffle() const  [inline]
Get whether or not the individual functions are shuffled.
Definition at line 133 of file ada_grad.hpp.
◆ Shuffle() [2/2]
bool& Shuffle()  [inline]
Modify whether or not the individual functions are shuffled.
Definition at line 135 of file ada_grad.hpp.
◆ StepSize() [1/2]
double StepSize() const  [inline]
Get the step size.
Definition at line 108 of file ada_grad.hpp.
◆ StepSize() [2/2]
double& StepSize()  [inline]
Modify the step size.
Definition at line 110 of file ada_grad.hpp.
◆ Tolerance() [1/2]
double Tolerance() const  [inline]
Get the tolerance for termination.
Definition at line 128 of file ada_grad.hpp.
◆ Tolerance() [2/2]
double& Tolerance()  [inline]
Modify the tolerance for termination.
Definition at line 130 of file ada_grad.hpp.
The documentation for this class was generated from the following file:
- src/mlpack/core/optimizers/ada_grad/ada_grad.hpp