[x, options, flog, pointlog] = graddesc(f, x, options, gradf) uses batch gradient descent to find a local minimum of the function f(x) whose gradient is given by gradf(x). A log of the function values after each cycle is (optionally) returned in flog, and a log of the points visited is (optionally) returned in pointlog.
Note that x is a row vector and f returns a scalar value. The point at which f has a local minimum is returned as x. The function value at that point is returned in options(8).
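The fixed-step mode can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not Netlab's MATLAB implementation: the function name graddesc_sketch and its keyword parameters are hypothetical stand-ins for options(18) (learning rate), options(17) (momentum) and options(14) (iteration limit). The sketch omits the options(2)/options(3) convergence tests and simply runs a fixed number of cycles, recording flog and pointlog as it goes.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def graddesc_sketch(
    f: Callable[[Vector], float],
    gradf: Callable[[Vector], Vector],
    x: Vector,
    learning_rate: float = 0.01,  # hypothetical stand-in for options(18)
    momentum: float = 0.5,        # hypothetical stand-in for options(17)
    max_iter: int = 100,          # hypothetical stand-in for options(14)
) -> Tuple[Vector, float, List[float], List[Vector]]:
    """Batch gradient descent with momentum; no convergence tests."""
    x = list(x)
    dx_old = [0.0] * len(x)       # previous parameter update (zero at start)
    flog: List[float] = []
    pointlog: List[Vector] = []
    for _ in range(max_iter):
        g = gradf(x)
        # new update = momentum * previous update - learning_rate * gradient
        dx = [momentum * d - learning_rate * gi for d, gi in zip(dx_old, g)]
        x = [xi + di for xi, di in zip(x, dx)]
        dx_old = dx
        flog.append(f(x))         # function value after this cycle
        pointlog.append(list(x))  # point visited in this cycle
    return x, f(x), flog, pointlog


# Usage: minimise (x1 - 3)^2 + (x2 - 3)^2 starting from the origin.
def bowl(x: Vector) -> float:
    return sum((xi - 3.0) ** 2 for xi in x)


def bowl_grad(x: Vector) -> Vector:
    return [2.0 * (xi - 3.0) for xi in x]


x, fval, flog, pointlog = graddesc_sketch(
    bowl, bowl_grad, [0.0, 0.0], learning_rate=0.1, max_iter=200
)
```

The returned fval plays the role that options(8) plays in graddesc.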
graddesc(f, x, options, gradf, p1, p2, ...) allows additional arguments to be passed to f() and gradf().
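Passing extra arguments through to the objective and its gradient amounts to forwarding them unchanged on every call. The Python sketch below shows that idea with *args. The names descend_with_args, sq_dist and sq_dist_grad are hypothetical, and the loop is a plain fixed-step descent without momentum, chosen only to keep the forwarding visible.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def descend_with_args(
    f: Callable[..., float],
    gradf: Callable[..., Vector],
    x: Vector,
    *args,                        # extra arguments, like p1, p2, ... in graddesc
    learning_rate: float = 0.1,
    n_iter: int = 500,
) -> Tuple[Vector, float]:
    """Fixed-step descent that forwards extra arguments to f and gradf."""
    x = list(x)
    for _ in range(n_iter):
        g = gradf(x, *args)       # extra arguments passed through unchanged
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x, f(x, *args)


def sq_dist(x: Vector, target: Vector) -> float:
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def sq_dist_grad(x: Vector, target: Vector) -> Vector:
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]


# The target vector is an "extra argument" that f and gradf both need.
x_min, f_min = descend_with_args(sq_dist, sq_dist_grad, [0.0, 0.0], [1.5, -2.0])
```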
The optional parameters have the following interpretations.
options(1) is set to 1 to display error values; this also logs the error values in the return argument flog and the points visited in the return argument pointlog. If options(1) is set to 0, then only warning messages are displayed. If options(1) is -1, then nothing is displayed.
options(2) is the absolute precision required for the value of x at the solution. If the absolute difference between the values of x at two successive steps is less than options(2), then this condition is satisfied.
options(3) is a measure of the precision required of the objective function at the solution. If the absolute difference between the objective function values at two successive steps is less than options(3), then this condition is satisfied. Both this and the previous condition must be satisfied for termination.
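The two-part stopping rule can be written as a single predicate. In this Python sketch, converged, tol_x and tol_f are hypothetical names standing in for the options(2) and options(3) tests. Reading "the absolute difference between the values of x" as the largest absolute change in any single component is an assumption of this sketch.

```python
from typing import Sequence


def converged(
    x_new: Sequence[float],
    x_old: Sequence[float],
    f_new: float,
    f_old: float,
    tol_x: float,  # hypothetical stand-in for options(2)
    tol_f: float,  # hypothetical stand-in for options(3)
) -> bool:
    """True only when BOTH the step in x and the change in f are small."""
    # Assumption: the x test uses the largest per-component change.
    step = max(abs(a - b) for a, b in zip(x_new, x_old))
    return step < tol_x and abs(f_new - f_old) < tol_f


both_small = converged([1.0, 2.0], [1.0, 2.0 + 1e-6], 0.5, 0.5 + 1e-7, 1e-4, 1e-4)
x_moved = converged([1.0, 2.0], [1.0, 2.1], 0.5, 0.5, 1e-4, 1e-4)
f_moved = converged([1.0, 2.0], [1.0, 2.0], 0.5, 0.6, 1e-4, 1e-4)
```

Only the first call satisfies both conditions; a small step in x with a large change in f (or the reverse) does not stop the search.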
options(7) determines the line minimisation method used. If it is set to 1, then a line minimiser is used (in the direction of the negative gradient). If it is 0 (the default), then each parameter update is a fixed multiple (the learning rate) of the negative gradient added to a fixed multiple (the momentum) of the previous parameter update.
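Written out, the two update rules described for options(7) take the following form. The symbols are introduced here for illustration: \eta is the learning rate (options(18)), \mu is the momentum (options(17)), t indexes iterations, and the first previous update \Delta\mathbf{x}^{(0)} is assumed to be zero. For the line-minimiser case, \lambda^{*} is found only approximately, to the precision set by options(15).

```latex
% options(7) = 0: fixed learning rate plus momentum
\Delta\mathbf{x}^{(t)} = -\eta\,\nabla f\big(\mathbf{x}^{(t)}\big) + \mu\,\Delta\mathbf{x}^{(t-1)},
\qquad
\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} + \Delta\mathbf{x}^{(t)}

% options(7) = 1: line minimisation along the negative gradient
\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} - \lambda^{*}\,\nabla f\big(\mathbf{x}^{(t)}\big),
\qquad
\lambda^{*} = \arg\min_{\lambda}\, f\big(\mathbf{x}^{(t)} - \lambda\,\nabla f(\mathbf{x}^{(t)})\big)
```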
options(9) should be set to 1 to check the user-defined gradient function gradf with gradchek. This is carried out at the initial parameter vector x.
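A gradient check compares gradf against a numerical estimate obtained from f alone. The Python sketch below uses central differences; this is an assumed, common technique and is not claimed to reproduce gradchek's exact procedure or output. The names check_gradient, cube_sum, good_grad and bad_grad are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def check_gradient(
    f: Callable[[Vector], float],
    gradf: Callable[[Vector], Vector],
    x: Vector,
    eps: float = 1e-6,
) -> float:
    """Largest absolute gap between gradf(x) and a central-difference estimate."""
    analytic = gradf(x)
    worst = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        numeric = (f(x_plus) - f(x_minus)) / (2.0 * eps)
        worst = max(worst, abs(numeric - analytic[i]))
    return worst


def cube_sum(x: Vector) -> float:
    return sum(xi ** 3 for xi in x)


def good_grad(x: Vector) -> Vector:
    return [3.0 * xi ** 2 for xi in x]


def bad_grad(x: Vector) -> Vector:
    return [2.0 * xi ** 2 for xi in x]  # deliberately wrong derivative


x0 = [1.0, 2.0]  # check at the initial parameter vector, as graddesc does
good_gap = check_gradient(cube_sum, good_grad, x0)
bad_gap = check_gradient(cube_sum, bad_grad, x0)
```

A correct gradient gives a gap near zero, while the wrong one is off by 4 in the second component.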
options(10) returns the total number of function evaluations (including those in any line searches).
options(11) returns the total number of gradient evaluations.
options(14) is the maximum number of iterations; default 100.
options(15) is the precision in parameter space of the line search; default foptions(2).
options(17) is the momentum; default 0.5. It should be scaled by the inverse of the number of data points.
options(18) is the learning rate; default 0.01. It should be scaled by the inverse of the number of data points.
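An example of how this function can be used to train a neural network net with netopt, where x is the matrix of input data (one row per data point) and t holds the targets:
options = zeros(1, 18);
options(18) = 0.1/size(x, 1);
net = netopt(net, options, x, t, 'graddesc');
Note how the learning rate is scaled by the number of data points.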
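Why the scaling matters can be seen on a toy error function summed over N data points: its gradient grows in proportion to N, so an unscaled learning rate that works for small data sets can diverge for large ones. This Python sketch assumes a sum-of-squares error E(w) = 0.5 * sum_n (w - t_n)^2 for a single parameter w, uses no momentum, and relies on a hypothetical helper fit_mean; it is an illustration of the scaling argument, not the netopt example itself.

```python
from typing import List


def fit_mean(targets: List[float], learning_rate: float, n_iter: int = 100) -> float:
    """Gradient descent on E(w) = 0.5 * sum_n (w - t_n)**2 (no momentum)."""
    w = 0.0
    n = len(targets)
    s = sum(targets)
    for _ in range(n_iter):
        grad = n * w - s  # dE/dw grows linearly with the number of data points
        w -= learning_rate * grad
    return w


targets = [float(k % 7) for k in range(100)]  # N = 100 data points
mean = sum(targets) / len(targets)
scaled = fit_mean(targets, learning_rate=0.1 / len(targets))  # 0.1 / N
unscaled = fit_mean(targets, learning_rate=0.1)
```

With the rate 0.1/N each step shrinks the error by a factor of 0.9 whatever N is, so scaled approaches the mean. With the unscaled rate 0.1 and N = 100, each step multiplies the error by -9 and unscaled diverges.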
See also: conjgrad, linemin, olgd, minbrack, quasinew, scg
Copyright (c) Ian T Nabney (1996-9)