PRIORS FOR INFINITE NETWORKS
Radford M. Neal
Department of Computer Science
University of Toronto
1 March 1994
Bayesian inference begins with a prior distribution for model
parameters that is meant to capture prior beliefs about the
relationship being modeled. For multilayer perceptron networks,
where the parameters are the connection weights, the prior lacks
any direct meaning; what matters is the prior over functions
computed by the network that is implied by this prior over
weights. In this paper, I show that priors over weights can be
defined in such a way that the corresponding priors over
functions reach reasonable limits as the number of hidden units
in the network goes to infinity. When using such priors, there
is thus no need to limit the size of the network in order to
avoid "overfitting". The infinite network limit also provides
insight into the properties of different priors. A Gaussian prior
for hidden-to-output weights results in a Gaussian process prior
for functions, which can be smooth, Brownian, or fractional
Brownian, depending on the hidden unit activation function and the
prior for input-to-hidden weights. Quite different effects can be
obtained using priors based on non-Gaussian stable distributions.
In networks with more than one hidden layer, a combination of
Gaussian and non-Gaussian priors appears most interesting.
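
The convergence claim can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes a network with one input, one layer of tanh hidden units, and independent zero-mean Gaussian priors on all weights and biases, with the standard deviation of the hidden-to-output weights scaled as 1/sqrt(H) for H hidden units. The function name, the parameter names (sigma_u, sigma_a, sigma_v, sigma_b), and their default values are illustrative assumptions. Under this scaling, the prior variance of the function value at any input stays bounded by sigma_b^2 + sigma_v^2 however large H becomes.

```python
import numpy as np


def sample_prior_functions(x, n_hidden, n_samples, rng,
                           sigma_u=5.0, sigma_a=5.0,
                           sigma_v=1.0, sigma_b=0.1):
    """Draw functions from the prior of a one-hidden-layer tanh network.

    x is a 1-D array of scalar inputs. Returns an array of shape
    (n_samples, len(x)); each row is one function drawn from the prior,
    evaluated at the points in x. All sigma_* defaults are illustrative
    assumptions, not values taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    # Input-to-hidden weights and hidden-unit biases.
    u = rng.normal(0.0, sigma_u, size=(n_samples, n_hidden))
    a = rng.normal(0.0, sigma_a, size=(n_samples, n_hidden))
    # Hidden-to-output weights: standard deviation shrinks as 1/sqrt(H),
    # so the summed contribution of the hidden units has variance that
    # does not grow with H.
    v = rng.normal(0.0, sigma_v / np.sqrt(n_hidden),
                   size=(n_samples, n_hidden))
    # Output bias.
    b = rng.normal(0.0, sigma_b, size=(n_samples, 1))
    # Hidden-unit values, shape (n_samples, n_hidden, len(x)).
    h = np.tanh(a[:, :, None] + u[:, :, None] * x[None, None, :])
    # Network output: bias plus weighted sum over hidden units.
    return b + np.einsum("sh,shn->sn", v, h)


rng = np.random.default_rng(0)
x_grid = np.array([-1.0, 0.0, 1.0])
for n_hidden in (1, 10, 100):
    f = sample_prior_functions(x_grid, n_hidden, 1000, rng)
    print(n_hidden, f.var(axis=0))
```

The printed variances should be roughly the same for each H, which is the sense in which the prior over functions reaches a limit. Plotting a few rows of the returned array against a finer grid of x values gives a visual impression of the functions favored by the prior.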