PRIORS FOR INFINITE NETWORKS

                        Radford M. Neal
                 Department of Computer Science 
                      University of Toronto 

                          1 March 1994

Bayesian inference begins with a prior distribution for model
parameters that is meant to capture prior beliefs about the
relationship being modeled.  For multilayer perceptron networks,
where the parameters are the connection weights, the prior over
weights has no direct meaning; what matters is the prior over
functions computed by the network that it implies.  In this paper,
I show that priors over weights can be
defined in such a way that the corresponding priors over 
functions reach reasonable limits as the number of hidden units 
in the network goes to infinity.  When using such priors, there 
is thus no need to limit the size of the network in order to 
avoid "overfitting".  The infinite network limit also provides 
insight into the properties of different priors.  A Gaussian prior 
for hidden-to-output weights results in a Gaussian process prior 
for functions, which can be smooth, Brownian, or fractional 
Brownian, depending on the hidden unit activation function and the 
prior for input-to-hidden weights.  Quite different effects can be 
obtained using priors based on non-Gaussian stable distributions.  
In networks with more than one hidden layer, a combination of 
Gaussian and non-Gaussian priors appears most interesting.
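
As a rough illustration of the limiting behaviour described above, the
following Python sketch draws networks with one hidden layer of tanh
units from a Gaussian prior and estimates the prior variance of the
output at a single input.  The scaling of the hidden-to-output standard
deviation as omega_v / sqrt(H) is an assumption of this sketch, chosen
because it keeps the output variance from growing with the number of
hidden units H.  The function names, the prior widths, and the omission
of an output bias are likewise illustrative choices, not taken from the
text above.

```python
import math
import random


def sample_network_output(x, n_hidden, rng,
                          sigma_u=5.0, sigma_a=5.0, omega_v=1.0):
    """Draw one network from the prior and return its output at input x.

    Assumed prior (illustrative only):
      input-to-hidden weights  u_j ~ N(0, sigma_u^2)
      hidden unit biases       a_j ~ N(0, sigma_a^2)
      hidden-to-output weights v_j ~ N(0, omega_v^2 / n_hidden)
    """
    # Assumed scaling: standard deviation shrinks as 1 / sqrt(n_hidden).
    sigma_v = omega_v / math.sqrt(n_hidden)
    total = 0.0
    for _ in range(n_hidden):
        u = rng.gauss(0.0, sigma_u)
        a = rng.gauss(0.0, sigma_a)
        v = rng.gauss(0.0, sigma_v)
        total += v * math.tanh(a + u * x)
    return total


def prior_output_variance(x, n_hidden, n_draws, seed=0):
    """Monte Carlo estimate of the prior variance of the output at x."""
    rng = random.Random(seed)
    draws = [sample_network_output(x, n_hidden, rng)
             for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    return sum((d - mean) ** 2 for d in draws) / n_draws


if __name__ == "__main__":
    for h in (1, 10, 100):
        print(h, prior_output_variance(0.5, h, 2000, seed=h))
```

Under this scaling, the estimated variances for different values of H
should agree to within sampling error.  With a fixed standard deviation
for the hidden-to-output weights, the variance would instead grow in
proportion to H.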