LEARNING STOCHASTIC FEEDFORWARD NETWORKS

Radford M. Neal
Department of Computer Science
University of Toronto

November 1990

Connectionist learning procedures are presented for "sigmoid" and "noisy-OR" varieties of stochastic feedforward network. These networks are in the same class as the "belief networks" used in expert systems. They represent a probability distribution over a set of visible variables using hidden variables to express correlations. Conditional probability distributions can be exhibited by stochastic simulation for use in tasks such as classification. Learning from empirical data is done via a gradient ascent method analogous to that used in Boltzmann machines, but due to the feedforward nature of the connections, the negative phase of Boltzmann machine learning is unnecessary. Experimental results show that, as a result, learning in a sigmoid feedforward network can be faster than in a Boltzmann machine. These networks have other advantages over Boltzmann machines in pattern classification and decision making applications, and provide a link between work on connectionist learning and work on the representation of expert knowledge.
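To make the abstract's claims concrete, the following is a minimal, hypothetical sketch (not Neal's code) of a sigmoid stochastic feedforward network. All function and variable names are illustrative assumptions. It shows the two ideas the abstract mentions: sampling a layer by stochastic simulation, and a "positive phase only" gradient-ascent update on the log-likelihood of a fully observed configuration, with no negative phase as in Boltzmann machine learning.

```python
# Sketch of a sigmoid stochastic feedforward (belief) network layer.
# Each binary unit j fires with probability sigmoid(b_j + sum_i w_ij x_i),
# where x is the state of the layer below.
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_layer(inputs, weights, biases, rng):
    """Stochastic simulation: sample each unit independently given the
    layer below. `weights[j]` holds the incoming weights of unit j."""
    probs = [sigmoid(b + sum(w * x for w, x in zip(col, inputs)))
             for col, b in zip(weights, biases)]
    states = [1 if rng.random() < p else 0 for p in probs]
    return states, probs


def grad_step(inputs, outputs, weights, biases, lr):
    """One gradient-ascent step on log-likelihood for a fully observed
    (inputs, outputs) pair: dL/dw_ij = x_i * (y_j - p_j), where p_j is
    unit j's firing probability. Because the connections are feedforward,
    this 'positive phase' statistic is the whole gradient."""
    for j, (col, b) in enumerate(zip(weights, biases)):
        p = sigmoid(b + sum(w * x for w, x in zip(col, inputs)))
        err = outputs[j] - p
        for i, x in enumerate(inputs):
            col[i] += lr * x * err
        biases[j] += lr * err
```

In the full procedure described in the paper, hidden-unit states for a given visible pattern would themselves be obtained by stochastic simulation before applying such an update; the sketch above shows only the update for an observed configuration.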