WRL Technical Report 89/8
             A Unified Vector/Scalar Floating-Point Architecture
            Norman P. Jouppi, Jonathan Bertoni, and David W. Wall
                                  July 1989

In this paper we present a unified approach to vector and scalar computation,
using a single register file for both scalar operands and vector elements.
The goal of this architecture is to yield improved scalar performance while
broadening the range of vectorizable applications.  For example, reduction
operations and recurrences can be expressed in vector form in this
architecture.  For most applications, this approach yields greater overall
performance than one that emphasizes peak vector performance.
The hardware required to support the enhanced vector capability is
insignificant, yet it allows vectorized code to execute two operations per
cycle.  Moreover, the unified vector/scalar register file required for peak
performance is an order of magnitude smaller than traditional vector register
files, allowing efficient on-chip VLSI implementation.  The
results of simulations of the Livermore Loops and Linpack using this 
architecture are presented.
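
As an illustration of how a recurrence or a reduction might be expressed in
vector form over a single register file, the following is a minimal sketch
and not the instruction encoding defined in this report.  It assumes that a
vector instruction behaves like a sequence of scalar operations whose register
specifiers each advance by one per element, and that elements complete in
order, so a later element can read the result of an earlier one.  The names
`vector_op`, `dst`, `src_a`, `src_b`, and `length`, the 16-entry register
file, and the register assignments are all hypothetical.

```python
from operator import add


def vector_op(regs, op, dst, src_a, src_b, length):
    """Toy model (an assumption, not the report's encoding): expand one
    vector instruction into `length` scalar element operations on a single
    register file, advancing every register specifier by one per element."""
    for i in range(length):
        # Elements execute in order, so element i may read a register
        # that an earlier element of the same instruction has written.
        regs[dst + i] = op(regs[src_a + i], regs[src_b + i])
    return regs


# Recurrence: r[i+2] = r[i] + r[i+1], seeded with two scalar values.
fib = [0] * 16
fib[0], fib[1] = 1, 1
vector_op(fib, add, dst=2, src_a=0, src_b=1, length=6)

# Reduction: a running sum of r8..r11 accumulated through r0..r4.
acc = [0] * 16
acc[8:12] = [1, 2, 3, 4]
vector_op(acc, add, dst=1, src_a=0, src_b=8, length=4)

print(fib[:8])   # the two seeds followed by six computed elements
print(acc[4])    # the final accumulated sum
```

In this sketch the same registers serve as scalar operands (the seeds and the
final sum) and as vector elements, which is the property the unified register
file is meant to provide.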