Title: Multiprogrammed Parallel Application Scheduling in NUMA Multiprocessors
Authors: Timothy B. Brecht
Where: Ph.D. Dissertation - CSRI Technical Report CSRI-303
Abstract: The invention, acceptance, and proliferation of multiprocessors are primarily a result of the quest to increase computer system performance. The most promising features of multiprocessors are their potential to solve problems faster than previously possible and to solve larger problems than previously possible. Large-scale multiprocessors offer the additional advantage of being able to execute multiple parallel applications simultaneously. The execution time of a parallel application is directly related to the number of processors it is allocated and, in shared-memory non-uniform memory access time (NUMA) multiprocessors, which processors it is allocated. As a result, efficient and effective scheduling becomes critical to overall system performance. In fact, it is likely to be a contributing factor in ultimately determining the success or failure of shared-memory NUMA multiprocessors.

The subjects of this dissertation are the problems of processor allocation and application placement.

The processor allocation problem involves determining the number of processors to allocate to each of several simultaneously executing parallel applications and possibly dynamically adjusting those allocations to improve overall system performance. The performance metric used is mean response time. We show that by differentiating between applications based on the amount of remaining work they have to execute, performance can be improved significantly. Then we propose techniques for estimating an application's expected remaining work along with policies for using these estimates to make improved processor allocation decisions. An experimental evaluation demonstrates the promise of this approach.

The placement problem involves determining which of the many processors to assign to each application.
Using experiments conducted on a representative system, we demonstrate that in large-scale NUMA multiprocessors the execution time of a parallel application is significantly affected by its placement. This motivates new techniques designed explicitly for NUMA multiprocessors. We introduce one such technique, called processor pool-based scheduling, which is designed to localize the execution of parallel applications within a NUMA architecture and to isolate different parallel applications from each other. An experimental evaluation shows that this scheduling method can significantly reduce mean response time relative to methods that do not consider the placement of parallel applications.
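
As a rough illustration of the allocation claim in the abstract, that distinguishing applications by their remaining work can lower mean response time, the following Python sketch compares two service orders. It is not the dissertation's policy: it assumes all applications arrive at time zero, that remaining work is known exactly rather than estimated, that speedup is perfectly linear, and that each application runs alone on all processors. The function name `mean_response_time` and the workload values are hypothetical.

```python
def mean_response_time(work, processors):
    """Mean response time when jobs run one at a time, in the given order,
    each using all processors (assumes perfectly linear speedup)."""
    clock = 0.0   # current time
    total = 0.0   # sum of completion times (all jobs arrive at time 0)
    for w in work:
        clock += w / processors
        total += clock
    return total / len(work)


# Hypothetical remaining work per application, in processor-seconds.
remaining_work = [120.0, 8.0, 40.0]
P = 8  # hypothetical number of processors

arrival_order = mean_response_time(remaining_work, P)
least_work_first = mean_response_time(sorted(remaining_work), P)

print(f"arrival order:    {arrival_order:.2f} s")
print(f"least work first: {least_work_first:.2f} s")
```

Under these simplifying assumptions, serving the application with the least remaining work first gives the lower mean response time, because short applications no longer wait behind long ones.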