This directory contains CSRI Technical Report #315 entitled

    Fusion of Loops for Parallelism and Locality

by

    Naraig Manjikian and Tarek Abdelrahman
    {nmanjiki,tsa}@eecg.toronto.edu

If you have the UNIX uncompress program, get the 315.ps.Z file.
Remember to transfer the file in binary mode. Uncompress it, and
print it on a PostScript printer.

If you do not have uncompress, get the 315.ps file in ascii mode,
and print it on a PostScript printer.

Abstract

Loop fusion improves data locality and reduces synchronization in
data-parallel applications. However, loop fusion is not always legal.
Even when legal, fusion may introduce loop-carried dependences which
reduce parallelism. In addition, performance losses result from cache
conflicts in fused loops. We present new, systematic techniques which:

    (1) allow fusion of loop nests in the presence of
        fusion-preventing dependences,
    (2) allow parallel execution of fused loops with minimal
        synchronization, and
    (3) eliminate cache conflicts in fused loops.

We evaluate our techniques on a 56-processor KSR2 multiprocessor, and
show performance improvements of up to 20% for representative loop
nest sequences. The results also indicate a performance tradeoff as
more processors are used, suggesting a careful evaluation of the
profitability of fusion.
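
Illustration

The short C sketch below is not taken from the report. It is a generic
textbook-style example of the situation the abstract describes, and the
function names (unfused, fused_shifted) and the array computation are
assumptions chosen for illustration. The second loop reads a[i + 1],
which the first loop has not yet produced at iteration i, so merging
the two loop bodies directly would be illegal. Delaying the second
statement by one iteration makes the merge legal, but the fused loop
then reads a[i - 1] from the previous iteration, which is a
loop-carried dependence of the kind the abstract says can reduce
parallelism.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: two separate loops over the same data.
 * Writes a[0..n-1] and c[0..n-2]. */
void unfused(const double *b, double *a, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + 1.0;

    /* Reads a[i + 1], produced by a LATER iteration of the loop above.
     * This is what prevents a direct, iteration-for-iteration fusion. */
    for (size_t i = 0; i + 1 < n; i++)
        c[i] = a[i] + a[i + 1];
}

/* Fused form (illustrative only): the second statement is shifted by
 * one iteration so that both of its operands are already computed. */
void fused_shifted(const double *b, double *a, double *c, size_t n)
{
    if (n == 0)
        return;
    assert(a != NULL && b != NULL);

    /* First iteration peeled off: there is no c[-1] to compute. */
    a[0] = b[0] + 1.0;

    for (size_t i = 1; i < n; i++) {
        a[i] = b[i] + 1.0;
        /* Shifted statement: a[i - 1] comes from the previous
         * iteration, so this loop carries a dependence. */
        c[i - 1] = a[i - 1] + a[i];
    }
}
```

Both functions fill a and c with the same values for the same input b.
In the fused version each element of a is reused while it is likely
still in cache, at the cost of the dependence between consecutive
iterations noted above.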