|
Abstract:
|
Technology trends such as growing wire delays, power consumption limits, and diminishing clock rate improvements present conventional instruction set architectures such as RISC, CISC, and VLIW with difficult challenges. To show continued performance growth, future microprocessors must exploit concurrency power-efficiently. An important question for any future system is how to divide the responsibility for discovering and exploiting concurrency among programmer, compiler, and hardware.
In this research we develop the first compiler for an Explicit Data Graph Execution (EDGE) architecture and show how to solve the new challenge of compiling to a block-based architecture. In EDGE architectures, the compiler is responsible for partitioning the program into a sequence of structured blocks that logically execute atomically. The EDGE ISA defines the structure of, and the restrictions on, these blocks. The TRIPS prototype processor is an EDGE architecture that employs four restrictions on blocks intended to strike a balance between software and hardware complexity. They are: (1) fixed block sizes (maximum of 128 instructions), (2) restricted number of loads and stores (no more than 32 may issue per block), (3) restricted register accesses (no more than eight reads and eight writes to each of four banks per block), and (4) constant number of block outputs (each block must always generate a constant number of register writes and stores, plus exactly one branch).
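As an illustration only, the four TRIPS block restrictions listed above can be sketched as a small legality check. This is not the thesis compiler's representation: the `Block` record, the `bank_of` mapping (register number modulo four), and the per-path output tuples are hypothetical names and assumptions introduced here so that the limits stated in the text can be exercised.

```python
from collections import Counter
from dataclasses import dataclass, field

# Limits taken from the four restrictions stated in the text.
MAX_INSTRUCTIONS = 128
MAX_LOADS_STORES = 32
NUM_BANKS = 4
MAX_READS_PER_BANK = 8
MAX_WRITES_PER_BANK = 8


@dataclass
class Block:
    """Hypothetical summary of one candidate block (not the TRIPS format)."""
    num_instructions: int
    num_loads_stores: int
    reg_reads: set = field(default_factory=set)    # register numbers read
    reg_writes: set = field(default_factory=set)   # register numbers written
    # One (register_writes, stores, branches) tuple per predicated path.
    path_outputs: list = field(default_factory=list)


def bank_of(reg: int) -> int:
    # Assumption for illustration: registers are interleaved across banks.
    return reg % NUM_BANKS


def violations(block: Block) -> list:
    """Return a list of human-readable constraint violations (empty if legal)."""
    problems = []
    if block.num_instructions > MAX_INSTRUCTIONS:
        problems.append("more than 128 instructions")
    if block.num_loads_stores > MAX_LOADS_STORES:
        problems.append("more than 32 loads and stores")

    reads = Counter(bank_of(r) for r in block.reg_reads)
    writes = Counter(bank_of(r) for r in block.reg_writes)
    for bank in range(NUM_BANKS):
        if reads[bank] > MAX_READS_PER_BANK:
            problems.append(f"more than 8 reads in bank {bank}")
        if writes[bank] > MAX_WRITES_PER_BANK:
            problems.append(f"more than 8 writes in bank {bank}")

    # Constant outputs: every path must produce the same number of register
    # writes and stores, and exactly one branch.
    if len({(w, s) for w, s, _ in block.path_outputs}) > 1:
        problems.append("output count varies across paths")
    if any(b != 1 for _, _, b in block.path_outputs):
        problems.append("a path does not take exactly one branch")
    return problems
```

A block-formation pass could call a check like `violations` before committing to merging two blocks, and fall back to the unmerged blocks when the list is non-empty.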
The challenges addressed in this thesis are twofold. First, we develop the algorithms and internal representations necessary to support the new structural constraints imposed by the block-based EDGE execution model. This first step provides correct execution and demonstrates the feasibility of EDGE compilers.
Next, we show how to optimize blocks using a dataflow predication model and provide results showing how the compiler is meeting this challenge on the SPEC2000 benchmarks. Using basic blocks as the performance baseline, we show that optimizations utilizing the dataflow predication model achieve up to 64% speedup on SPEC2000, with an average speedup of 31%. |