High Performance Interconnect System Design for Future Chip Multiprocessors

Date

2013-05-02

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Chip Multi-Processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Network-On-Chip (NOC) provides a scalable communication method for CMP architectures. NOC must be carefully designed to meet constraints of power and area, and provide ultra low latencies and high throughput. In this research, we explore different techniques to design high performance NOC.

First, existing NOCs mostly use Dimension Order Routing (DOR) to determine the route taken by a packet in unicast traffic. However, with the development of diverse applications in CMPs, one-to-many (multicast) and one-to-all (broadcast) traffic are becoming more common. Current unicast routing cannot support multi-cast and broadcast traffic efficiently. We propose Recursive Partitioning Multicast (RPM) routing and a detailed multicast wormhole router design for NOCs. RPM allows routers to select intermediate replication nodes based on the global distribution of destination nodes. This provides more path diversities, thus achieves more bandwidth-efficiency and finally improves the performance of the whole network.

Second, as feature size is shrinking, wires are becoming abundant resources available in NOC. Since NOC can benefit from high wire density due to no limits on the number of pins and faster signaling rates, it is very critical in the NOC router design to find a way that fully utilizes the wire resources to provide high performance. We propose an Adaptive Physical Channel Regulator (APCR) for NOC routers to exploit huge wiring resources. The flit size in an APCR router is less than the physical channel width (phit size) to provide finer granularity flow control. An APCR router allows flits from different packets or flows to share the same physical channel in a single cycle. The three regulation schemes (Monopolizing, Fair-sharing and Channel-stealing) intelligently allocate the output channel resources considering not only the availability of physical channels but the occupancy of input buffers. In an APCR router, each Virtual Channel can forward a dynamic number of flits every cycle depending on the run-time network status.

Third, nanophotonics has been proposed to design low latency and high band- width NOC for future CMPs. Recent nanophotonic NOC designs adopt the token- based arbitration coupled with credit-based flow control, which leads to low band- width utilization. We propose two handshake schemes for nanophotonic interconnects in CMPs, Global Handshake (GHS) and Distributed Handshake (DHS), which get rid of the traditional credit-based flow control, reduce the average token waiting time, and finally improve the network throughput. Furthermore, we enhance the basic handshake schemes with setaside buffer and circulation techniques to overcome the Head-Of-Line (HOL) blocking.

Description

Keywords

Network-On-Chip, Chip Multiprocessors

Citation