Interconnect Problems – Think Globally
While transistor switching performance has continued to improve by roughly a third with each fabrication process generation, the metal interconnects, the wires that connect transistors throughout the chip, have deteriorated in comparison. Interconnect can also account for up to a third of the power consumed by a modern microprocessor. Indeed, the energy required to drive an operand across a chip’s wires can dwarf the energy the computation logic needs to operate on it. There have been isolated, one-time improvements to interconnects. Slightly better insulating materials between interconnect layers reduced parasitic capacitance, and the switch from aluminum to copper reduced resistance; both improved interconnect performance. However, the trend is clear: relative to transistors, wire performance gets worse with each process generation.
The latency of an on-chip wire is generally governed by the product of its resistance and capacitance, the RC delay, so signals travel at only a fraction of the speed of light. A wire’s RC propagation delay grows quadratically with its length (i.e. a wire that is twice as long might have an RC delay 4x larger, or more). As process feature sizes shrink, the capacitance of the shrunken wires decreases only marginally. However, the cross-section of the wire is cut in half, which doubles the resistance per unit length and effectively doubles the propagation delay of a wire of the same length. Functional unit blocks also shrink, which shortens the local intra-block interconnect (i.e. wires between different stages of a multiplier). This tends to offset the latency increase, at least for the local wires. Yet even constant wire latency, in the presence of faster transistor switching, amounts to an increase in interconnect latency in relative terms. This comparative increase is tolerable for intra-block wires because they are very short and contribute little latency to the overall cycle time of the chip. It is in the inter-block and especially the upper-level global on-chip interconnect that most of the increasing delay is encountered. Assuming a constant microprocessor die size between process generations, the latency of the global wires that have to travel the length of the chip could double with each shrink. Because transistor delay falls by roughly a third over the same shrink, this would roughly triple the relative difference between global wire and transistor performance every process generation, shrinking the chip area a global signal can reach within an ever shorter clock cycle.
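The scaling argument above can be checked with a few lines of arithmetic. The sketch below is a first-order model only: it assumes a distributed RC wire whose delay is proportional to total resistance times total capacitance, that a shrink doubles resistance per unit length while leaving capacitance per unit length unchanged, and that "improving by a third" means gate delay falls to two thirds of its previous value. The function name `wire_delay`, the constant `k`, and every numeric value are hypothetical and chosen for illustration.

```python
# Illustrative first-order model; all numeric values are assumptions, not measurements.

def wire_delay(r_per_mm, c_per_mm, length_mm, k=0.5):
    """Estimate distributed RC delay in seconds as k * (r * L) * (c * L)."""
    total_resistance = r_per_mm * length_mm    # ohms
    total_capacitance = c_per_mm * length_mm   # farads
    return k * total_resistance * total_capacitance

R_PER_MM = 100.0     # ohms per mm (hypothetical)
C_PER_MM = 0.2e-12   # farads per mm (hypothetical)
DIE_MM = 10.0        # global wire spanning a die of constant size (hypothetical)

base = wire_delay(R_PER_MM, C_PER_MM, DIE_MM)

# Doubling the length doubles both R and C, so the delay quadruples.
length_ratio = wire_delay(R_PER_MM, C_PER_MM, 2 * DIE_MM) / base

# One shrink: half the cross-section doubles R per mm; C per mm assumed unchanged.
shrink_ratio = wire_delay(2 * R_PER_MM, C_PER_MM, DIE_MM) / base

# Assumed gate-delay scaling per generation (delay falls by a third).
GATE_DELAY_SCALE = 2.0 / 3.0
relative_gap = shrink_ratio / GATE_DELAY_SCALE

print(f"2x length        -> {length_ratio:.2f}x wire delay")
print(f"one shrink       -> {shrink_ratio:.2f}x global wire delay")
print(f"wire vs. gate gap -> {relative_gap:.2f}x per generation")
```

Under these assumptions the global wire delay doubles per shrink while the gate delay drops to two thirds, which is where the roughly threefold widening of the gap comes from.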
A time-honored way to alleviate this problem is to insert buffers and flip-flops that partition a long wire into segments and boost the signal at each boundary. Since wire delay is a quadratic function of length, splitting a wire into two equal segments halves the total wire latency: each half has a quarter of the original delay, and the two halves add up to one half. The buffer itself introduces a small delay, however, and buffers and flip-flops are not free, as they consume additional power. The number of buffers needed to offset the interconnect-transistor disparity would grow exponentially across process generations, making buffering unsuitable as a long-term solution.
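A minimal sketch of the buffering trade-off described above, under stated assumptions: each segment's wire delay is quadratic in its length, each buffer adds a fixed delay, and from one generation to the next the wire RC per unit length doubles while buffer delay falls to two thirds. The helper names `segmented_delay` and `best_segments` and all numbers are hypothetical; power cost is not modeled.

```python
# Illustrative model of repeater insertion; values are assumptions, in picoseconds.

def segmented_delay(rc_ps_per_mm2, length_mm, n_segments, t_buf_ps):
    """Total delay of a wire cut into n equal segments joined by n - 1 buffers."""
    segment_mm = length_mm / n_segments
    wire_ps = n_segments * rc_ps_per_mm2 * segment_mm ** 2  # quadratic per segment
    buffer_ps = (n_segments - 1) * t_buf_ps                 # fixed cost per buffer
    return wire_ps + buffer_ps

def best_segments(rc_ps_per_mm2, length_mm, t_buf_ps, max_segments=256):
    """Brute-force search for the segment count with the lowest total delay."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: segmented_delay(rc_ps_per_mm2, length_mm, n, t_buf_ps),
    )

RC_PS_PER_MM2 = 10.0   # wire delay coefficient (hypothetical)
LENGTH_MM = 10.0       # global wire length (hypothetical)
T_BUF_PS = 20.0        # delay of one buffer (hypothetical)

unbuffered = segmented_delay(RC_PS_PER_MM2, LENGTH_MM, 1, T_BUF_PS)
two_segments = segmented_delay(RC_PS_PER_MM2, LENGTH_MM, 2, T_BUF_PS)
print(f"unbuffered: {unbuffered:.0f} ps, two segments: {two_segments:.0f} ps")

# Track the delay-optimal segment count over a few assumed process generations.
segments_per_generation = []
rc, t_buf = RC_PS_PER_MM2, T_BUF_PS
for generation in range(4):
    segments_per_generation.append(best_segments(rc, LENGTH_MM, t_buf))
    rc *= 2.0           # assumed: wire RC per unit length doubles per shrink
    t_buf *= 2.0 / 3.0  # assumed: buffer delay falls by a third per shrink
print("optimal segments per generation:", segments_per_generation)
```

With these numbers, one buffer cuts the 1000 ps unbuffered delay to 520 ps: the wire portion halves and the buffer adds its own 20 ps. In the same model the delay-optimal segment count for a single wire climbs every generation, and since the optimum scales with the square root of the ratio of wire RC to buffer delay, it grows by a roughly constant factor per shrink.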