-Warning Long Post-
Intel’s roadmap actually isn’t too bad. While the delay of the integrated memory controller is a setback, it probably isn't as catastrophic as it appears.
Intel has been taking a lot of flak lately over their new Paxville DP. In this case, it is deserved. They essentially decided to place 2 Prescott 620s together. The crazy heat output comes directly from having not only HT on each core but also an extra 1MB of L2 cache per core. Intel probably felt that the extra 1MB of cache per core was more worthwhile in a server environment than a 400MHz increase in clock rate, which is why they didn’t just use an 840EE. In the end, the 90nm process simply couldn’t handle dual-core, HT-enabled processors with 2MB of cache per core. The lower performance compared to Opteron is due to the low clock speed of 2.8GHz and the bottleneck of 4 cores sharing a single 800MHz FSB.
These problems will be greatly reduced once Dempsey and Bensley arrive. Dempsey will probably be closely related to Presler, meaning speeds of up to 3.46GHz, HT enabled, with 2MB of L2 cache per core. Higher clock speeds are likely possible, as a 3.4GHz 950 was shown to fit within the thermal and power envelope of a 2.8GHz 820. The higher clock speed will help, but the main benefit is the 1066MHz FSB. The 33.3% increase in bandwidth over the 800MHz FSB will help with core-to-core cache transfers while opening up more throughput to the RAM. Even more important is that each processor gets its own independent 1066MHz FSB, much as AMD gives each processor its own links, so the processors don’t compete with each other. In addition, the RAM speed has increased from 400MHz to 533MHz and is now quad-channelled. This means that total FSB bandwidth has nearly tripled, from the 6.4GB/s shared between 4 cores in Paxville to 17GB/s. Memory bandwidth has likewise nearly tripled, from 6.4GB/s to 17GB/s. Even a dual-processor Opteron system only has 12.8GB/s of memory bandwidth available in total. Dempsey and Bensley should certainly make Intel highly competitive with AMD.
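For anyone who wants to check the bandwidth figures above, here is a rough sketch of the arithmetic. It assumes a 64-bit (8-byte) data path on every FSB and memory channel, dual-channel RAM on the current platform, and dual-channel DDR400 per Opteron; those widths and channel counts are my assumptions, and the function name is just made up for illustration. These are peak theoretical numbers, not measured throughput.

```python
# Peak theoretical bandwidth: transfer rate x bytes per transfer x number of links.
BYTES_PER_TRANSFER = 8  # assumption: 64-bit wide FSB and memory channels


def peak_gb_per_s(mega_transfers_per_s, links=1):
    """Peak bandwidth in GB/s for `links` buses or channels at the given rate."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * links / 1000


paxville_fsb = peak_gb_per_s(800)           # one 800MHz FSB shared by both processors
bensley_fsb = peak_gb_per_s(1066, links=2)  # two independent 1066MHz FSBs
paxville_mem = peak_gb_per_s(400, links=2)  # assumed dual-channel 400MHz RAM
bensley_mem = peak_gb_per_s(533, links=4)   # quad-channel 533MHz RAM
opteron_mem = peak_gb_per_s(400, links=4)   # 2 Opterons x assumed dual-channel DDR400

for name, value in [
    ("Paxville FSB", paxville_fsb),
    ("Bensley FSB", bensley_fsb),
    ("Paxville memory", paxville_mem),
    ("Bensley memory", bensley_mem),
    ("Dual Opteron memory", opteron_mem),
]:
    print(f"{name}: {value:.1f} GB/s")

print(f"FSB scaling: {bensley_fsb / paxville_fsb:.2f}x")
```

Under those assumptions the FSB and memory totals both go from 6.4GB/s to roughly 17GB/s, which is about 2.7 times the old figure, so "nearly tripled" is the fair way to put it.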
Now to address the 4-way server market. While an integrated memory controller would provide better memory bandwidth scaling with additional processors, Intel’s current FSB architecture could easily be expanded to provide much of what’s required. Currently Intel’s Xeon MPs use a 667MHz FSB. Intel is already working on a 1333MHz FSB for Woodcrest, and applying such a bus here would double the available bandwidth. Of course, on the motherboard side, each processor would have an independent FSB to reduce congestion. Memory speed would likewise increase from the current 400MHz to 667MHz in a quad-channel configuration. These improvements are easily made and will keep Intel competitive in the near term.
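To put rough numbers on the 4-way case, here is a small sketch using the same kind of peak-bandwidth arithmetic. It again assumes 64-bit (8-byte) buses and channels, and the variable names are only illustrative. The configuration itself (four independent 1333MHz FSBs, quad-channel 667MHz RAM) is the hypothetical one described above, not a shipping platform.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit wide FSB and memory channels


def peak_gb_per_s(mega_transfers_per_s, links=1):
    """Peak theoretical bandwidth in GB/s across `links` buses or channels."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * links / 1000


current_fsb = peak_gb_per_s(667)             # a single 667MHz Xeon MP FSB
proposed_fsb = peak_gb_per_s(1333)           # a single 1333MHz FSB
fsb_gain = proposed_fsb / current_fsb        # how much one bus improves

aggregate_fsb = peak_gb_per_s(1333, links=4)  # one independent FSB per processor
proposed_mem = peak_gb_per_s(667, links=4)    # quad-channel 667MHz RAM

print(f"Per-bus FSB: {current_fsb:.1f} -> {proposed_fsb:.1f} GB/s ({fsb_gain:.2f}x)")
print(f"Aggregate FSB over 4 buses: {aggregate_fsb:.1f} GB/s")
print(f"Quad-channel 667MHz memory: {proposed_mem:.1f} GB/s")
```

One thing this makes visible is that with four independent buses the aggregate FSB bandwidth would be about twice what quad-channel 667MHz RAM can supply, so memory rather than the FSB would become the limit in that setup.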
One of the major benefits of an integrated memory controller is the reduction in latency. The high latencies on Intel’s current systems are partially due to the memory running asynchronously with the FSB. This is corrected in Bensley, where 533MHz RAM is matched with a 1066MHz FSB. By working synchronously, some of the latency issues will be reduced. Similarly, Xeon MPs working with a 1333MHz FSB would run synchronously with 667MHz RAM. In addition, an advantage that Intel has over AMD is that they design their own chipsets. If they put in the effort, they could easily streamline the CPU-Northbridge-RAM interconnects to reduce latency.
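The synchronous pairings are easy to check with a bit of arithmetic. The sketch below assumes that "synchronous" means the FSB and the RAM share the same underlying clock, that the FSB is quad-pumped (4 transfers per clock) and that the RAM is DDR (2 transfers per clock). The helper names are made up for illustration, and the 1MHz tolerance is only there because the marketing numbers (1066, 533, 667, 1333) are rounded.

```python
FSB_TRANSFERS_PER_CLOCK = 4  # quad-pumped front side bus
RAM_TRANSFERS_PER_CLOCK = 2  # double data rate memory


def base_clocks(fsb_rate_mhz, ram_rate_mhz):
    """Return the underlying (FSB clock, RAM clock) in MHz for the rated speeds."""
    return (fsb_rate_mhz / FSB_TRANSFERS_PER_CLOCK,
            ram_rate_mhz / RAM_TRANSFERS_PER_CLOCK)


def is_synchronous(fsb_rate_mhz, ram_rate_mhz, tolerance_mhz=1.0):
    """True when both sides run off (approximately) the same base clock."""
    fsb_clock, ram_clock = base_clocks(fsb_rate_mhz, ram_rate_mhz)
    return abs(fsb_clock - ram_clock) <= tolerance_mhz


pairings = {
    "Bensley (1066 FSB / 533 RAM)": (1066, 533),
    "Future Xeon MP (1333 FSB / 667 RAM)": (1333, 667),
    "Current Xeon MP (667 FSB / 400 RAM)": (667, 400),
}

for name, (fsb, ram) in pairings.items():
    fsb_clock, ram_clock = base_clocks(fsb, ram)
    status = "synchronous" if is_synchronous(fsb, ram) else "asynchronous"
    print(f"{name}: FSB clock {fsb_clock:.1f}MHz, RAM clock {ram_clock:.1f}MHz, {status}")
```

With these assumptions the Bensley and 1333/667 pairings line up on the same base clock, while the current Xeon MP pairing of a 667MHz FSB with 400MHz RAM does not.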
All of these are just simple improvements to the buses that will help improve Intel’s performance. I haven’t even mentioned Intel’s next-generation architecture, but Conroe, Merom and Woodcrest are certainly something to look forward to. All in all, the delay of an integrated memory controller isn’t a catastrophe for Intel’s roadmap.