Sadly, that article was posted at 6:53PM so I just missed it by 11 minutes.
I agree that an integrated memory controller is preferable, but Intel can still remain competitive despite the delay. By H2 2006, all of Intel's processors will have shared L2 caches in addition to direct L1-to-L1 interconnects. This will eliminate the need to go through the FSB for cache-to-cache transfers, freeing up that bandwidth for other tasks. A shared L2 cache is also superior to AMD's current crossbar implementation: no L2-to-L2 transfer needs to take place at all, which eliminates the bandwidth and latency costs that remain even though the crossbar sits inside the CPU.
The use of a shared L2 cache also means that data does not need to be duplicated. This is one aspect where Intel's architecture is superior to AMD's. While AMD uses only 4-way associativity in its L2 cache, Intel uses 16-way, which gives each address more possible locations in the cache and cuts down on conflict misses. More importantly, because the cache is shared, a piece of data only needs to appear once in a single larger L2 and can be accessed by both cores, instead of being stored twice in two smaller caches. While this reduces latency as mentioned before, the key point is that it lets a 4MB shared L2 hold more information than a 2x2MB split cache with a crossbar can, simply by eliminating duplicates. Making the L2 space go further also eases the FSB bandwidth constraint: as L2 hit rates increase, the need to go out to RAM decreases.
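To put rough numbers on the duplication argument, here is a quick back-of-the-envelope sketch in Python. The working-set size and the 50% sharing fraction are values I've made up purely for illustration, not measurements of any real workload:

```python
# Back-of-envelope comparison of effective capacity: one shared 4MB L2
# versus two private 2MB L2s behind a crossbar. All figures below are
# made-up illustrative assumptions.
MB = 1024 * 1024

core_ws = 2 * MB        # each core's working set
overlap = 0.5           # fraction of that data both cores touch

# Split caches (2 x 2MB): data both cores use is stored in BOTH caches.
split_distinct = 2 * core_ws - overlap * core_ws   # distinct data held in 4MB of SRAM

# Shared cache (1 x 4MB): every line is stored exactly once.
shared_distinct = 2 * core_ws - overlap * core_ws  # same distinct data...
shared_free = 4 * MB - shared_distinct             # ...but with room left for more

print(f"split : {split_distinct / MB:.1f}MB of distinct data fills all 4MB")
print(f"shared: {shared_distinct / MB:.1f}MB of distinct data, {shared_free / MB:.1f}MB still free")
```

With those assumed numbers, the split configuration burns all 4MB holding only 3MB of distinct data, while the shared cache holds the same 3MB and still has 1MB left over for more, which is exactly what pushes hit rates up.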
The elimination of cache-to-cache transfers over the FSB and the higher hit rates of a shared L2 cache mean that the 10.6GB/s of bandwidth from each 1333MHz FSB (and each processor gets its own) just may be sufficient. If it isn't, Intel could simply augment a large 16MB shared L2 cache with a 16MB shared L3 cache. The additional cache would further reduce the need to access RAM and make the FSB bandwidth go even further. On a 65nm process, the die size of such a behemoth would probably be about the same as the current 90nm Xeon MPs that integrate 8MB of L3 cache, so it isn't too unreasonable. In addition, with the pipeline reduction in Intel's next-generation architecture and the inherent improvements of Intel's 65nm process, adding more transistors for an L3 cache wouldn't push power consumption too high. It would certainly still run cooler than Intel's current Xeons. The sleep transistors in the 65nm process also mean the L3 cache could be shut down when not needed, further alleviating any concerns about excessive heat or power consumption.
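For reference, the 10.6GB/s figure falls straight out of the bus width and transfer rate; a quick sanity check, assuming the usual 64-bit-wide, quad-pumped bus:

```python
# Sanity check on the 10.6GB/s figure: the FSB is a 64-bit-wide bus
# quad-pumped to roughly 1333 million transfers per second.
transfers_per_sec = 1333e6       # 333MHz clock x 4 transfers per cycle
bytes_per_transfer = 8           # 64-bit data bus
peak = transfers_per_sec * bytes_per_transfer
print(f"{peak / 1e9:.2f} GB/s")  # -> 10.66 GB/s
```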
As for implementing HyperTransport in Intel processors: given that it takes a set amount of time to get a new processor to market, whether Intel decides now to design an entirely new processor to accommodate HT or sticks it out through its own technology's teething problems, the end result would be a similar time-to-market. It would therefore be best to stick with Intel's own technology, especially if they feel it's potentially superior.
On a side note, I would be interested to see what becomes of Intel's attempt to integrate a northbridge and a voltage regulator into the processor. If it works, this would probably do AMD one better, since latency and bandwidth issues would be almost nonexistent. Of course this would add to the price, but it may work for high-end products since they are normally coupled with Intel's best chipset anyway (e.g. the 955EE Presler and the 975X chipset). I believe Intel was looking at an introduction sometime around the end of the decade. With the 45nm process or something smaller, everything would fit nicely and run cool.
While integrated memory controllers are good for high-end computers, I'm curious about their effects on low-end systems. Since the RAM now has to be accessed through the processor, I wonder what happens to integrated, TurboCache, and HyperMemory graphics cards. The latency will obviously be higher. The problem would now have switched from the processor fighting the add-ons for memory bandwidth to the add-ons fighting the processor. I'm sure this will also affect sound cards, especially with multichannel and digital sound becoming popular, since these require quite a bit of memory. This is probably why Creative has taken it upon themselves to alleviate the problem by integrating large amounts of RAM into their high-end sound cards. The RAM-to-graphics-card latency issues associated with integrated memory controllers will probably be more pronounced once Windows Vista is released. Since Microsoft recommends at least 512MB of video memory to run with all the visual effects enabled, even the most high-end graphics cards today fall short. Even a 256MB 7800GTX may need to reach the RAM through the PCIe bus, the chipset, HyperTransport, the processor, and the memory controller, then over the memory bus to the RAM. How much of an effect this has compared to going directly through a northbridge-based memory controller remains to be seen. Most likely the latency difference is small, but this is the core OS; even small latencies in the OS will filter through and magnify as applications are run.
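To make the path difference concrete, here's a toy calculation; every hop latency in it is a hypothetical placeholder I've made up, not a measured number:

```python
# Toy model of the path a discrete graphics card takes to system RAM when
# the memory controller lives in the CPU vs. in the northbridge.
# Every per-hop latency below is a hypothetical placeholder, chosen purely
# to illustrate the shape of the problem -- not a measured value.
hops_via_cpu = {
    "PCIe bus": 250,                  # ns (placeholder)
    "chipset": 100,
    "HyperTransport": 80,
    "CPU / memory controller": 60,
    "memory bus + DRAM": 90,
}

hops_via_northbridge = {
    "PCIe bus": 250,                  # ns (placeholder)
    "northbridge memory controller": 100,
    "memory bus + DRAM": 90,
}

for name, hops in (("via CPU", hops_via_cpu), ("via northbridge", hops_via_northbridge)):
    print(f"{name:16} ~{sum(hops.values())} ns over {len(hops)} hops")
# The absolute numbers are invented; the point is only that the integrated-
# controller path adds hops and puts the add-on card in contention with the
# CPU for the CPU's own memory interface.
```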