With all the hoopla surrounding the current Intel Developer Forum, I would like to offer my opinions.
First off, I am very happy that Intel has changed its focus and done a complete 180-degree turn from marketing initiatives to technical initiatives.
1. My personal preference between companies
I personally own an AMD K7 system and an Intel NetBurst system. For my most recent system I chose the Intel Pentium 4 550 at 3.4GHz, mainly because I got a very good deal on it. I am not unhappy with its performance, since I like to multitask and run a lot of compression/decompression work. You really notice the difference between an AMD Venice-based processor and an Intel Prescott processor when you're pumping a 1.6GB DivX movie through the CPU.
However, I also do a lot of gaming, and there the performance is definitely worse than it could be.
As a personal preference I will always prefer Intel over AMD so long as they continue to promote and advance new technology as they have been. I gained a lot of respect for AMD when they designed the K8 architecture. I lost most of that respect when they flatly said they had no plans to upgrade their technology because it was already superior to Intel's. They should have considered improving their SSEn floating-point units to make more cost-effective workstations, and investing in 65nm process technology, which would improve the thermal envelope of their processors, improve overclocking headroom, allow larger caches, and raise manufacturing yields to lower costs.
However, from a business standpoint I understand why AMD is taking its time. It's the same reason Intel stuck with its NetBurst technology for so long (which started out strong, but by the end Intel was more concerned with marketing spin than real performance in order to maintain market share). The market for personal computers slows down every year, and R&D and manufacturing costs need to be recouped before either company can afford to push new advancements.
I am glad that the microprocessor market is as competitive as it is, and while the lead will swing back and forth between Intel and AMD every few years, it's the competition that drives the technology.
One of the reasons I admire Intel is that, as a developer, I was able to order the 80x86 developer manuals online for free: five full textbook-sized volumes documenting the 80x86 instruction set that Intel pioneered and AMD adopted for compatibility. They were shipped to me free of charge, I did not even have to pay shipping, and they arrived less than a week later.
2. Why it's good to have a choice
The other reason I admire Intel is that, as a system designer, Intel provides extensive resources for designing systems (from low-end home PCs to office PCs, workstations, and servers), including tested memory lists, tested controller cards, and in-depth datasheets for everything. From a design and engineering standpoint, Intel is like the open source of hardware.
When I want to build, say, 25 office systems for a business, it is much easier warranty- and support-wise to pick an Intel processor and an Intel motherboard with an Intel chipset and southbridge, use Intel graphics and an Intel network card, and go through Intel directly for all support, than it is to buy all those semiconductor components from four to six different companies for an AMD system and hope they work together.
However, that's just the medium-business market segment. Although I personally buy Intel hardware for my own use, I gladly design systems with AMD processors for other people: friends, family, and customers. At this time, and for a while now, I have simply been able to build faster systems for less money using AMD Semprons (Socket 754) for entry-level systems and Athlon 64s for mid-range systems. On the high end it goes either way, depending on usage.
3. Integrated Memory Controller vs. Memory Controller Hub
One of the biggest differences between Intel and AMD from a design standpoint is the memory controller. As we all know, AMD's is integrated into the processor itself, while Intel's sits in the chipset on the motherboard.
Many people believe the integrated solution is completely superior from a performance standpoint, but this is not true.
There are two main types of system utilization:
The first is sporadic CPU-to-memory usage, which is what happens in any open-ended, user-driven application: Windows itself, Office and productivity software, and all games. The bandwidth of even DDR400 memory is rarely saturated in this type of use. What matters is having a large amount of memory and low latency.
Memory latency is not the only latency that matters for this kind of processing (memory is always accessed in bursts, or often a series of bursts, and the latency differences there are very minor). The others include the L1 and L2 cache latencies, plus the fetch, decode, execute, and retire latencies that make up the processor's pipeline (which, in a K8 processor, is less than half the length of the NetBurst pipeline).
In this usage pattern, hard drive access is rare, and most of the data is fed directly from I/O to video memory over the HyperTransport bus. So for this type of use, there is little left to be desired.
The other type of system utilization is sustained I/O while processing data. This is what happens when you do a lot of content creation such as audio and video editing (especially HD video), engineering and CAD, 3D animation, and so on. Systems with higher-bandwidth I/O setups, such as buffered SCSI RAIDs (especially with multiple controllers on PCIe or PCI-X interfaces), push extreme amounts of data from I/O to memory, from I/O to video memory, and from CPU to memory. With that much data flowing into system memory, memory bandwidth becomes critical. On a K8 system, since only the processor can access the memory, and the microarchitecture is limited to DDR400 without overclocking, this quickly becomes a bottleneck.
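To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The peak figures follow from transfers per second times a 64-bit channel width; the 1GB/s I/O stream and 4GB/s CPU read/write figures are made-up workload numbers purely for illustration, not measurements.

# Theoretical peak memory bandwidth vs. a hypothetical sustained-I/O workload.
def peak_bandwidth_gb_s(transfers_per_sec, bus_width_bits=64, channels=1):
    # peak = transfers/sec * bytes per transfer * channels (1 GB = 10^9 bytes)
    return transfers_per_sec * (bus_width_bits / 8) * channels / 1e9

ddr400_dual   = peak_bandwidth_gb_s(400e6, channels=2)   # ~6.4 GB/s
ddr2_800_dual = peak_bandwidth_gb_s(800e6, channels=2)   # ~12.8 GB/s

io_stream   = 1.0   # assumed GB/s streaming off a RAID array (illustrative)
cpu_traffic = 4.0   # assumed GB/s of CPU reads/writes (illustrative)
demand = io_stream + cpu_traffic

for name, peak in (("DDR400 dual channel", ddr400_dual),
                   ("DDR2-800 dual channel", ddr2_800_dual)):
    print(f"{name}: {peak:.1f} GB/s peak, workload uses {demand / peak:.0%}")

With these assumed numbers, the same workload eats most of dual-channel DDR400's headroom while leaving DDR2-800 plenty of room, which is the point of the paragraph above.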
Intel chipsets use DMA (Direct Memory Access), which lets I/O devices transfer data straight into memory through the chipset without bogging down the processor's bus interface.
Even though these systems will still be less responsive than an AMD counterpart because of the Memory Controller Hub's extra latency, the MCH is a clear advantage in high-bandwidth setups.
I have not read anything that would lead me to believe Intel is planning to implement an integrated memory controller, and I am glad, because in my opinion it is a bad idea. My whole point with this in-depth description is that the integrated memory controller is not the prime constituent of the AMD advantage; it is the other aspects of the processor core microarchitecture. I believe that if we compared two versions of a K8 processor, one with an integrated controller and one with a separate chipset controller, the decrease in performance would be minor in some areas while other areas would actually gain performance.
However, the fact that AMD has to come up with a whole new socket and a new lineup of processors in order to change memory support is a big drawback of the design, and it forces unnecessary upgrades. I feel sorry for all the people who bought good Socket 754 AMD processors and were quickly abandoned on all fronts, with no upgrade path, because AMD had to change its entire socket and processor design for dual-channel memory support. All you need on an Intel system is a new board. For people who invest $500 or more in a top-of-the-line chip, this is a big deal.
Once DDR2-800 comes out, what next? Quad-channel DDR2? Rambus XDR or something new from them? Even DDR3? If AMD adopts new memory too soon, they screw over any customers who recently bought a high-end chip hoping for a good upgrade path. If they take too long to adopt it, they give up a bandwidth advantage.
4. DDR vs. DDR2
There are design advantages to DDR2 from an engineering and manufacturing standpoint. But one thing I don't understand is why people always refer to DDR2 as being higher latency than DDR. It's the SAME LATENCY! CAS latency is expressed in clocks, not time. DDR2-800 runs at an effective 800 million cycles per second, while DDR400 runs at only 400 million, so each DDR2-800 cycle takes half as long as a DDR400 cycle. The effective fetch time for a single value is therefore the same, and DDR2 actually comes out ahead in both latency and bandwidth when data is requested in bursts.
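As a quick sketch of that clocks-versus-time point, here is the arithmetic in Python, using the same effective cycle times as the timelines below (2.5ns per DDR400 cycle, 1.25ns per DDR2-800 cycle):

# CAS latency converted from clocks to nanoseconds, using the effective
# cycle times assumed by the timelines below (not JEDEC command clocks).
ddr400_cycle_ns   = 1e9 / 400e6   # 2.5 ns per DDR400 cycle
ddr2_800_cycle_ns = 1e9 / 800e6   # 1.25 ns per DDR2-800 cycle

print("DDR400  CL2:", 2 * ddr400_cycle_ns, "ns")     # 5.0 ns
print("DDR2-800 CL4:", 4 * ddr2_800_cycle_ns, "ns")  # 5.0 ns -- same absolute wait

Twice the clocks at twice the speed is a wash, which is exactly the point.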
CAS latency (CL) is the number of memory clocks it takes for requested data to appear on the data bus after a read command is issued. On a timeline, it basically works like this:
In DDR400 memory running at a CL of 2, requesting 4 QWORD (64-bit) values looks like this...
Clock #1 - CPU requests DATA1: a READ request is placed on the control bus, the address of DATA1 is placed on the address bus, and the data bus is ignored.
Clock #2 - CPU requests DATA2: READ is requested, address is given, data bus is ignored.
Clock #3 - CPU requests DATA3: READ is requested, address is given, data bus contains DATA1.
Clock #4 - CPU requests DATA4: READ is requested, address is given, data bus contains DATA2.
Clock #5 - CPU stops requesting: no command is put on the control bus, no address is put on the address bus, data bus contains DATA3.
Clock #6 - CPU stops requesting: no command is put on the control bus, no address is put on the address bus, data bus contains DATA4.
Therefore, the total number of memory clocks needed to retrieve those 4 QWORD values is 6, or n+CL, where n is the number of requests. At 2.5ns per clock, that works out to 15ns (0.000000015 seconds) in real time.
The same timeline with DDR2 memory running at 800MHz @ CL4:
Clock #1 - CPU requests DATA1: a READ request is placed on the control bus, the address of DATA1 is placed on the address bus, and the data bus is ignored.
Clock #2 - CPU requests DATA2: READ is requested, address is given, data bus is ignored.
Clock #3 - CPU requests DATA3: READ is requested, address is given, data bus is ignored.
Clock #4 - CPU requests DATA4: READ is requested, address is given, data bus is ignored.
Clock #5 - CPU stops requesting: no command is put on the control bus, no address is put on the address bus, data bus contains DATA1.
Clock #6 - CPU stops requesting: no command is put on the control bus, no address is put on the address bus, data bus contains DATA2.
Clock #7 - CPU stops requesting: no command is put on the control bus, no address is put on the address bus, data bus contains DATA3.
Clock #8 - CPU stops requesting: no command is put on the control bus, no address is put on the address bus, data bus contains DATA4.
Therefore, receiving 4 QWORDs from memory takes 8 memory clock cycles, so the n+CL equation still holds. However, because the memory bus is running twice as fast, the real time to receive the data is only 10ns. This makes sense: 8 DDR2-800 clocks equal 4 DDR400 clocks in real time, which is why the 6 DDR400 clocks take 50% longer.
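The whole n+CL timeline is small enough to capture in a few lines of Python, again using the simplified model above (one request or transfer per effective clock); it reproduces the 15ns and 10ns totals:

# Total time to fetch n QWORDs under the simplified n + CL timeline above.
def burst_time_ns(n_requests, cas_latency, clocks_per_sec):
    total_clocks = n_requests + cas_latency   # n + CL
    return total_clocks / clocks_per_sec * 1e9

print("DDR400  CL2, 4 QWORDs:", burst_time_ns(4, 2, 400e6), "ns")   # 15.0 ns
print("DDR2-800 CL4, 4 QWORDs:", burst_time_ns(4, 4, 800e6), "ns")  # 10.0 ns

The longer the burst, the more the doubled clock rate pays off, since the fixed CL penalty is amortized over more transfers.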
Of course, this comparison does not factor in additional latencies such as the MCH and caching latencies, but those are separate from a strict DDR-to-DDR2 comparison. There is and was nothing wrong with DDR2 in terms of real latency; it is the NetBurst architecture that limited Intel's performance in games, which is why I believe Conroe will show a significant improvement in games versus the current AMD architecture. However, AMD should have its AM2 processors out by the time Conroe arrives, if not sooner, so it will be a more apples-to-apples comparison then.