Why doesn't Intel put an on-die memory controller on its CPUs?

aaron2504

Distinguished
Jun 16, 2004
62
0
18,630
If Intel did integrate a memory controller it would be fast as FOOK! Look at:

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2149&p=6.

There's a 50% frame-rate increase even with a 10% clock decrease on the Sempron.

As everyone knows, the P4 loves cache and lots of memory, so the increase would be a lot bigger. Pair that up with 667MHz DDR2 and 64-bit, and that's one fast CPU! Then Intel would easily fly in front of AMD!
 

Kanavit

Distinguished
Jan 6, 2004
390
0
18,780
That's the problem: who is going to buy the i925X if Intel starts making an on-die memory controller that's 50% faster? It's all about money.

------
Prescott 3.2E 1MB L2 HT
1GB PC 3200 Dual channel(PAT)
Asus P4P800 Bios 1016
PNY Geforce 6800 GT 256MB DDR3
60,823 Aquamarks
 

Crashman

Polypheme
Former Staff
Yes, they'd have to sell chipsets for less and CPUs for more! OUCH, that hurts, because it makes an already overpriced part (the CPU) look even MORE overpriced. It also means their cheapest processors would have to get the same price increase as their most expensive ones. Now, that wouldn't have a significant impact on total system price (the cost difference comes out of the chipset), but it would make Intel's CPUs look even less price competitive with AMD.

If Intel wanted to REALLY get the performance out of their CPU, they'd put EVERYTHING from the chipset onto the CPU EXCEPT for the PCI-Express hub. Then they'd have a PCI-E x32 connection go from the CPU to the PCI-E hub on the motherboard.

Only a place as big as the internet could be home to a hero as big as Crashman!
Only a place as big as the internet could be home to an ego as large as Crashman's!
 

aaron2504

Distinguished
Jun 16, 2004
62
0
18,630
Yeah, but they wouldn't have enough 90nm fab capacity to do that, and the older fabs that make chipsets on 0.18 micron would sit idle.
 

juin

Distinguished
May 19, 2001
3,323
0
20,780
Your idea would use more than 2,000 pins; the cost of the package would explode.

AGP 8X connection
PCI-E x16
PCI-E x16
256-bit memory path
PCI-E x32

I need to change my user name.
 

slvr_phoenix

Splendid
Dec 31, 2007
6,223
1
25,780
> As everyone knows, the P4 loves cache and lots of memory, so the increase would be a lot bigger.
Actually, if Intel were to implement an on-die memory controller the increase would be a lot smaller than AMD's was. That's because, with Intel's long pipeline, they were forced to write excellent prefetch algorithms: frequent misses, each requiring a rerun through that long pipeline after the correct data finally arrives, would have cost a large amount of performance.

AMD, on the other hand, kept their pipeline short, so they had the luxury of living with mediocre prefetching. If they have to recycle through their pipeline after fetching new data, it's nowhere near the performance loss that it is for Intel.

Thus when AMD implemented the on-die controller to cut latency to a theoretical minimum, it benefited them greatly, because with the lower latency their frequent prefetch misses cost that much less performance. Intel, with good prefetch logic, has far fewer prefetch misses and so would have much less to gain from lower memory latency.

This is also why AMD systems gain much more from fast RAM timings than Intel systems do; that too is just a trimming of the memory latencies.

This is not to say that Intel wouldn't gain at all from an on-die controller. They would just gain noticeably less than AMD did.
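Here's a tiny back-of-the-envelope model of that argument. It's only a sketch - the hit rates and cycle counts below are made-up, illustrative numbers, not measurements - but it shows why a better prefetcher leaves less for an on-die controller to gain:

```python
# Toy model: average load latency before/after an on-die memory controller,
# for a "good prefetch" core vs a "mediocre prefetch" core.
# All numbers are illustrative assumptions, not measured values.

def avg_load_latency(hit_rate, cache_lat, mem_lat):
    """Average cycles per load: hits served from cache, misses go to memory."""
    return hit_rate * cache_lat + (1 - hit_rate) * mem_lat

CACHE_LAT = 20     # cycles for a cache/prefetch hit (assumed)
MEM_EXT   = 400    # cycles to memory via an external northbridge (assumed)
MEM_ODMC  = 280    # cycles via a hypothetical on-die controller (assumed)

for name, hit_rate in [("good prefetch (Intel-like)",   0.98),
                       ("mediocre prefetch (AMD-like)", 0.90)]:
    before = avg_load_latency(hit_rate, CACHE_LAT, MEM_EXT)
    after  = avg_load_latency(hit_rate, CACHE_LAT, MEM_ODMC)
    gain   = (before - after) / before
    print(f"{name}: {before:.1f} -> {after:.1f} cycles ({gain:.1%} better)")
```

With these made-up numbers the low-hit-rate core gains roughly twice as much from the latency cut, which is the shape of the argument above.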

"Build a man a fire and he's warm for the rest of the evening.
Set a man on fire and he's warm for the rest of his life." - Steve Taylor
 

Snorkius

Splendid
Sep 16, 2003
3,659
0
22,780
<Putting on asshat> If Intel did not do it, Intel is right and you're all W-R-O-N-G </Putting on asshat>

The day Microsoft will make something that doesn't suck is the day they'll start making vacuum cleaners.
 

Xeon

Distinguished
Feb 21, 2004
1,304
0
19,280
It will never happen, for reasons of clock speed. The memory controller can't scale as well as the rest of the chip's logic.

It would also cripple the universal chipset plan (the upcoming shared Xeon and Itanium chipsets).

Xeon

Post created with being a dickhead in mind.
For all emotional and slanderous statements contact THG for all lawsuits.
 

Crashman

Polypheme
Former Staff
No, my idea would cut the pin count by HALF and only increase CPU production cost MODERATELY.

1.) No connection
2.) 1 single PCI-E x32 connection
3.) 128-bit memory path.

You're forgetting that NEW BOARDS would be required, which eliminates AGP, because Intel's NEW CHIPSETS only use PCI-E. And most of those chipsets can only handle 32 PCI-E lanes as far as I know, which the PCI-E hub splits into one x16 slot, several x1 slots, and several onboard-device x1 traces.

Only a place as big as the internet could be home to a hero as big as Crashman!
Only a place as big as the internet could be home to an ego as large as Crashman's!
 

imgod2u

Distinguished
Jul 1, 2002
890
0
18,980
> Actually, if Intel were to implement an on-die memory controller the increase would be a lot smaller than AMD's was. That's because, with Intel's long pipeline, they were forced to write excellent prefetch algorithms: frequent misses, each requiring a rerun through that long pipeline after the correct data finally arrives, would have cost a large amount of performance.
>
> AMD, on the other hand, kept their pipeline short, so they had the luxury of living with mediocre prefetching. If they have to recycle through their pipeline after fetching new data, it's nowhere near the performance loss that it is for Intel.

Your logic doesn't make sense. Both an on-die memory controller and aggressive prefetching are ways of reducing overall fetch latency. The on-die memory controller makes memory fetches take less time, while prefetching produces more cache hits. Both have the same end result: less latency for loads/stores.
Now, according to your statements, the K7/K8 is a lot less sensitive to memory bottlenecks. That would mean either a more aggressive prefetcher or an on-die memory controller would bring less benefit to it than to Netburst.
This is, of course, not really true. Theoretical throughput for the K7 at its relatively lower clock speed is still quite similar to (albeit not as high as) Netburst at its higher clock speeds, so memory is just as much of a bottleneck. The only difference is that the K7/K8 has a statistically higher IPC. It still wastes just as many execution resources waiting for memory.
I'd say Intel feels that with their current prefetching and caching methods (which are, quite frankly, much better than those used on the K7), an on-die memory controller wouldn't help as much because cache hit rates are already so high. The problem is becoming (especially with Prescott) not that memory isn't fast enough to feed the processor, but that the *cache* isn't fast enough (latency of the L1 data cache on Prescott is 4x that of Northwood, and the L2 cache latency is 2x). This lengthening of cache latencies is, most likely, another way to hike up clock speed, but it really puts a dent in performance (made up for by certain fixes in Prescott's execution engine).
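A rough average-memory-access-time sketch of that cache-latency point, using illustrative cycle counts in the right ballpark (the exact Northwood/Prescott figures are argued about a few posts down) and made-up hit rates:

```python
# AMAT (average memory access time) with two cache levels, to show how much
# longer cache latencies hurt even when memory latency stays the same.
# Cycle counts and hit rates below are illustrative assumptions.

def amat(l1_lat, l2_lat, mem_lat, l1_hit=0.95, l2_hit=0.97):
    """Average cycles per memory access for an L1 -> L2 -> DRAM hierarchy."""
    return l1_hit * l1_lat + (1 - l1_hit) * (l2_hit * l2_lat + (1 - l2_hit) * mem_lat)

MEM_LAT = 400  # cycles to DRAM (assumed, identical in both cases)

northwood_like = amat(l1_lat=2, l2_lat=16, mem_lat=MEM_LAT)
prescott_like  = amat(l1_lat=4, l2_lat=27, mem_lat=MEM_LAT)

print(f"Northwood-like AMAT: {northwood_like:.2f} cycles")
print(f"Prescott-like  AMAT: {prescott_like:.2f} cycles")
```

Most of the gap comes from the slower caches rather than from DRAM, which is the point being made.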

> Thus when AMD implemented the on-die controller to cut latency to a theoretical minimum, it benefited them greatly, because with the lower latency their frequent prefetch misses cost that much less performance. Intel, with good prefetch logic, has far fewer prefetch misses and so would have much less to gain from lower memory latency.

Depends. If it weren't for the power wall Netburst is hitting, I would say an on-die memory controller would help greatly for future (4+ GHz) processors, as the gap between processor and memory would grow even larger. DDR2 doesn't help much, considering its latency timings are even worse.

> This is also why AMD systems gain much more from fast RAM timings than Intel systems do; that too is just a trimming of the memory latencies.
>
> This is not to say that Intel wouldn't gain at all from an on-die controller. They would just gain noticeably less than AMD did.

Again, depends on the scaling. At 3GHz I'd agree with you; at 4GHz it'd be an entirely different story.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
 

Crashman

Polypheme
Former Staff
I'm not really sure! I've read various material on the subject, and everything said 32 pathways at most! But I always thought they'd have, at most, 64!

Still, an x64 bus and a 128-bit memory channel would only need a moderate number of pins. And if they really couldn't manage that, they could always use a proprietary bus between the PCI-E hub and the CPU.

Only a place as big as the internet could be home to a hero as big as Crashman!
Only a place as big as the internet could be home to an ego as large as Crashman's!
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
That's not an issue. Check out this Ars Technica article: http://arstechnica.com/paedia/p/pci-express/pcie-1.html - it explains PCIe architecture in detail.

In particular, check out the page on PCIe lane negotiation: http://arstechnica.com/paedia/p/pci-express/pcie-6.html. It's a highly versatile architecture, apparently. You can seemingly put an x16 card on an x8 link without serious issues... and it's not as if current GPUs are bandwidth-starved. So you could put one x16 card on an x16 connector (with x16 lanes), another x16 card on an x16 connector (with only x8 lanes behind it), and still have 8 PCIe lanes left for other things. The second card would get half the bandwidth, but that's about it; it should work. Here is a quote:
This link width negotiation allows for some flexibility in designing systems and integrating devices with different lane widths, but it will make for some headache in the consumer space. People will have to figure out how to match link widths with device widths, and they'll be initially confused by situations in which the link is one width and the connector another, as is the case with an NVIDIA card plugged into an x16 slot attached to an x8 link.

The NVIDIA card plugged into the x8 link will talk to the switch and figure out that the link is only x8. It will then train down accordingly and transmit data at the appropriate x8 rate.
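A minimal sketch of the link-width negotiation described in that quote. It's simplified: it just takes the smaller of the two widths (real hardware only trains to specific supported widths) and uses PCIe 1.x's nominal 250 MB/s per lane per direction:

```python
# Simplified PCIe link-width negotiation: the link trains down to the widest
# width both ends can handle. Real devices only support specific widths
# (x1/x2/x4/x8/x12/x16/x32); this sketch just takes the minimum.

MB_PER_LANE = 250  # PCIe 1.x: 2.5 GT/s with 8b/10b encoding, per direction

def negotiate(card_lanes: int, link_lanes: int) -> int:
    """Width the link trains to (simplified)."""
    return min(card_lanes, link_lanes)

def bandwidth_mb_s(lanes: int) -> int:
    return lanes * MB_PER_LANE

# An x16 graphics card in an x16 connector that is only wired to an x8 link:
lanes = negotiate(card_lanes=16, link_lanes=8)
print(f"Link trains to x{lanes}, ~{bandwidth_mb_s(lanes)} MB/s each way")
```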

Also, it is ironic that Intel is pushing PCIe, a sophisticated, fully point-to-point interconnect, while keeping their CPU FSBs shared. AMD is actually doing the opposite: it isn't pushing PCIe as vigorously, but it's implementing more advanced CPU interconnects...

You never change the existing reality by fighting it. Instead, create a new model that makes the old one obsolete - Buckminster Fuller
 

Mephistopheles

Distinguished
Feb 10, 2003
2,444
0
19,780
> Intel well should do it, but Intel being Intel, they won't; they'll carry on increasing the FSB, cache and clock.
True, Intel is Intel, but I think they might have learned something... Hopefully.

What they should do: Release a dual-core Dothan-based desktop chip clocked as high as possible. It could use a 1066MHz FSB well, because even though it's a shared resource, it's twice the usual 533MHz Dothan gets. So when one core isn't using the FSB, the other has twice the normal Dothan FSB bandwidth. If properly tweaked, Dothan could be coupled with the "arbiter" chip Intel already has. Given a properly accelerated EM64T implementation in both cores, Intel could convince us not to frown at the thought of its high-end systems... And they'd reuse their current infrastructure to bring out a great product, much superior to Prescott, possibly still on LGA775. (Heck, even FOUR current Dothan cores at 2.0GHz would dissipate less heat than a SINGLE Prescott... Going four-core is almost overkill, though, especially considering software won't make use of them... I'd love to have a 4-core processor anyway...)
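A quick bandwidth check on that shared-FSB idea, using nominal numbers for a 64-bit-wide bus (best case, ignoring protocol overhead):

```python
# Nominal FSB bandwidth for the dual-core-Dothan-on-a-1066MHz-bus scenario.
# The FSB is 64 bits wide, i.e. 8 bytes per transfer; overheads are ignored.

def fsb_gb_s(mt_per_s, bytes_per_transfer=8):
    return mt_per_s * bytes_per_transfer / 1000  # GB/s

today_533      = fsb_gb_s(533)        # what a single Dothan gets now
shared_1066    = fsb_gb_s(1066) / 2   # both cores hammering the bus at once
exclusive_1066 = fsb_gb_s(1066)       # one core with the bus to itself

print(f"533 MT/s, one core        : {today_533:.2f} GB/s")
print(f"1066 MT/s, both cores busy: {shared_1066:.2f} GB/s each")
print(f"1066 MT/s, one core busy  : {exclusive_1066:.2f} GB/s")
```

Worst case, each core still gets what a single 533MT/s Dothan gets today; best case, double.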

What Intel will do, if they're truly stubborn: Keep Netburst alive. Dual-core Prescotts will be hot, and will be bandwidth-starved compared to single-core ones, because they'll probably use the same socket and a 1066MHz FSB (maybe 1333MHz, if we're lucky). Bandwidth-starving a Netburst core isn't a good idea. Also, LGA775 might suddenly need a revision for Glenwood... Dual-core Prescotts will start at a lower clock and necessarily at lower single-thread performance, I think.

You never change the existing reality by fighting it. Instead, create a new model that makes the old one obsolete - Buckminster Fuller
 

juin

Distinguished
May 19, 2001
3,323
0
20,780
> I'd say Intel feels that with their current prefetching and caching methods (which are, quite frankly, much better than those used on the K7), an on-die memory controller wouldn't help as much because cache hit rates are already so high. The problem is becoming (especially with Prescott) not that memory isn't fast enough to feed the processor, but that the *cache* isn't fast enough (latency of the L1 data cache on Prescott is 4x that of Northwood, and the L2 cache latency is 2x). This lengthening of cache latencies is, most likely, another way to hike up clock speed, but it really puts a dent in performance (made up for by certain fixes in Prescott's execution engine).


Isn't it more like twice the latency for both L1D and L2 - from 2 to 4 cycles and from 11 to 25? Gallatin's L3 has lower latency than Prescott's L2. Maybe Intel set these latencies with 1 to 8 MB of cache in mind.

I need to change my user name.
 

trooper11

Distinguished
Feb 4, 2004
758
0
18,980
It's also ironic that AMD isn't even pushing PCI-E that much, and yet the nForce 4 will be the first chipset to feature NVIDIA's SLI dual video card solution. I'm sure it will be available for Intel systems at the same time, but I bet it will be cheaper with an Athlon system, since you don't have to use DDR2.
 

P4Man

Distinguished
Feb 6, 2004
2,305
0
19,780
Sorry, but you made me chuckle with that post; you know better than Intel what Intel should do?

Let's see why perhaps they do not do what you think they should do:

>Release a dual-core Dothan-based desktop chip clocked as
>high as possible

1) It may not support iAMD64 yet, and it could take a while before it does. People seem to think such a "simple extension" is done overnight, while in fact it requires a redesign of just about every crucial part of the core, and that simply takes time - years, in fact. Depending on when Intel started working on this, I would not expect a 64-bit Dothan derivative until the end of next year at the earliest (although I could obviously be wrong, and maybe Intel started working on it much longer ago, but considering even the Xeons seem to be struggling with it, I wouldn't count on it).

2) It may not clock all that much higher. Just because Dothan is in no way power-limited doesn't mean it has infinite clock-scaling potential. Up until the Pentium 3 or so, basically no (desktop) CPU was truly power-limited, yet they all had firm limits to their clock speed potential. Remember that Dothan's core is very much like the venerable Pentium Pro/2/3: just not designed to scale that high. People pointing to the overclocked Dothan notebook (@2.4 GHz) may not realize a single overclocked CPU is hardly an indication of what is feasible in mass production with sufficient yields, headroom, and perfect stability. My WAG is that Dothan on 90nm can't significantly exceed ~2.4 GHz as a mass-produced CPU, and therefore might have a very hard time keeping up with the K8 as a desktop performance part, since the K8 will probably reach 2.6 GHz even on 130nm.

Netburst OTOH, especially Prescott, seems to have ample headroom if Intel can sort out the thermal issues. Every 0.1V Intel can shave off Prescott's Vcore immediately gives it a big boost. Do the math:

Dynamic power scales with V² x f. The 3.4E is rated at 115W @ 1.425V. Now lower the Vcore by 0.2V to get closer to Dothan's:

115 x (1.225 / 1.425)² ≈ 85W at the same 3.4GHz

Since power consumption also scales linearly with clock speed, that headroom means a Prescott at roughly 3.4 x (115 / 85) ≈ 4.6GHz @ 1.225V would consume about as much as the current 3.4E, without any other process improvements. Increase the TDP a bit further (it's way too high anyway ;), refine the process/design, and 5GHz over time may not be out of reach. ~1.2V seems like a reasonable target for a 90nm chip. Lowering Prescott's Vcore through layout/process improvements looks like an easier road than overclocking Dothan's short pipeline beyond what it can handle.
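A quick sanity check of that arithmetic - a sketch that assumes dynamic power scales with V² x f and ignores leakage (which Prescott has plenty of), so treat the result as optimistic:

```python
# Constant-TDP frequency headroom from a Vcore drop, assuming P ~ C * V^2 * f
# and ignoring leakage. Numbers match the 3.4E example above.

def same_power_freq(f0_ghz: float, v0: float, v_new: float) -> float:
    """Frequency reachable at v_new for the same power as (f0_ghz, v0)."""
    return f0_ghz * (v0 / v_new) ** 2

print(f"~{same_power_freq(3.4, 1.425, 1.225):.1f} GHz at 1.225 V")  # ~4.6 GHz
```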

Now think: what would be better, a 4+GHz Prescott or a 2+GHz dual-core Dothan?

3) Dual core just isn't the silver bullet many seem to think it is. Just run Doom 3 on a dual-CPU machine and see how little it brings. Imagine Intel's top-end part were a dual-core 2GHz Dothan today; imagine the negative press it would get. It would most likely get trashed by the A64 and older P4s, and Doom 3 wouldn't be the exception either... Dual core looks very promising for workstation/server workloads, but it will take a while before it makes sense to sacrifice single-threaded performance in favor of multithreaded performance on the desktop. Several years at least, if ever.

4) I don't have the numbers at hand, so I could be wrong on this one, but if I'm not mistaken Dothan's core (minus cache) isn't anywhere near half the size (in mm²) of Prescott. Therefore, a dual-core Dothan would be (significantly) more expensive to produce than a Netburst-based CPU. Dothan is small, but two Dothans on a die is rather big for anything but a high-end chip, and a high-end (desktop) chip is expected to perform well on most, if not every, app - something you cannot reasonably expect from it (it will suck at games, for one thing).

5) Dothan sucks at FP and is mediocre at SSE2 (at least compared to the P4). These things don't matter too much for its intended purpose (mobile), but they would really hurt in a high-end solution for things like rendering, encoding, and workstation apps like CAD and EDA - the very things Intel has been telling us for years are so important.

There might be other reasons, or some of the reasons I suggested may not be completely true, but I'm sure Intel is doing what it does for a reason. If Intel's desktop flagship CPU next year is not a dual-core Dothan, it's probably not because they didn't read your post or hadn't thought of it yet...

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 

P4Man

Distinguished
Feb 6, 2004
2,305
0
19,780
>Intel has no doubt looked very closely at the idea.

They have, in fact, already tried it; remember Timna?

>What the main stumbling block would be is that they would
>have to sell thier chipsets for less.

No, that would rather be an advantage, if anything. It would perhaps reduce chipset ASPs, but it would increase CPU ASPs, which means they'd be increasing their overall revenue at the expense of VIA/SiS/ALi etc., something I'm sure Intel would not mind. Perceived prices, as Crashman noted, aren't an issue either IMHO; the DIY market is too small to matter, and for complete machines the difference just wouldn't be there. If anything, the whole platform ought to be cheaper (higher integration means cheaper parts, and the motherboards ought to be easier and cheaper to produce, on top of cheaper validation...).

So what could be the real reason Intel does not implement an ODMC? I'm guessing it's some combination of the following:

1) Timna hangover (Timna was supposed to be a cheap, highly integrated Celeron-class CPU, but it featured an RDRAM memory interface at a time when RDRAM was 2-5x the price of regular DRAM... ouch).

2) Less flexibility. An integrated memory controller means you are less flexible in adopting other memory standards (think DDR2 for AMD). It also makes it harder to use a single core for different markets with different needs. For instance, mobile markets require low power (DDR2 is a good thing there); workstations/servers need tons of memory capacity, even at the expense of memory performance if necessary (FB-DIMM), plus RAS features (chipkill, ECC, registered RAM, etc.) not needed on mobile or desktop; and desktop needs (price/)performance above all. Having a single controller try to achieve all of that is not always optimal.

3) FB-DIMM. Many of the shortcomings of regular northbridges will be solved by FB-DIMM (high bandwidth per pin, capacity/performance trade-offs, ...). FB-DIMM looks really promising for servers/workstations IMHO, but it has no place in a laptop.

4) Clock-scaling issues. Having an integrated MC may or may not make it harder to clock the CPU. Many people blame(d) AMD's ODMC for limiting the K8's clock speeds. That could be true or false, but considering Intel is currently at some 50% higher clock speeds than AMD, it could be an issue.

5) Intel strives to share platforms between Xeon and Itanium. I'm not sure whether an ODMC would help or hurt that goal...

6) NIH syndrome (Not Invented Here). Even though the ODMC is not exactly an AMD "invention", it might further the perception that Intel is just following AMD's lead. If the advantages were huge, that shouldn't stop them, but I'm not sure they are.

7) Speaking of which... just how much of the K8's performance is actually due to its ODMC? Every time the K8 performs well, people automatically assume it's because of the ODMC. I'm not so sure. The K8 is simply a great design, much more than just K7 + 64-bit + ODMC, and even paired with a traditional northbridge as good as Intel's chipsets, I think it would perform very well.

I think the real beauty of AMD's approach lies not so much in single-CPU performance as in the flexibility and performance of SMP Opterons, where bandwidth increases with the CPU count and HyperTransport makes systems easy to design and scale exceptionally well. Intel's response to this is what they always do: add more cache. Tons of cache, as on Itanium, goes a long way toward some of those goals, and cache is cheap on $1,500 CPUs, especially when you have more fab space than God himself (it also helps nicely with single-threaded performance and SPEC-like workloads).
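For a rough feel of that bandwidth-scaling point, here's a sketch with nominal peak numbers only (dual-channel DDR400 per Opteron socket versus one 800MT/s FSB shared by every CPU on the bus; sustained figures are of course lower):

```python
# Aggregate memory bandwidth vs socket count: each Opteron brings its own
# memory controller, while FSB-based Xeons share one bus. Peak nominal figures.

DDR400_DUAL_GB_S = 6.4   # per Opteron socket: 2 channels x 3.2 GB/s
FSB_800_GB_S     = 6.4   # 800 MT/s x 8 bytes, shared by all CPUs on the bus

for sockets in (1, 2, 4):
    opteron_total = sockets * DDR400_DUAL_GB_S
    xeon_per_cpu  = FSB_800_GB_S / sockets
    print(f"{sockets} socket(s): Opteron total {opteron_total:4.1f} GB/s, "
          f"FSB Xeon {xeon_per_cpu:.1f} GB/s per CPU")
```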

Anyway, of all those reasons, I think 2 and 3 are the most important ones. An ODMC is not "free", and for Intel the trade-off may not be as attractive as it was for AMD. At least not yet?

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 

aaron2504

Distinguished
Jun 16, 2004
62
0
18,630
> Actually, if Intel were to implement an on-die memory controller the increase would be a lot smaller than AMD's was. That's because, with Intel's long pipeline, they were forced to write excellent prefetch algorithms: frequent misses, each requiring a rerun through that long pipeline after the correct data finally arrives, would have cost a large amount of performance.

Wouldn't ECC memory correct the misses?
 

aaron2504

Distinguished
Jun 16, 2004
62
0
18,630
If Smithfield is Netburst-based, Intel will be well and truly screwed. All the Intel fans were waiting for Prescott, and what a load of rubbish that was. Now they're getting excited about dual cores, but I have a feeling that will be a load of tosh as well. I don't know whether anyone else saw an article a while ago on the web saying that a former Intel design team worker claimed the Prescott pipeline was all about marketing and climbing clock speeds? Anyone got a link for it?