Second-class Intel to trail AMD for years

In the end, it appears the loss of an integrated memory controller won't be such a big deal. The technical improvements I've mentioned above help reduce bandwidth problems. Now Intel is going to implement an independent FSB for each processor, i.e. four FSBs in a 4-way server. The Inquirer now feels "a lot more hopeful on the raw numbers side" in terms of bandwidth availability. They do note that costs will increase, although "this is far from a killer".

http://www.theinquirer.net/?article=27334

Concerns about Intel's processors being bandwidth starved for the next few years are now a moot point, because they simply won't be.
 
Concerns about Intel's processors being bandwidth starved for the next few years are now a moot point, because they simply won't be.
Yes, they will be. Intel will still have a bandwidth problem no matter how much they increase the FSB (don't even mention 4 or more processors in a multiprocessor system).

Just to burst your bubble, AMD will include more memory controllers for each core with Socket 1207. PCIe will also be embedded on the processor (which means graphics cards will no longer be bottlenecked by the processor).

By the time Intel releases their flagship processor (2007-2008), sadly (for Intel), they'll be competing with the K10, which will be a very different beast from the current dual-core Opterons. 😉

 
And here you can read some info that backs up what I've said: http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2565

Until now, all your statements have been fairy tales for youngsters. Without any links or any kind of proof, your statements are just BS. You can fool the others, but you can't fool me. :)

 
...And speaking of coprocessors (if you've read the article from Anand), this might be AMD's new approach for their upcoming processors: http://www.sci-tech-today.com/story.xhtml?story_id=003000002GY3

I'll be really sorry for Itanium if this ever happens in the not-so-distant future. It would put the final nail in Intel's coffin (just my personal thoughts).

 
I think you don't get it, dude:

The FSB is outdated!!!

I believe it's been around for 20+ years.

Intel knows it, and they are working really hard to come up with something new.

The easiest way for them would be to adopt HTT and include memory controllers on the processors. But they're too stubborn, since they don't like the idea of using technology from others. That's why they're suffering the consequences now.

Even Dell is suffering from Intel's problems (we all know that Dell is Intel's whore). Read here: http://www.theinquirer.net/?article=27348

 
I don't doubt that integrated memory controllers are the future. I'm just continuing to respond to the original topic of this thread, which was the article from The Register claiming that Intel will be bandwidth starved without an integrated memory controller. I'm just pointing out that this isn't the case, as Intel will implement individual FSBs for each processor. Maybe not as good as integrated memory controllers, but, in The Inquirer's view, it will provide enough bandwidth.

http://www.theinquirer.net/?article=27334
 
In the case of 4 or more processors in a multiprocessor system, each processor will have its own dedicated, independent FSB to supply enough bandwidth
Yes, well, that is their way of dealing with the Opteron's chip-to-chip HTT links. Sad, isn't it?
You are aware that present Xeon latency is double what the Opteron's is?
You do know that the Opteron has lower latency to RAM than the Xeons have to L3 cache, right?
Before you go spewing Intel PR crap around here, or making up imaginary flaws in AMD chips, take the time to understand what you are talking about.
 
If I'm not wrong, AMD followed this approach with the Athlon MP even before Intel ever thought about it.

Dual independent buses have their advantages and issues; that's the reason AMD didn't continue with this approach and instead went with the idea of integrating the memory controller on the processor itself.

With dual independent buses, each core (or processor) has its own bus, but the latency STILL remains, since the processor has to communicate with the northbridge before it gets to main memory. This is fine for a dual setup, but things start to get nasty when you go to 4, 8, or 16 processors. In contrast, AMD's Opteron can be integrated in a system with up to 32 sockets, thanks to Horus (http://www.theinquirer.net/?article=26948); this will be IMPOSSIBLE for Intel and their current FSB limitations (please read the article).

As I said before, the FSB is last-century technology.

 
In regards to the L3 cache latencies, I hadn't heard that before, although in truth I'm not that surprised. Intel has a tendency to slap cache onto processors without tweaking their caching algorithms to take full advantage of it. Blindly adding cache would also increase latency. I'd be interested in seeing those benchmarks.
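
If anyone does want to check the latency claims, a dependent pointer-chase is the usual quick-and-dirty way. Here's a minimal C sketch, with array and loop sizes that are just arbitrary assumptions on my part (this is nowhere near a rigorous benchmark):

/* Rough pointer-chasing latency probe; a sketch, not a rigorous benchmark.
 * Every load depends on the previous one, so the time per iteration
 * approximates load-to-use latency for whatever level of the memory
 * hierarchy the array lands in.  Sizes are arbitrary assumptions;
 * shrink ELEMS to fit in L2/L3 if you want cache latency instead. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ELEMS (1u << 22)   /* 4M pointers, ~32 MB: bigger than any current cache */
#define ITERS (1u << 24)

int main(void)
{
    size_t *chain = malloc(ELEMS * sizeof *chain);
    if (!chain)
        return 1;

    /* Sattolo shuffle: builds one big cycle so the walk visits every
     * element in a pseudo-random order the prefetchers can't guess. */
    for (size_t i = 0; i < ELEMS; i++)
        chain[i] = i;
    for (size_t i = ELEMS - 1; i > 0; i--) {
        size_t j = (((size_t)rand() << 15) ^ (size_t)rand()) % i; /* crude ~30-bit random */
        size_t tmp = chain[i];
        chain[i] = chain[j];
        chain[j] = tmp;
    }

    clock_t t0 = clock();
    size_t idx = 0;
    for (size_t n = 0; n < ITERS; n++)
        idx = chain[idx];   /* serialized, dependent loads */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("~%.1f ns per dependent load (idx=%zu)\n", secs / ITERS * 1e9, idx);
    free(chain);
    return 0;
}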

As for understanding what I'm talking about, what are you referring to? Is it my analysis of the benefits of a shared L2 cache?

http://yara.ecn.purdue.edu/~pplinux/ppsmp.html

This might be a bit dated and Linux-based, but how a shared cache works is still the same. The benefits were clear even back then:

"The good news is that many parallel programs might actually benefit from the shared cache because if both processors will want to access the same line from shared memory, only one had to fetch it into cache and contention for the bus is averted. The lack of processor affinity also causes less damage with a shared L2 cache. Thus, for parallel programs, it isn't really clear that sharing L2 cache is as harmful as one might expect.
Preliminary experience with our dual Pentium shared 256K cache system shows quite a wide range of performance depending on the level of kernel activity required. At worst, we see only about 1.2x speedup. However, we also have seen up to 2.1x speedup, which suggests that compute-intensive SPMD-style code really does profit from the "shared fetch" effect."

The concept behind the benefits of shared L2 cache is also explained by X-Bit Labs.

http://www.xbitlabs.com/articles/editorial/print/idf-f2005-2.html

"Mobile dual-core processor manufactured with 65nm technology aka Yonah, which I have already talked about during the Napa platform discussion, will feature Intel Smart cache. Smart cache means shared cache between the two cores. Since we have two cores and we have a single bus, there will also be shared single bus interface. This way both cores can share the same copy of data from the L2 cache. Besides that, the L2 and data cache unit feature improved pre-fetches, i.e. we can do pre-fetches on the per-thread basis, thus ensuring better bus utilization. It also has bandwidth adaptation buffer, i.e. each core takes 4 cycles to adapt.

The difference between independent and shared caches is the following. In case of independent caches the data is transferred from one core to another via the FSB. In case of shared caches the data is transferred directly between the caches, you avoid the bus traffic and synchronization time to get on the bus. This is important for multi-processing systems, because this way you reduce the number of bus cycles involved."

While the article is about the Intel IDF, the blurb above isn't Intel marketing; it's the opinion of the X-Bit Labs author of this editorial piece. The improved data-fetching techniques I mentioned, which would reduce FSB bandwidth requirements, are also mentioned by X-Bit Labs. Clearly, as I said before, shared L2 caches can greatly benefit multicore processors, which is why Intel is implementing them.
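
Just to make the "shared fetch" idea concrete, here's a toy C sketch of the scenario both articles describe (the working-set size and everything else in it are my own arbitrary assumptions): two threads walk the same array, so with a shared L2 a line fetched by one core can be hit by the other, while with split caches each core pulls its own copy over the bus. It doesn't prove anything on its own; you'd need to time it, or watch bus counters, on both kinds of chip.

/* Two threads walk the same array: a toy version of the "shared fetch"
 * scenario.  With a shared L2, a line brought in by one core can be hit
 * by the other; with split caches each core fetches its own copy over
 * the FSB.  Illustrative only; compile with -lpthread. */
#include <pthread.h>
#include <stdio.h>

#define N (8 * 1024 * 1024)   /* arbitrary working set */
static double data[N];

static void *sum_worker(void *arg)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += data[i];          /* both threads touch the same lines */
    *(double *)arg = s;
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = (double)i;

    double r1 = 0.0, r2 = 0.0;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_worker, &r1);
    pthread_create(&t2, NULL, sum_worker, &r2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("sums: %.0f %.0f\n", r1, r2);
    return 0;
}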

The stuff I mentioned about inclusive and exclusive caching and their differences is just very basic background information.

http://www.cpuid.org/reviews/K8/index.php

"The exclusive relationship is the most flexible, as it allows lot of different configurations in keeping a good performance index. The drawback is that the performance does not increase very much with the L2 size. The inclusive relationship can only be chosen for performance purpose, knowing for example that increasing the L2 will create a performance boost."

That is why I said AMD doesn't use large L2 caches: it wouldn't benefit them much anyway. On the other hand, Intel processors can see large performance gains from more L2 cache.
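
If it helps, the standard back-of-the-envelope way to see the difference is effective capacity. A trivial C sketch, where the cache sizes are just assumptions based on typical current parts, not figures from the article:

/* Back-of-the-envelope effective cache capacity under exclusive vs.
 * inclusive L1/L2 policies.  Sizes are assumptions (roughly typical
 * 2005-era parts), purely for illustration. */
#include <stdio.h>

int main(void)
{
    int amd_l1 = 128, amd_l2 = 512;     /* KB: Athlon 64, 64+64 KB L1, exclusive L2 */
    int intel_l1 = 16, intel_l2 = 2048; /* KB: Prescott 2M, 16 KB L1 data, inclusive L2 */

    /* Exclusive: L1 and L2 hold different lines, so capacities add. */
    printf("AMD (exclusive):   ~%d KB effective\n", amd_l1 + amd_l2);

    /* Inclusive: every L1 line is duplicated in L2, so L2 bounds it. */
    printf("Intel (inclusive): ~%d KB effective (L1 duplicated, %d KB wasted)\n",
           intel_l2, intel_l1);

    return 0;
}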

I hope these external sources satisfy you that there is a basis for what I am saying.
 
The fact I'm pointing out is that while an integrated memory controller decreases RAM access latency for the processor, it increases RAM access latency for every other component
So now you have a NB that sets memory addressing? Great, too bad your sound card just screwed up your game by replacing AI functions with the soundtrack, because the NB said it could.
And now we're developing 64-bit NBs so we can run 32-bit chips in long mode.
All memory calls have to go through the chip, period. The ODMC improves latency on all memory calls.
 
Is it my analysis of the benefits of a shared L2 cache?
What analysis? Analysis requires understanding, which you lack.
Example: a shared cache requires a larger cache, which increases latencies. A shared cache requires additional "optimization" to control cache access. While a shared cache has advantages when both cores need the same data in cache, that's seldom the case in real life. A shared cache needs core time and interconnect time to function. Duh, that's going to require more prefetching and eat more bandwidth.
Look, if you don't understand what you are reading, ask; someone will be happy to explain. Don't come around spewing Intel disinformation and expect people to buy it.
 
Obviously a NB memory controller doesn't work at the beck and call of the graphics card, but the point is neither does an integrated memory controller. However, in the case of a NB memory controller the graphics card can communicate with the RAM directly through the NB. In the case of an integrated memory controller, at least one more step is added, with the graphics card needing to communicate with the chipset and then with the memory controller over the HT link. The fact I'm pointing out is that while an integrated memory controller decreases RAM access latency for the processor, it increases RAM access latency for every other component.
Obviously a NB memory controller doesn't work at the beck and call of the graphics card, but the point is neither does an integrated memory controller.

Would you please clarify the above sentence? I mean, what the hell does it mean?
And I don't just mean the word "beck".

It seems we have another BS generator in here. Oh my.

The traffic from RAM to the video card is mostly one-way, and a lot of it has to be preprocessed by the CPU anyway. So using the NB as the memory controller between RAM and AGP doesn't gain you diddly squat in performance.

Why do you think new video cards carry 256MB of DDR2 on board? Tell me that.
I can give you a hint: it's not a marketing gimmick à la Intel; there's a reason for it.
 
I don't doubt that integrated memory controllers are the future. I'm just continuing to respond to the original topic of this thread, which was the article from The Register claiming that Intel will be bandwidth starved without an integrated memory controller. I'm just pointing out that this isn't the case, as Intel will implement individual FSBs for each processor.
Individual FSBs won't do anything except alleviate the bus loading issue. All that does is let each Xeon socket's FSB bandwidth remain on par with the desktop socket's FSB. It's better than nothing, but hardly sufficient.

As long as all memory is hanging off a single memory controller, all cores will still be contending for memory bandwidth. Slapping four or more DDR2 memory channels on the board would help, except it's generally not feasible to route traces for four high-frequency memory channels hanging off one controller chip.
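
To put rough numbers on the contention point, here's a back-of-the-envelope sketch in C; the bus and memory speeds are assumptions (roughly what current parts ship with), not figures from any of the linked articles:

/* Back-of-the-envelope bandwidth check: four independent FSBs vs. one
 * shared memory controller.  All numbers are assumptions, roughly in
 * line with 2005-era parts, purely to illustrate the contention issue. */
#include <stdio.h>

int main(void)
{
    double fsb_mhz = 1066.0, fsb_bytes = 8.0;       /* 64-bit FSB at 1066 MT/s */
    double per_fsb = fsb_mhz * fsb_bytes / 1000.0;  /* GB/s per socket         */
    int sockets = 4;

    double ddr2_mhz = 667.0, channels = 2.0;        /* dual-channel DDR2-667   */
    double mem_bw = ddr2_mhz * 8.0 * channels / 1000.0;

    printf("demand: %d sockets x %.1f GB/s = %.1f GB/s\n",
           sockets, per_fsb, sockets * per_fsb);
    printf("supply: %.1f GB/s from the shared controller\n", mem_bw);
    printf("per-socket share when all are busy: %.1f GB/s\n",
           mem_bw / sockets);
    return 0;
}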

"You have been sh<font color=black>it</font color=black> upon by a grue."
 
No need to go deep into technical details such as the virtues of on-chip memory controllers, latencies, bandwidth limitations, etc. As far as I am concerned, these are not the reasons for, but the results of, Intel's fall.

I believe the main reason is its poor and unfortunate management.

What do you expect when a company which sells technology is managed by technology-illiterate marketing and finance people?

Misleading customers with appealing but ultimately meaningless high-MHz specs, and bullying manufacturers into using their poorer-quality chips, can only help so much and only for the short term.

Once there is a replacement in the market, it is only a matter of time before the company starts to lose sales and power.

In other sectors, companies try to recover after major mismanagement by downsizing, cost cutting or restructuring. However, in the CPU manufacturing business, where the future of a company depends heavily on very long-term and expensive R&D, it is close to impossible to reverse course and swallow the losses. That is to say, if you play the wrong card, it is usually irrevocable.

And even Intel knows that they did exactly that.

Intel, in my humble opinion, is at the beginning of the end of its career in the tech business. It is just a matter of time. When everything is over, Intel will take its honorable place next to IBM, Apple, SGI, Commodore and other well-known names of computing history, and might or might not continue to exist as an insignificant manufacturer.
 
It looks like someone forgot to tell Intel they were on the way out.

<A HREF="http://news.yahoo.com/s/nm/20051102/tc_nm/intel_dc" target="_new">http://news.yahoo.com/s/nm/20051102/tc_nm/intel_dc</A>
SAN FRANCISCO (Reuters) - Intel Corp. (Nasdaq:INTC - news) has restarted a factory after spending $2 billion to retool it with the latest technologies that will let it produce more powerful chips more efficiently and at a lower cost.

The plant, known as Fab 12, is Intel's second that has begun volume production combining wafers that are 300 millimeters in diameter, about the size of a dinner plate, with a 65 nanometer etching process.

Intel, the world's top chipmaker, is moving to 65 nanometer technology from 90 nanometer.

The smaller etching process means Intel can make its chips smaller and more powerful by squeezing more transistors on them, while larger wafer size means it can get more chips out of each wafer.

"It's back running production volume and over the next year that will ramp up," Bob Baker, Intel's vice president of manufacturing, told Reuters.

The factory, which was taken offline a year and a half ago, would make "almost all" of Intel's microprocessor line-up, Baker said.

The Chandler, Ariz.-based plant had about 1,000 employees, of which about 800 had been sent to work and train at other Intel plants in Oregon, New Mexico and Ireland during the upgrade, Baker said.
 
Funny enough, I thought of including Microsoft in the list and changed my mind afterwards.

Microsoft was built on another company's Intel-like arrogance and ignorance, IBM's back then. For as long as Willy the Gates is around, I don't think Microsoft will fail, as he knows very well what he is doing. I don't believe for one second that he will make the same mistake that made him and Microsoft what they are today.

On the other hand, when he decides to enjoy his millions and retire, I am pretty sure all those PowerPoint-presentation-hungry bullshitting bureaucrats with no business sense who are lurking in the company today will step up and turn Microsoft into another loser.

But we have time to watch it happening (and, most of us, to enjoy it).
 
I'm not sure why you're all picking on ltcommander_data. He makes some fair points. I think you folks are just trying to pick a fight with him or something, because so far I haven't seen anything that has deserved the treatment he's gotten.

1) For example, he wonders, and quite fairly I might add, whether the on-die memory controller adds latency for things like onboard graphics. The reasoning is rather obvious: the ODMC adds extra steps to the process.

NB: PCI DMA -> NB -> MEM -> NB -> PCI DMA
OD: PCI DMA -> NB -> CPU -> MEM -> CPU -> NB -> PCI DMA

The question is not whether the ODMC adds more steps, because it clearly does. The question is whether these extra steps actually cost anything in terms of latency. Given the PCI bus speed, I doubt they will for DMA access. For faster buses however (such as graphics) they may. Then again, they may not. Some tests would be nice to prove this one way or the other.
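
For what it's worth, the arithmetic behind that question is easy to sketch even before anyone runs real tests. A toy model in C, where every per-hop cost is a placeholder to be replaced with measured numbers, not a claim about real hardware:

/* Toy model of the two DMA paths sketched above.  Per-hop costs are
 * placeholders, NOT measurements; plug in real numbers to see whether
 * the extra CPU hop of an on-die controller actually matters for a
 * device doing DMA. */
#include <stdio.h>

int main(void)
{
    /* Hypothetical per-hop costs in nanoseconds (illustrative only). */
    double pci_to_nb  = 100.0;  /* device -> Northbridge           */
    double nb_to_mem  = 60.0;   /* NB memory controller -> DRAM    */
    double nb_to_cpu  = 40.0;   /* HT hop: Northbridge -> CPU      */
    double cpu_to_mem = 50.0;   /* on-die controller -> DRAM       */

    double nb_path = 2 * pci_to_nb + 2 * nb_to_mem;                 /* there and back */
    double od_path = 2 * pci_to_nb + 2 * nb_to_cpu + 2 * cpu_to_mem;

    printf("NB-controller round trip:     ~%.0f ns\n", nb_path);
    printf("on-die-controller round trip: ~%.0f ns\n", od_path);
    printf("extra cost of the added hop:  ~%.0f ns (%.0f%%)\n",
           od_path - nb_path, (od_path - nb_path) / nb_path * 100.0);
    return 0;
}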

He hasn't said that AMD suxors. He hasn't said that Intel rules. He's merely observed the extra steps and wondered what impact they might actually have. As we've seen with how early AMD HTT tests affected AGP, sometimes these little extra steps that seem harmless aren't so insignificant after all. Even two versions of the PCI bus running on the same system have caused some weird latency problems. Not that these kinks can't be worked out in the end, but they're design considerations that should be looked at closely.

Though, in truth, this is all rather a moot point IMHO, since people with onboard graphics aren't exactly concerned with performance anyway, but it's still an interesting technical query.

2) As for his talk about memory bandwidth and latency issues: while I don't dispute that an ODMC has lower latency and is therefore better, I think it's hard to argue that Intel will actually suffer if what he says about Intel moving to a quad-channel architecture is true. That would not only dramatically increase the bandwidth available for extra cores, but also decrease memory latency in the same way dual-channel memory did. Sure it'll cost. Sure, it'll be a pain to implement on a motherboard. (Which will in turn cost yet more.) But Intel is hardly known for being cheap. **LOL**

Anywho, aside from the possibly unfair ridicule of ltcommander_data...

Personally, I think Xeon's biggest problem in competition will be that it'll still probably use a slower FSB than an equal P4, and thus be crushed by Opteron there ... same as always really. AMD concentrates on making their server CPUs their best and their desktop CPUs their second best. Intel concentrates on making their server CPUs their second best and their desktop CPUs their best. Intel can't compete soundly against AMD in the server market until they change that one simple strategy to match AMD's.

I also think AMD's lower latency from ODMC benefits their core well because their prefetch isn't stunning. Intel's prefetch however, being fairly good, means that Intel will gain less from an ODMC. That doesn't mean that Intel won't benefit, but so long as Intel sticks to the Netburst architecture, the gain may just not be worth the resources to implement. Either way, Intel certainly won't gain as much as AMD did when (if) they add ODMC to Netburst, so it's not really fair to say that this is holding Intel back all that much.

I think it's a shame that Intel's replacement for Netburst isn't going better. Perhaps they should have just stuck with the P3 all along, but even so, Netburst was an interesting attempt. It will likely fail in the long run, but I have my doubts that the failure is in Netburst itself. If you look at all of the hacks Intel put into Scotty to let it scale higher (which, as we can see from Scotty's thermal problems, was a stupid decision to make), then you see the death of the P4. However, had Intel actually worked on improving Northwood and just used SOI like AMD did, we'd probably see Netburst thriving for years to come.

Netburst was about redesigning the CPU to be more virtual. It had a lot of potential. But the architecture required such a shat-load of logic and transistors to overcome the performance losses from its complexity that it was barely better than a simple design, because of the power usage and thermal output. Sure, one could just stick with that simple design, like AMD did. (Hell, like Intel even did for their mobile segment.) But there were theoretical design benefits that even Northwood never got to take advantage of, because Intel just never took the architecture far enough. If you look at the original specs for Netburst before Willy's cut-down version, you'll see that Intel could have taken it a lot further. Though I think Transmeta was actually heading in a better direction. Had they had Intel's resources and ambition, I doubt that either AMD or Intel would even exist in the CPU market today.

But anyway, I think that while Intel has seen better days, I'm really not seeing anything here that will change things in any significant way. It'll still remain the same situation as always, as far as I can tell. AMD won't trounce Intel, and Intel won't trounce AMD. It's the same old stalemate. Technology improves, but still, there's really nothing new. It's kind of boring actually. :\

Of course, Intel's biggest downfall is their idiot managers making all manner of stupid decisions. And AMD's biggest downfall is their fear of success. **ROFL** Neither company will get anywhere with what they have unless they can overcome themselves first. The race to crush the other won't be won by whose chips are better, but by which company can overcome its own internal problems first.
 
You must be talking to me. Truth is, I hate it when someone comes spewing a corporate line when they don't really have a grasp of what's being said.
If everybody and their uncle (AGP, PCI, PCI Express) starts addressing memory, the chip is going to feel dunsel.
Do you really think 4 FSBs for 4 cores can compete with chip-to-chip HTT? 1 FSB per chip sure seems to cause memory problems now.
Since Intel went to all the trouble of developing fully depleted silicon-on-insulator, I too wish they would make use of it. The Prescott may need a 400/1600 FSB (or HTT) to be in its glory, but I think that may be possible with FDSOI. Then people's opinion of Scotty might change. What is the point of DDR2 if it can't make that happen? (Well, aside from lower power.)
 
This article can back up what I've said about the Xeons needing more cache to make up for their bandwidth problems: http://arstechnica.com/news.ars/post/20051011-5416.html

"What all of this bandwidth talk means for our Opteron vs. dual Xeon comparison is that the dual Xeon needs a larger L2 to make up for the fact that it has much less bandwidth and (even more importantly) higher memory read latencies than the Opteron."
Hannibal knows his stuff. 😉

 
You must be talking to me.
Am I? I dunno. I was just talking in general. I didn't really tally up who said what, I just went with the general impression I got. But if you feel so, then it must be at least partially true.

Truth is, I hate it when someone comes spewing a corporate line when they don't really have a grasp of what's being said.
Truth is, that's just your opinion that it's corporate trash. Just because someone else has a different opinion doesn't mean you should treat them like Bunascii. Maybe he actually believes in it. I know that I can see some merit in it. It's not what I would do were I running Intel, but then that's true for just about all of Intel's CPU decisions lately.

Do you really think 4 FSBs for 4 cores can compete with chip-to-chip HTT? 1 FSB per chip sure seems to cause memory problems now.
First of all, most of the problems Intel CPUs are having these days are not related to the memory at all. They're related to Intel doing some Very Bad Things to Scotty (and all CPUs thereafter) in the hopes that they'd scale to much higher GHz. (Which they aren't scaling to because of power/heat problems.)

Second of all, yeah, I do think one FSB per core would definitely compete with HTT. What are four ODMCs if not four FSBs? It's tit for tat, just from Intel's standpoint instead of AMD's. The only major missing difference is the chip-to-chip HTT, which Intel has their own take on anyway.

Really, it should perform almost as well, just as a single core with a single memory system from Intel performs almost as well as AMD's.

And Intel is far more familiar with that than they are with their own HTT variant. Aside from being slightly less efficient, its only real flaw is that it'll cost a buttload for Intel's customers. But then, this is Intel. So frankly, it makes perfect sense. From a technical standpoint it's kind of silly. From Intel's business standpoint it makes sense.

Since Intel went to all the trouble of developing fully depleted silicon-on-insulator, I too wish they would make use of it. The Prescott may need a 400/1600 FSB (or HTT) to be in its glory, but I think that may be possible with FDSOI. Then people's opinion of Scotty might change.
I agree. Intel should be using SOI by now. I don't get why they aren't, other than sheer perversity. Scotty still won't be in any glory though, even with SOI. Oh, sure, it'll use less power. That'll only make it almost as good as a Northwood-C. Is that really something to be proud of, finally almost matching what is now an ancient product? If anything, its only real advantage would be in allowing Intel to clock significantly higher and finally break their performance slump. But even then... I wish Intel had just never moved Netburst in the direction of Scotty. That was such a bad move. It's going to be hard to overcome, short of taking Netburst back to the drawing board. I really wish Intel would just do a good job of desktopizing their Pentium M and get rid of Netburst altogether, but short of that, they could at least stop making Netburst worse than it already was at launch.


 
Second of all, yeah, I do think one FSB per core would definitely compete with HTT. What are four ODMCs if not four FSBs? It's tit for tat, just from Intel's standpoint instead of AMD's. The only major missing difference is the chip-to-chip HTT, which Intel has their own take on anyway.

Really, it should perform almost as well, just as a single core with a single memory system from Intel performs almost as well as AMD's.
Um, no.

The major missing difference is that the Intel solution still leaves all cores competing for memory bandwidth. Four FSBs isn't enough to compete with HTT unless each FSB gets its own memory bank--or, alternatively, if there's enough memory bandwidth hanging off the Northbridge to fully saturate all four FSBs at once.

Never mind the extra latency Intel's solution introduces for chip-to-chip traffic. Data that could efficiently travel by broadcast over a shared FSB (such as the all-important cache-coherency data) will now have to be routed through the NB. AMD's ccNUMA sidesteps that problem by using a classic mesh topology.

"You have been sh<font color=black>it</font color=black> upon by a grue."