AMD AIMS FOR FOUR-CORE OPTERONS BY 2007

If the cache is negated on both chips, the itanic is huge.
1 itanic has more transistors than a couple of dual core opterons. That's huge.

Well, the Itanium's core has 20 million transistors, while the Opteron has 40 million. If you look at the pure logic core by subtracting the L1 cache then the Itanium now has 18 million transistors while the Opteron has 32 million. So you see, if the cache is negated the Itanium is not huge. In fact it's the Itanium core that could fit inside the Opteron core, not the other way around. Not to worry though, inverting numbers is a rather common mistake when you're only concentrating on attacking something.

You nicely avoided the issue of legacy support, which is where a lot of itanic's advantage comes from.

Exactly. This goes back to the various paths for the future of technology that I was trying to deal with. Maximum performance can be achieved through a complete break with current technology. Now I'm not saying that Itanium achieves maximum performance, I'm just saying that's a possible efficient choice. I myself don't support Itanium, not because the technology isn't good, but because without good compatibility or some sort of transitional mechanism it isn't beneficial from a consumer perspective. Other options include continuing to push superscalar technology, and multi-threading, which is the current path that has been chosen.

How is it that Conroe can add execution units, while the K series can't?

I'm not saying that the K series can't add more. In fact it doesn't need to, as it already has 3 full FPUs and 3 full ALUs. I'm just saying that Conroe has at least 2 full FPUs and 3 full ALUs, which is an improvement over the Pentium 4, and a large improvement over the Pentium M, which only had 2 ALUs, 1 FPU and 1 vector unit.

Even though the K series can add more execution units, it's unlikely that they will. At 6 execution units (9 including the memory stores, etc.), the K8 already has more than enough for most circumstances. Adding more would just use up die space and increase heat for little benefit.

You don't understand HT, or schedulers, so come back when you do.

"To the end user, it appears as if the processor is "running" more than one program at the same time, and indeed, there actually are multiple programs loaded into memory. But the CPU can execute only one of these programs at a time. The OS maintains the illusion of concurrency by rapidly switching between running programs at a fixed interval, called a time slice. The time slice has to be small enough that the user doesn't notice any degradation in the usability and performance of the running programs, and it has to be large enough that each program has a sufficient amount of CPU time in which to get useful work done."

http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars

That is what I understand a scheduler to do: decide which processor a thread is sent to and how much processing time a thread gets. A scheduler doesn't really order the threads, as the processor executes out-of-order anyways.
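To make the time-slice idea from the Ars quote concrete, here is a minimal sketch of round-robin scheduling on a single CPU. It is only an illustration: the task names, the work units and the `round_robin` function are made up for this post, and a real OS scheduler also deals with priorities, blocking and multiple CPUs.

```python
from collections import deque

def round_robin(tasks, time_slice):
    """Simulate one CPU handing out fixed time slices in turn.

    tasks: dict mapping a (hypothetical) task name to the CPU time it needs.
    Returns the slices granted, in order, as (name, ran_for) pairs.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran_for = min(time_slice, remaining)  # a task may finish early
        timeline.append((name, ran_for))
        if remaining > ran_for:
            # Not finished yet: go to the back of the queue.
            queue.append((name, remaining - ran_for))
    return timeline

timeline = round_robin({"encoder": 5, "browser": 2, "game": 4}, time_slice=2)
for name, ran_for in timeline:
    print(name, ran_for)
```

Only one task runs in any given slice, but with a small enough slice all three appear to make progress together, which is the "illusion of concurrency" the article describes.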

"Hyper-Threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a Hyper-Threading equipped processor to pretend to be two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously."

http://en.wikipedia.org/wiki/Hyperthreading

That is an accurate summary of what HT is. The Arstechnica site above seems particularly good in its breakdown of SMP, SMT, and HT.

Based on those understandings, what I've said appears to be logical. Now if I or those websites are so incorrect, feel free to correct me. Where in my analysis of HT potential in Conroe am I wrong? How can an OS-based scheduler make up for hardware-based HT support? I'm not unreasonable, you just need to be a little bit more descriptive than "come back when you do."

I don't care if you get your Intel line @ TGH or Anand, BS is BS

Well, I use not only TGH and Anandtech, but also X-Bit Labs, Arstechnica, Digital Life, The Inquirer, The Register, and Game PC among others. But, if you view any site reporting the facts or drawing even neutral conclusions on Intel as BS and blasphemy then there's not much I can argue with.
 
Well, the Itanium's core has 20 million transistors, while the Opteron has 40 million.
Not even close. Guess again.
Maximum performance can be achieved through a complete break with current technology
Yes, all current technology on an ongoing basis. Throw away everything, and buy more, and again and again, if you want to keep that advantage. Great marketing! But it is what gives itanic its kick. Sounds like Jobs to me. (and I do mean Steve)
the K8 already has more than enough for most circumstances. Adding more would just use up die space and increase heat for little benefit.
So adding dedicated SSE2 units wouldn't help in encoding?
When the A64s were brought out, that was the most common recommendation, but I'm sure you will tell us otherwise.
"To the end user, it appears as if the processor is "running" more than one program at the same time, and indeed, there actually are multiple programs loaded into memory. But the CPU can execute only one of these programs at a time. The OS maintains the illusion of concurrency by rapidly switching
... and so on
So, you can use google. Not much of a start. What do you think you have to do to "understand" HT?
But, if you view any site reporting the facts or drawing even neutral conclusions on Intel as BS and blasphemy then there's not much I can argue with.
If you are saying you cannot tell the difference between a piece written by an unbiased reviewer, and the AMD/Intel marketing teams, there really is no point in talking to you.
Personally, I believe that you are bright enough, aside from your prejudice.
 
Hey commander, are you still optimistic about Intel's offerings for next year?

I guess The Inquirer is not. ;)

Just a little quote:

SOURCES WHO attended the Supercomputer show in Seattle last week were shown a number of boards from AMD behind the scene which indicate to us that 2006 may well be even tougher for Intel than 2005 on the server front.
 
Not even close. Guess again.

Well, the figures I got were from the chart on this page. Note that I am not comparing the total transistors per core, where of course the Opteron would be smaller. I am comparing the transistors that actually process data rather than store it, or the "core" transistors. Further removing the L1 cache from the "core" would yield the "pure logic core". In both those cases the Itanium uses fewer transistors to actually process data.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2598&p=4

I'm just going to trust Anandtech with the transistor count since I don't have time to look through AMD or Intel's technical documents. However, just looking at the Itanium die images you can already see that the majority of the transistors are due to the L3 cache, the L3 tags, the L2 cache, the L2 tags, and the bus logic. It isn't hard to see that the number of transistors on the Itanium actually processing data is small compared to the entire die.

Great marketing! But it is what gives itanic its kick.

That's exactly what I'm saying. I don't support it for its lack of compatibility, but it's still interesting to see what potential is out there in other architecture paths.

So adding dedicated SSE2 units wouldn't help in encoding?
AMD's K8 architecture is focused on general purpose execution cores, which is why there aren't dedicated SSE2 units. Intel's P4 architecture on the other hand is more specific in that the execution cores are divided into simple (fast) and complex (slow). With Merom, Intel is moving in AMD's direction by incorporating general purpose execution cores. Certainly dedicated SSE2 units would help, but unless AMD wants to be seen to be moving toward Intel's old architecture, they would probably just increase the number of general FPUs which can do SSE2 calculations.

So, you can use google. Not much of a start. What do you think you have to do to "understand" HT?

I've asked for your exalted knowledge on the subject but of course your only response has been "come back when you do." Well, if my own understanding is flawed, the websites I look at are BS, and I'm not allowed to Google, it becomes increasingly difficult for me to understand HT.
 
It's always been known that HT doesn't benefit some functions so this is nothing new. I personally do quite a bit of video encoding and HT is usually quite beneficial there.

In any case, this problem with HT is due to the Prescott-type architecture. What I'm thinking about is HT for a 45nm Conroe-derivative. Conroe itself will have 4MB of cache, which is double or quadruple what the processors in their test have. That will in itself help limit thrashing since more storage space is available. As well, since this is a shared architecture between 2 cores + 2 virtual cores, and there is more space available, it is more likely that what is needed by 1 core is already in the cache from another core. This will also reduce thrashing. A major reason why thrashing causes such a significant reduction in performance is that the cache latency in Prescott is so high. This will not be a problem in Conroe, which uses the low latency cache design of Dothan and Yonah. Low latency means that even if thrashing occurs, which it doesn't in many cases, data can be sent back to the cache faster, reducing the performance hit. The FSB will also be increased from the 800MHz in most Prescotts to 1066MHz, similarly reducing the performance hit. Conroe will also feature more advanced prefetch routines to help.

As well, earlier endyen had concerns about the 840EE assigning lower performance to primary threads, while increasing performance of tertiary and lower threads. As it turns out, this isn't a flaw in HT itself. It is a flaw in the scheduler, since it treats the 2 real cores and the 2 virtual cores the same. It then distributes two high-demand tasks to the same physical core, resulting in a performance decrease. However, if either your program manages the affinity of the threads or you yourself do it, you will be able to receive a performance boost by having HT enabled on a dual core system.
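Here is a rough sketch of the placement problem described above. It is a toy model, not the Windows XP scheduler: the `SIBLINGS` layout (logical CPUs 0 and 1 on one physical core, 2 and 3 on the other) and all function names are assumptions made up for illustration.

```python
# Assumed layout of a dual-core chip with HT: logical CPU -> physical core.
SIBLINGS = {0: 0, 1: 0, 2: 1, 3: 1}

def place_naive(n_threads):
    """Fill logical CPUs in order, treating all four as equal."""
    return [i % len(SIBLINGS) for i in range(n_threads)]

def place_ht_aware(n_threads):
    """Use one logical CPU per physical core first, then the HT siblings."""
    seen, first, rest = set(), [], []
    for cpu, core in SIBLINGS.items():
        (rest if core in seen else first).append(cpu)
        seen.add(core)
    order = first + rest  # [0, 2, 1, 3] for the layout above
    return [order[i % len(order)] for i in range(n_threads)]

def shared_cores(placement):
    """Count physical cores that ended up with more than one thread."""
    cores = [SIBLINGS[cpu] for cpu in placement]
    return sum(1 for core in set(cores) if cores.count(core) > 1)

print(place_naive(2), shared_cores(place_naive(2)))
print(place_ht_aware(2), shared_cores(place_ht_aware(2)))
```

With two heavy threads, the naive placement puts both on the same physical core while the other core sits idle; the HT-aware placement gives each thread its own core. Setting affinity by hand amounts to forcing the second placement. On Linux, for example, Python exposes this as `os.sched_setaffinity`; other platforms have their own mechanisms.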

"To sum up, at the moment of the release of dual core CPUs with HT, the behaviour of muti-threaded applications will have to be carefully studied, regarding the problems of Windows XP scheduler to manage four logical CPUs with efficiency. If Microsoft does not update Windows XP scheduler in order to fix this (that is very unlikely, remember that Windows 2000 was never fixed to correctly handle HT), applications developers will (one more time) have to take that in charge in their application."

http://www.x86-secret.com/index.php?option=newsd&nid=849

Windows Vista will likely fix this problem in its scheduler to make it fully compatible with 2 cores + HT. In the meantime application developers will have to pick up the slack. While this may seem tedious, it is really in their best interests, since 2 cores + HT does give a performance boost that's worthwhile since it's free.
 
Yeah actually I am. I should first mention that I'm mostly referring to the single-processor workstation and 2-way markets. I have to agree that Intel's 4-way offerings pale decidedly against AMD's at least in the near future.

First of all, the article only vaguely mentions optimism on AMD's motherboards. These I presume are the new Socket F variety. These will probably be shipping with socket M2 sometime in May-June. In such a case, Intel will be able to remain competitive in Q1. As no compatibility problems with Dempsey have been mentioned, unlike the lower-end Preslers, it will likely ship in January with Yonah and Presler. Dempsey has been shown to be highly competitive with AMD's highest 2-way Opteron, the 280. Granted AMD will probably release a speed bump, but Dempsey will still be competitive at least until Socket F arrives. That is most of the first half of 2006.

Now for Socket F. While it will definitely increase the Opteron's performance, I doubt it'll be drastic. Since Socket F is new, the initial processors will only be to test out the socket and introduce it to market. As such they will be produced in 90nm. This means they are essentially the same as AMD's current Opterons, meaning no multiple memory controllers or integrated PCIe controller. The major difference will be the bandwidth increase from DDR2 667 support. I have no doubt that this will push the Opteron decidedly ahead of Dempsey. Multiple integrated memory controllers and PCIe controllers won't likely come until K8L, which is in 2007. PCIe controllers may not be until K10, since I haven't heard anything from nVidia about their new chipsets not having PCIe controllers in them.

However, it is important to note that while Socket F will be introduced in May and be faster than Dempsey, Intel will be releasing Woodcrest in H2 2006. This gives AMD only a few months of unchallenged time at the top. Of course, I'm not sure of Woodcrest performance but I'm pretty sure it will be competitive with the initial socket F Opterons. Past 2006 AMD will have the K8L, and Intel will have a 45nm shrink of Woodcrest. That far into the future it's anyone's guess who's better.

Generally, with Dempsey given a few months before Socket F, and Woodcrest coming on Socket F's heels, I really don't think AMD will be getting a free ride in 2006.

Interestingly, Merom originally missed its tape out by a month in July. However, its 64-bit motherboards were already stable at that time.

http://www.theinquirer.net/?article=24788

Now Merom is 1 month ahead of schedule and looks to be launched Q3 2006 like the July article predicted. It's going to have 4MB of L2 cache and 64-bit support.

http://theinquirer.net/?article=27812

Woodcrest probably isn't far behind since it was already mentioned in the Intel price lists from the article you posted.

This only makes me wonder if I should get a Yonah laptop, which itself is stable and ready to launch, or wait for Merom which is ahead of schedule and coming along nicely.
 
Interesting perception of Itanic's core size.
First off, since the L2 cache in the Itanics is more of an ALU than cache, it is usually included as part of the core.
Now there may be something wrong with my eyes, or that "scaled" image, but it sure looks like the core takes up more than 1/30th.
I was originally referring to Montecito, which is expected to have a core (including bisc cache, but not L3) of 252 m transistors, while a single Opteron core is generally considered to have ~60m transistors (this includes trace and L1 cache).
That's exactly what I'm saying. I don't support it for its lack of compatibility, but it's still interesting to see what potential is out there in other architecture paths.
You are saying that having everyone change every part (including all software) of their computer every two years is somehow a reasonable option? That is what it would take to keep a lack of legacy support a viable option. Now Bill may like it, and Paul may say it's the future, but most people won't buy it.
but unless AMD wants to be seen to be moving toward Intel's old architecture, they would probably just increase the number of general FPUs which can do SSE2 calculations.
No. Adding a dedicated SSE2 unit would effectively enable A64s to do encoding tasks as well as the P4s. If catching Intel at the only thing they still do well is "catching Intel's old architecture", let it be so.
Well, if my own understanding is flawed, the websites I look at are BS, and I'm not allowed to Google, it becomes increasingly difficult for me to understand HT.
Try collecting data, and using a little scientific formula.
Statements like
As well, earlier endyen had concerns about the 840EE assigning lower performance to primary threads, while increasing performance of tertiary and lower threads. As it turns out, this isn't a flaw in HT itself. It is a flaw in the scheduler
suggest you have a long way to go.
If you don't put any effort into it, the information won't be allowed to sink in. Let's face it, you are an Intel fanboi. You would not accept anything I say, so the only way for you to understand what, why and where, is to do your own work. Here's a thought though, if HT worked on high IPC chips, don't you think AMD would have adopted it? After all, they already use SSE3, and that's mostly useless for them.