Is Core 2 Duo the first true, native dual core processor?

Viperabyss

Distinguished
Mar 7, 2006
573
0
18,980
I read this somewhere else, but I can't remember the link now.

However, the poster's argument was that the Athlon X2, although the first monolithic dual core processor, acts more like a Pentium D because of its per-core L2 caches. With separate, non-connected L2 caches, the X2 cannot share information between cores through the cache; the cores can only communicate with each other via the crossbar.

On the other hand, Core 2 Duo has a shared L2 cache with 2 MB of capacity. With such a big L2 cache, not only can it minimize the bottleneck caused by the aging FSB, but the processor is also more efficient, because the L2 automatically adjusts itself to accommodate the data being processed by the cores.

I'm wondering if this theory is correct. Thanks in advance for clearing this up. Also, please don't flame each other. If you want to rebut this theory, please don't just post something like "AMD rocks, Intel sucks."
 

YO_KID37

Distinguished
Jan 15, 2006
1,277
0
19,280
I'd say the X2 was, since it was the first processor available with two cores, regardless of their cache structure or die type.

Wasn't it technically the Opterons? Servers hold priority over everything else.
 

YO_KID37

Distinguished
Jan 15, 2006
1,277
0
19,280
Core 2 Duo is just refined, updated technology; AMD was first.
In this case, AMD was the first to come out with the technology. Most of the time they take something crappy Intel did, make it 1000x better, and employ it as their own, but this time you have to admit it was AMD, because all Intel did was take two existing Pentium 4 chips and tape them together.
 

Mackle

Distinguished
Dec 25, 2006
60
0
18,630
What about the Sun V890 server? It supported up to eight dual core processors, and I'm sure it's been around for quite a few years now.
 

YO_KID37

Distinguished
Jan 15, 2006
1,277
0
19,280
Did that technology ever make it from servers to the mainstream? When was the last time you saw Sun processors at a local LAN party?
 

Mackle

Distinguished
Dec 25, 2006
60
0
18,630
Did that technology ever make it from servers to the mainstream? When was the last time you saw Sun processors at a local LAN party?

Apologies, I thought the original poster was asking who got the technology onto the marketplace first, and someone mentioned AMD Opteron processors, hence why I mentioned Sun, since I thought Opterons were predominantly in the server market.
 

YO_KID37

Distinguished
Jan 15, 2006
1,277
0
19,280
True, Sun and Opteron were among the first into the market with more than one core, but to my understanding he only wanted to hear about AMD and Intel. We usually don't refer to much more than those two; VIA, Sun, IBM and a few other forgotten names also make processors, but we just never hear of them or talk about them because they are so far behind these two. Sun may have done something first, but I'm not completely sure it was "true native"; it may have been "synthetic native" :p
 

Viperabyss

Distinguished
Mar 7, 2006
573
0
18,980
True, Sun and Opteron were among the first into the market with more than one core, but to my understanding he only wanted to hear about AMD and Intel. We usually don't refer to much more than those two; VIA, Sun, IBM and a few other forgotten names also make processors, but we just never hear of them or talk about them because they are so far behind these two. Sun may have done something first, but I'm not completely sure it was "true native"; it may have been "synthetic native" :p
Hmm, I'm not aware of Sun's processors.

It would be really nice if you guys could expand on that :p
 

rninneman

Distinguished
Apr 14, 2007
92
0
18,630
Generally, a native dual core CPU is regarded as one that has both cores on the same die. AMD will tell you that they feel discrete L2 cache is preferred with good reason. Intel has their reasons for shared L2 cache. AMD does not have any shared L2 designs on their roadmap that I'm aware of, but no one is disputing that they have native dual core chips now and that K10 will be a native quad.

With that in mind, the first native dual core was Opteron followed by X2. Intel's first native dual core was actually the Core Duo, not C2D.

Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan
 

Mackle

Distinguished
Dec 25, 2006
60
0
18,630
Hmm, I'm not aware of Sun's processors.

It would be really nice if you guys could expand on that :p

The V890 server was released in 2004 and could take up to eight dual core UltraSPARC IV+ processors (and a minimum of two). You could also have up to 8 GB of memory per every two processors as well, I think?

The original processor options were 1.2-1.8 GHz UltraSPARC IV+.


I don't know if that was the first server that had the UltraSPARC IV+ dual core processors, though.

They have 2 MB of L2 cache and 32 MB of L3.

The primary application for these systems would be heavy number crunching, and things like large databases.

It's not your everyday rig for surfing the web and playing games, but stick a graphics card in the thing and you can install Firefox :?
 

rninneman

Distinguished
Apr 14, 2007
92
0
18,630
IBM also produced some dual-core G5s, correct?

You are correct. They released the PowerPC 970MP (the 970 is the G5) in Q3 2005. Probably its most notable use was in the "quad core" Power Mac. Apple marketed it as a quad core system even though it was really a dual socket system with two dual core processors.

Ryan
 

xsamitt

Distinguished
Mar 28, 2007
268
0
18,780
Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan

I was wondering about this very same thing recently myself.

Jumping Jack, what's your take on this? As for gaming, how would this logic hold up?

Anyone?
 

rninneman

Distinguished
Apr 14, 2007
92
0
18,630
Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan

I was wondering about this very same thing recently myself.

Jumping Jack, what's your take on this? As for gaming, how would this logic hold up?

Anyone?

From a pure performance point of view (since you asked about gaming), native is preferred because power management can be more sophisticated, allowing lower thermal envelopes and therefore higher clock speeds. So hypothetically speaking (since neither Intel nor AMD makes the same dual core model in both native and MCM configurations), if an E6600 were available in either configuration, the MCM version would almost certainly have higher power consumption and dissipation. So there would be more headroom with the native version to ratchet up the clock speed, and an overclocker would also have more headroom.

I believe the L2 cache question depends on the implementation, and this is where Jack comes in. I don't know enough to describe the pros and cons of discrete vs. shared.

Ryan
 

Viperabyss

Distinguished
Mar 7, 2006
573
0
18,980
With Firefox, you'll be able to multi thread your Porn, what takes an average man 5-10 minutes it'll take you 2.5-5 Minutes :lol:
If that's the case, I would rather use a Pentium Pro MMX for the job...

The longer the merrier :p :p
 

Viperabyss

Distinguished
Mar 7, 2006
573
0
18,980
Generally, a native dual core CPU is regarded as one that has both cores on the same die. AMD will tell you that they feel discrete L2 cache is preferred with good reason. Intel has their reasons for shared L2 cache. AMD does not have any shared L2 designs on their roadmap that I'm aware of, but no one is disputing that they have native dual core chips now and that K10 will be a native quad.

With that in mind, the first native dual core was Opteron followed by X2. Intel's first native dual core was actually the Core Duo, not C2D.

Just like the L2 cache argument, there are pros and cons to native vs MCM dual/quad core designs. One is not necessarily better than the other; they are just different and each manufacturer has reasons for choosing one or the other.

Ryan
It is undisputed that the monolithic approach to dual core processors has the advantages of lower power consumption and better performance.
I'm just wondering if a shared cache is a milestone towards the "real" dual core processor, since Intel was the first to implement one in its Core 2 lineup, and now AMD is doing the same thing too.
 

quartzlock

Distinguished
Apr 16, 2006
18
0
18,510
You asked for Jack, but anyway, I'll share my small experience with this as a programmer:
When you are accelerating an app with multithreading and effectively using more than one core, you can divide a single task and run it on more cores by using different threads. Imagine you are processing an image and you create two threads, each processing one half of the image. The image, which I will call the "dataset", is common to both cores; if one thread needs to know something about the other thread's half, having a shared cache minimizes the effort because both parts reside in it. This is true ONLY if the dataset fits inside the cache, i.e. the image is not larger than the cache, so to speak.
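
To make that concrete, here is a minimal sketch of that kind of split: two threads, each working on half of one shared pixel buffer. The brighten operation, the buffer size, and the names are made up purely for illustration, not taken from any real code discussed in this thread.

[code]
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Brighten one half of a grayscale image. Each thread touches only its own
// rows, but both halves come from the same shared "dataset" (the pixel buffer).
static void brighten(std::vector<uint8_t>& pixels, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        pixels[i] = static_cast<uint8_t>(std::min(255, pixels[i] + 20));
}

int main() {
    const std::size_t width = 1024, height = 1024;      // arbitrary example size
    std::vector<uint8_t> image(width * height, 100);     // the shared dataset

    const std::size_t half = image.size() / 2;
    std::thread t1(brighten, std::ref(image), std::size_t{0}, half);   // core 1: top half
    std::thread t2(brighten, std::ref(image), half, image.size());     // core 2: bottom half
    t1.join();
    t2.join();
}
[/code]

If both halves fit in a shared L2, a read that crosses the halfway boundary can stay on-die in the cache; with separate per-core L2 caches, that data has to travel over the crossbar or bus instead.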

On the other hand, if you are accelerating work by running different threads that have no common dataset, each thread will use the cache for its own benefit but not for the other thread's benefit: you write your image to the cache and you have simply erased what the other thread might need in the future, and this is where you love separate caches. MCM is OK if your program can effectively keep the datasets nicely local in the same cache. Cache coherency is the issue here.

AFAIK, Barcelona is a combination of both approaches, because you have 512 KB of L2 for each core and 2 MB of shared L3, and given that they are exclusive (no data in the L3 is also kept in the L2 or L1, and vice versa), you have more flexibility: you keep a thread's local data intact in its L2 and use the L3 as the place for "common" data/code.

Returning to the coherency thing: when you have the same dataset on two different dies, it sits in two different caches that need to communicate to stay synchronized. Imagine you update something in the cache of die 1 (cores 1, 2) and a few cycles later die 2 (cores 3, 4) tries to read the same data from its own cache; die 1 has to inform die 2 that the data has changed, and this generates traffic that could kill the FSB as you move to more cores. In a Barcelona system with 8 cores, namely 2 quads connected via HyperTransport, you'll have the same penalty.
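
You can see this kind of coherency traffic even inside a single multi-core machine through false sharing: two threads write to different variables that happen to share a cache line, and the line ping-pongs between the cores' caches. A toy sketch follows; the 64-byte alignment is an assumed cache-line size, not a figure taken from any of the chips discussed here.

[code]
#include <cstdint>
#include <thread>

// Both counters sit on the same cache line: every increment by one thread
// invalidates the line in the other core's cache, forcing coherency traffic.
struct Shared {
    uint64_t a = 0;
    uint64_t b = 0;
};

// Aligning each counter to its own (assumed 64-byte) line removes the ping-pong.
struct Padded {
    alignas(64) uint64_t a = 0;
    alignas(64) uint64_t b = 0;
};

template <typename T>
void hammer(T& s) {
    std::thread t1([&] { for (int i = 0; i < 10000000; ++i) ++s.a; });
    std::thread t2([&] { for (int i = 0; i < 10000000; ++i) ++s.b; });
    t1.join();
    t2.join();
}

int main() {
    Shared s;   // slow case: one line bouncing between two cores
    Padded p;   // fast case: each core keeps its own line
    hammer(s);  // time these two calls to see the difference
    hammer(p);
}
[/code]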

After all this semi-theory, let me tell you that depending on the app and the size of the datasets, you might want a shared cache or none at all, and that is where Barcelona might have an advantage. But if your dataset is larger than your cache, a shared cache simply sucks. I've seen it, because using assembler instructions that bypass the cache boosts the speed of dual cores with shared cache incredibly; let's say you spit outside your own garden :).
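
The cache-bypassing instructions referred to are presumably the non-temporal stores (MOVNTDQ and friends). Here is a minimal sketch using the SSE2 intrinsics; it assumes an SSE2-capable x86 CPU and 16-byte-aligned buffers, and is only meant to show the idea of writing past the cache instead of through it.

[code]
#include <cstddef>
#include <emmintrin.h>   // SSE2 intrinsics: _mm_load_si128, _mm_stream_si128

// Copy 'count' 16-byte blocks using non-temporal stores, so the destination
// is written toward memory without evicting other data from the shared cache.
void stream_copy(const __m128i* src, __m128i* dst, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        __m128i v = _mm_load_si128(src + i);   // normal (cached) 16-byte load
        _mm_stream_si128(dst + i, v);          // non-temporal (cache-bypassing) store
    }
    _mm_sfence();   // make the streaming stores visible to other cores
}
[/code]

Whether this helps comes back to the dataset-size argument above: it only pays off when the data being written would not be reused from the cache soon anyway.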

Oh yes, you mentioned gaming... again, it depends on the style of the programmer...

I'm sure there are much more skilled people over here who can explain this much more clearly and compactly (and correct my mistakes, of course), so over to the experts...

Sorry about the typos, I'm not a native English speaker ;)
 

rninneman

Distinguished
Apr 14, 2007
92
0
18,630
You asked for Jack, but anyway, I'll share my small experience with this as a programmer: [...] Sorry about the typos, I'm not a native English speaker ;)

I believe it's cache thrashing you are referring to, when one core evicts the data of the other core. Someone feel free to correct me if I'm wrong. I think Intel has implemented some features in their shared L2 to help prevent this in many situations; I believe that's part of the technology they refer to as "Smart Cache" on the C2D. Again, feel free to correct me if I'm wrong.

Ryan
 

Viperabyss

Distinguished
Mar 7, 2006
573
0
18,980
You asked for Jack, but anyway, I'll share my small experience with this as a programmer: [...] Sorry about the typos, I'm not a native English speaker ;)
Interesting read. I'll digest it a little bit before I respond to you.