THG (P)reviews "Core 2 Quadro" - aka Kentsfield!

The_Abyss · Sep 12, 2006

It is, however, inevitable that the push behind the Xbox 360 (which is multi-threaded) and the simultaneous release of the same titles on PC will result in the PC versions also being multi-threaded. Whilst different, they are not worlds apart, unlike the horrible complexity of Cell and the PS3.

clue69less · Sep 12, 2006

Of course, I'm sure I'll get flamed for this, but I'm truly shocked that folks reamed Intel for gluing two P4's together for the Pentium D series and then they laud them for effectively doing the same thing with C2D to get to C2Q.

It’s the difference between gluing two turds together and gluing two chocolate muffins together. The first you put in the bin, the second in your mouth.

I'm never coming to your house for dinner.

Regardless of the image it conjures up in one's head, he does have a good point.

Sure he does. But anyone that glues turds together, or even talks about doing so - I don't want that person baking the muffins I eat. Once he gets into gluing turds, the next thing you know, he'll be trying to polish them.

crow_smiling · Sep 12, 2006

But anyone that glues turds together, or even talks about doing so - I don't want that person baking the muffins I eat. Once he gets into gluing turds, the next thing you know, he'll be trying to polish them.

😀
No, that’s Intel’s job.

The_Abyss · Sep 12, 2006

But anyone that glues turds together, or even talks about doing so - I don't want that person baking the muffins I eat. Once he gets into gluing turds, the next thing you know, he'll be trying to polish them.

😀
No, that’s Intel’s job.

Speaking generically, I thought BM had taken on that job for himself?

clue69less · Sep 12, 2006

But anyone that glues turds together, or even talks about doing so - I don't want that person baking the muffins I eat. Once he gets into gluing turds, the next thing you know, he'll be trying to polish them.

😀
No, that’s Intel’s job.

Speaking generically, I thought BM had taken on that job for himself?

Peter principle in action, yes indeed...

shinigamiX · Sep 13, 2006

Peter Principle Picked a Pot of Central Processing Units? Oh, haha, I kill myself sometimes.

SuperFly03 · Sep 13, 2006

Peter Principle Picked a Pot of Central Processing Units? Oh, haha, I kill myself sometimes.

Good then I won't have to deal with you anymore..... OOOOOOOOOOOOO, just kidding man

I need to eat.. anyone for Peter Piper Pizza?

joset · Sep 13, 2006

Any opinions on whether Kentsfield is a 1067MHz or 1333MHz part from my earlier post?

As far as I've gone through that issue, it seems that Kentsfield will be a 1333MHz FSB part, while Clovertown (Xeon 53xx series) will be a 1066MHz one; cannot be certain though, since the sources do not appear to be that reliable.

Intel Bearlake DDR3 supports, next year.
Bearlake, Bearlake G Bearlake P next year, support DDR2 and Bearlake X Bearlake G+ DDR3 supports, DDR3-1333. and DDR3, Bearlake possible supports DDR2 DDR3, 915 supports DDR DDR2.
DDR3, Bearlake X (Bearlake Q) 1333MHz support profits, support PCI-E 2.0, PCI-E x16 cores and ICH9 under (ICH9, ICH9R, ICH9DO, ICH9DH) and.

(Translated from: http://digi.it.sohu.com/20060817/n244848891.shtml)

and,

http://www.bit-tech.net/news/2006/08/18/A_look_at_intels_upcoming_moves/

It also appears that i965 Express (Broadwater) will only support a 1066MHz FSB.

If that's true, and as you stated, we're left with a 1333MHz FSB desktop (DT) part running either on an outdated i975X that is 1333MHz FSB "capable" (perhaps with a future chipset revision?) or on the i965 Express, which is limited to a 1066MHz FSB. On the server side, meanwhile, it'll be Clovertown itself that is limited to a 1066MHz FSB, with the Bearlake chipset being capable of a 1333MHz FSB.
That probably makes sense since, in practice, there'll be no significant performance difference in either case (DT & server). And while there'll certainly be non-Intel 1333MHz FSB chipsets in the DT space, server-wise Bearlake will be prepared for upcoming Intel chips, with a 1:1 FSB ratio.
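To put rough numbers on the 1066MHz vs. 1333MHz question, here is a back-of-the-envelope sketch of peak FSB bandwidth. It assumes the usual 64-bit (8-byte) wide front-side bus and treats the quoted "MHz" figures as millions of transfers per second. The function name and the four-way split are made up for illustration, and peak figures say little about real-world performance.

```python
FSB_WIDTH_BYTES = 8  # assumption: 64-bit wide front-side bus


def fsb_peak_bandwidth_gb_s(transfers_mt_s: float) -> float:
    """Peak bandwidth in GB/s for a bus doing `transfers_mt_s` million transfers/s."""
    return transfers_mt_s * 1e6 * FSB_WIDTH_BYTES / 1e9


for rate in (1066, 1333):
    total = fsb_peak_bandwidth_gb_s(rate)
    # Assumption for illustration: four cores share the bus evenly.
    print(f"{rate} MT/s: {total:.2f} GB/s peak, {total / 4:.2f} GB/s per core with four cores")
```

On these assumptions the two bus speeds come out at roughly 8.5 GB/s and 10.7 GB/s peak, shared by every core behind the bus.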

Cheers!

joset · Sep 13, 2006

(...) It's only now that Intel finally had to take a step back that they took the time to widen everything, which they seemed to do excessively although bottlenecks still do exist like the 16-byte predecode width.

Interestingly, there's some discussion going on at RWT (http://www.realworldtech.com/forums/index.cfm?action=detail&id=72430&threadid=72410&roomid=11), on the supposition that NetBurst already possessed an incipient "memory disambiguation" mechanism.
The author of that statement is Hans de Vries (from www.chip-architect.com), and he quotes an Intel paper on the P4 architecture:

Additionally, we changed the mechanism used to schedule load uops to improve performance. As on prior implementations, store instructions are broken up into two pieces: a store address and a store data uop. In the previous implementations, loads were scheduled asynchronously to store data uops. Thus, if a load needed to receive forwarded data from a store, it was possible that the load would execute before the store data uop. If this occurred, the load would have to be reexecuted after the store data uop had finally executed. Because of this, latency could be introduced because the minimum latency between a store data uop and a dependent load was not the common case latency for loads that had been re-executed. On top of that penalty, having to re-execute the load meant that precious load bandwidth was being wasted on loads that executed more than once. To alleviate both of these issues, we added a simple predictor to the processor that marks whether specific load uops are likely to receive forwarded data, and, if so, from which store they are likely to forward. Given this information, the load scheduler now holds a load that is predicted to forward in the scheduler until the store data uop that produces the data it depends on is scheduled. In doing so, both of these performance penalties are reduced significantly.

ftp://download.intel.com/technology...01/art01_microarchitecture/vol8iss1_art01.pdf (page 7, right column).
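The predictor described in the quoted passage can be sketched as a toy model. This is only an illustration of the idea (remember which store a load forwarded from, and hold the load until that store's data is ready); the table layout, the names and the trace format are all hypothetical and are not taken from the Intel paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (load PC, PC of the store it forwards from or None, store data arrives late?)
TraceEntry = Tuple[int, Optional[int], bool]


@dataclass
class ForwardingPredictor:
    """Hypothetical table: load PC -> PC of the store it last forwarded from."""
    table: Dict[int, int] = field(default_factory=dict)

    def predict(self, load_pc: int) -> Optional[int]:
        return self.table.get(load_pc)

    def train(self, load_pc: int, store_pc: int) -> None:
        self.table[load_pc] = store_pc


def count_replays(trace: List[TraceEntry], use_predictor: bool) -> int:
    """Count loads that ran before their store data uop and had to re-execute."""
    predictor = ForwardingPredictor()
    replays = 0
    for load_pc, store_pc, data_late in trace:
        predicted = predictor.predict(load_pc) if use_predictor else None
        # The scheduler holds the load only if it predicted the right store.
        held = store_pc is not None and predicted == store_pc
        if store_pc is not None and data_late and not held:
            replays += 1
        if use_predictor and store_pc is not None:
            predictor.train(load_pc, store_pc)
    return replays


# The same load forwards from the same store on each of ten loop iterations.
LOOP_TRACE: List[TraceEntry] = [(0x40, 0x30, True)] * 10
print("replays without predictor:", count_replays(LOOP_TRACE, use_predictor=False))
print("replays with predictor:", count_replays(LOOP_TRACE, use_predictor=True))
```

In this toy trace only the first iteration replays once the predictor is on, which is the saving in load bandwidth the paper is describing.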

Haven't had the time to go through the paper but, reading the above, I'm left with mixed feelings. I believe de Vries has a good argument (see link) which, if correct, would show the [Intel] NetBurst uArch to have been a sort of testing platform (well, they all are, in ways...), together with the Pentium M. On the other hand, I cannot infer any kind of memory disambiguation from what's quoted above, at least not as Core has it defined & implemented.
Off topic but interesting, nevertheless. I'm not aware of such a feature in the Pentium M; hence my point on the "combination" of two opposite microarchitectures: NetBurst & P-M.

Cheers!

ltcommander_data · Sep 13, 2006

I'm not fully versed in memory disambiguation, but there have been forms of load before store abilities in pre-Core 2 processors. Hans de Vries mentions the ability in Netburst and the P6 architecture contains this ability as well.

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2748&p=5

The P6 and P-M could already reorder Loads pretty good. They could move one Load before other Loads, as well as before Stores which have no unknown addresses or addresses which do not reference the same address as the load.

Of course, the Netburst and P6 load-before-store reordering appears more limited in scope and only occurs if the address is known. Memory disambiguation proper is supposed to be the prediction of whether a load and a store share the same address, although marketing has stretched the term to cover the entire load-before-store ability.
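The distinction drawn above can be sketched roughly as follows. This is a simplified model, not how any real scheduler is built: addresses are plain integers, `None` stands for a store whose address is not yet known, partial overlaps are ignored, and both function names are invented for the example.

```python
from typing import Optional, Sequence


def can_hoist_conservative(load_addr: int,
                           older_store_addrs: Sequence[Optional[int]]) -> bool:
    """Known-address style: move the load up only if every older store
    address is known and differs from the load address."""
    return all(addr is not None and addr != load_addr
               for addr in older_store_addrs)


def can_hoist_speculative(load_addr: int,
                          older_store_addrs: Sequence[Optional[int]],
                          predicted_no_alias: bool) -> bool:
    """Disambiguation style: for stores with unknown addresses, rely on a
    prediction that the load does not alias them."""
    for addr in older_store_addrs:
        if addr is None:
            if not predicted_no_alias:
                return False
        elif addr == load_addr:
            return False
    return True


stores = [0x200, None]  # one known store address, one still unknown
print(can_hoist_conservative(0x100, stores))
print(can_hoist_speculative(0x100, stores, predicted_no_alias=True))
```

A real implementation also has to detect and recover from a wrong prediction, which this sketch leaves out.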

In regards to the FSB issue, it seems counter-productive to give Kentsfield a 1333MHz FSB when it only has unofficial motherboard support, while Clovertown is limited to a 1067MHz FSB when its motherboards are all designed for 1333MHz operation.

http://www.anandtech.com/mac/showdoc.aspx?i=2832&p=6

Anandtech has recently plopped 2 Clovertowns into a Mac Pro. They use a 1067MHz FSB and are clocked at 2.4GHz. They appear to work although they aren't completely stable, but that may be down less to the chips than to the lack of proper BIOS and OS support. It is interesting, though, that Kentsfield samples are up to 2.67GHz while I've only seen 2.4GHz Clovertown samples. Anandtech couldn't show any Clovertown tests, though, since they are still under NDA. THG was obviously given special permission by Intel, which was probably an attempt on Intel's part to get some good publicity after the poor reaction to those job cuts.

clue69less · Sep 13, 2006

THG was obviously given special permission by Intel, which was probably an attempt on Intel's part to get some good publicity after the poor reaction to those job cuts.

If that's the impetus, then I think the web is an excessively tangled weave. The 90s convinced all of the bigs to make renewal a way of life. Whether or not the studies leading to those conclusions were valid is beside the point - for the foreseeable future, it is a way of business life. What's the chance that a review on Tom's could counter bad feelings about job cuts?

ltcommander_data · Sep 13, 2006

What's the chance that a review on Tom's could counter bad feelings about job cuts?

I wasn't really saying it about the employees themselves, but more the stock market reaction. Investors reacted negatively to the job cuts which were supposed to save costs, so Intel is probably trying to release some Kentsfield data to boost optimism again. Sadly, good Intel performance in the future probably wouldn't do much to make people who were just let go feel any better.

clue69less · Sep 13, 2006

What's the chance that a review on Tom's could counter bad feelings about job cuts?

I wasn't really saying it about the employees themselves, but more the stock market reaction. Investors reacted negatively to the job cuts which were supposed to save costs, so Intel is probably trying to release some Kentsfield data to boost optimism again. Sadly, good Intel performance in the future probably wouldn't do much to make people who were just let go feel any better.

I didn't mean to imply bad feelings from employees. I've had good friends that got the axe, have seen families suffer and am used to seeing investors respond positively. It's proof of a lean and mean attitude, still hungry for success, sis-boom-bah! I never really bought into the whole belief system behind workforce reduction by formula. Fear can have short term effects that can be interpreted in a positive way, but it's just too short-sighted.

Then again, there are a bunch of very wealthy people that got there by stepping on the little guy.

gOJDO · Sep 13, 2006

SSE was roughly equivalent to 3dNow! in performance.

3D Now! is not equivalent to SSE. It is much closer to MMX than it is to SSE.
Sorry, but you're wrong.
MMX is a SIMD instruction set for processing vector integer code, 3dNow! instead was aimed at FP vector code.
SSE was Intel's response to 3dNow!; back then AMD wasn't really able to push its technology to become an industry standard, as it has recently done with AMD64.
Back in those days, i wrote a technical article (for an Italian tech website) on a comparison of 3dNow!, SSE and Altivec (the streaming SIMD set of PowerPC), which was later referenced by Jon - Hannibal - Stokes of Ars Technica.
3DNow! (like Enhanced 3DNow!, 3DNow! Professional, SSE, SSE2, SSE3 and SSSE3) is an extension to MMX. Its purpose, like that of MMX, is to improve the performance of 3D games and multimedia. Because of the K6's weak FPU, 3DNow! was a cover-up in the competition with the P2. It was later expanded with Enhanced 3DNow! and 3DNow! Professional on the K6-III & K7, but it never achieved its goal because it lacked software support. SSE came as Intel's response to 3DNow! and was much more advanced and faster. 3DNow! is not even close to an equivalent of SSE.

3DNow! provides 21 vector instructions (integer & FP) that operate on 64-bit registers, divided into two 32-bit single-precision FP words, supporting only the round-to-nearest rounding mode.
The K6 implements eight 64-bit 3DNow! registers, mapped onto the FP registers just like the MMX registers. Aliasing the 3DNow! registers onto the floating-point stack makes it possible to write x86 programs containing integer, MMX and SIMD FP instructions with no performance penalty (versus 100-150 cycles on the P-MMX) for switching between the integer MMX and the floating-point 3DNow! units.

SSE provides 8 cache control instructions and 70 vector instructions (integer & FP) that operate on 128-bit registers, divided into four 32-bit single-precision FP words, supporting all 4 rounding modes of the IEEE standard.
The P3 (Katmai) implements eight 128-bit SSE registers, which are new, separate registers rather than being mapped onto the FP registers like the MMX registers.
The eight 64-bit MMX registers are aliased on top of the eight FPU registers, enabling SIMD integer operations in parallel with SSE (no penalty for switching).

So, 3DNow! and SSE are not equal, nor are their instructions compatible.
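To make the register widths above concrete, here is a small sketch that packs single-precision floats into 64-bit and 128-bit byte strings and adds them lane by lane. It only models the data layout (two lanes vs. four lanes). It does not execute any 3DNow! or SSE instruction, and the helper names are made up for the example.

```python
import struct
from typing import Sequence


def pack_singles(values: Sequence[float]) -> bytes:
    """Pack floats as consecutive little-endian 32-bit single-precision lanes."""
    return struct.pack(f"<{len(values)}f", *values)


def simd_add(a: bytes, b: bytes) -> bytes:
    """Lane-wise add of two equally sized packed registers."""
    if len(a) != len(b) or len(a) % 4 != 0:
        raise ValueError("registers must have the same size, a multiple of 4 bytes")
    lanes = len(a) // 4
    xs = struct.unpack(f"<{lanes}f", a)
    ys = struct.unpack(f"<{lanes}f", b)
    return struct.pack(f"<{lanes}f", *(x + y for x, y in zip(xs, ys)))


mm_style = pack_singles([1.0, 2.0])             # 64-bit register, two lanes
xmm_style = pack_singles([1.0, 2.0, 3.0, 4.0])  # 128-bit register, four lanes

print(len(mm_style) * 8, "bits:", struct.unpack("<2f", simd_add(mm_style, mm_style)))
print(len(xmm_style) * 8, "bits:", struct.unpack("<4f", simd_add(xmm_style, xmm_style)))
```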

Pippero · Sep 13, 2006

3DNow! (like Enhanced 3DNow!, 3DNow! Professional, SSE, SSE2, SSE3 and SSSE3) is an extension to MMX. Its purpose, like that of MMX, is to improve the performance of 3D games and multimedia. Because of the K6's weak FPU, 3DNow! was a cover-up in the competition with the P2. It was later expanded with Enhanced 3DNow! and 3DNow! Professional on the K6-III & K7, but it never achieved its goal because it lacked software support. SSE came as Intel's response to 3DNow! and was much more advanced and faster. 3DNow! is not even close to an equivalent of SSE.

3DNow! provides 21 vector instructions (integer & FP) that operate on 64-bit registers, divided into two 32-bit single-precision FP words, supporting only the round-to-nearest rounding mode.
The K6 implements eight 64-bit 3DNow! registers, mapped onto the FP registers just like the MMX registers. Aliasing the 3DNow! registers onto the floating-point stack makes it possible to write x86 programs containing integer, MMX and SIMD FP instructions with no performance penalty (versus 100-150 cycles on the P-MMX) for switching between the integer MMX and the floating-point 3DNow! units.

SSE provides 8 cache control instructions and 70 vector instructions (integer & FP) that operate on 128-bit registers, divided into four 32-bit single-precision FP words, supporting all 4 rounding modes of the IEEE standard.
The P3 (Katmai) implements eight 128-bit SSE registers, which are new, separate registers rather than being mapped onto the FP registers like the MMX registers.
The eight 64-bit MMX registers are aliased on top of the eight FPU registers, enabling SIMD integer operations in parallel with SSE (no penalty for switching).

So, 3DNow! and SSE are not equal, nor are their instructions compatible.

Please, take a look at this old article from Ars Technica which explains the situation very well.
Please also look at the bibliography section at the end of the article, where it says "Walter NisticÚ": that's me (actually the name is Nisticò). Unfortunately my article is not online anymore (we're talking about 6 years ago, and the website has since been shut down).
Also note that i never said that 3dNow! and SSE were compatible, but that yes they were roughly equivalent.
3dNow! was as i said a FP extension to MMX, SSE had more instructions, but this because most of them were just copies of the original MMX instructions made to work on the new registers.
Yes, SSE is 128 bit and 3dNow! is 64, but 3dNow! could execute 2x 64bit instructions per clock while SSE could execute 1x 128bit, and while you could pipeline two identical 3dNow! instructions one clock after the other, you couldn't do the same with SSE (until now, with Core 2) because internally SSE was also 64bit.
There were several other factors on the table, but in the end the statements i made were true, and i'll repeat:
saying that

3D Now! is not equivalent to SSE. It is much closer to MMX than it is to SSE.

is simply incorrect.
How can 3dNow! be "closer" to MMX?
3dNow! is complementary to MMX, not analogous as you say!
We should rather say that 3dNow!+MMX is roughly equivalent to SSE (and so it is in performance, with some small advantages for one or the other depending on the application).
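The per-clock argument above boils down to simple arithmetic, sketched here. The issue rates (two 64-bit 3dNow! instructions per clock, one 128-bit SSE instruction per clock) are the figures stated in this post, not measured values, and the function name is invented for the example.

```python
def single_lanes_per_clock(instr_per_clock: int, register_bits: int,
                           lane_bits: int = 32) -> int:
    """Peak number of single-precision lanes processed per clock."""
    return instr_per_clock * register_bits // lane_bits


# Figures as claimed in the post above (assumptions, not measurements).
three_d_now = single_lanes_per_clock(instr_per_clock=2, register_bits=64)
sse = single_lanes_per_clock(instr_per_clock=1, register_bits=128)

print("3dNow! lanes per clock:", three_d_now)
print("SSE lanes per clock:", sse)
```

On those figures both come out at the same peak width per clock, which is the sense of "roughly equivalent" used here.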

gOJDO · Sep 13, 2006

3D Now! is not only an FP extension to MMX, but also an extension to the integer instructions (the same goes for SSE).
When 3D games and multimedia became mainstream, the need for FP rose. That's why Intel invented MMX, AMD 3DNow! and so on.
There were a lot of games and apps (video & audio) that would not run on CPUs without SSE. The "3D Now! equivalent" didn't run software requiring SSE. On the other side, there is no software that uses 3D Now! and can't run on SSE. That's why I don't consider 3D Now! an equivalent of SSE.

BTW, we are a bit OT. It was a long time ago that 3D Now!/SSE were current

Anyway, I'll check your articles on Ars Technica. Seems you know your stuff

Cheers

SuperFly03 · Sep 13, 2006

As I sit here staring dumbly at the screen not understanding a word you're saying... I guess I need to add that to my to do list. Damn that list is getting long, have a lot of IT related crap to read. Why can't we go MIB style and have 36 hour days?

Good respectable discussion though. My question is wth is with SSE/SSE2/SSE3/SSE4? Are they all the same or are they improvements on each previous generation with just more instruction sets? Any good articles on FP vs. Integer computations? I assume if we stick to mathematical terms we are talking the difference between 2.332342523 and 2, but I've been wrong before :?

clue69less · Sep 13, 2006

As I sit here staring dumbly at the screen not understanding a word you're saying... I guess I need to add that to my to do list. Damn that list is getting long, have a lot of IT related crap to read. Why can't we go MIB style and have 36 hour days?

Hey, then AMD might have a chance of getting K8L out soon?

OggietheApe · Sep 13, 2006

Ok, so you're sayin that there is not that much of a difference between sticking two conroe duo chips on a processor or having all four of them working as one in terms of speed?

Also, I'm not good at overclocking. Makes me too nervous. Know any company that would overclock a processor for me?

SuperFly03 · Sep 13, 2006

As I sit here staring dumbly at the screen not understanding a word you're saying... I guess I need to add that to my to do list. Damn that list is getting long, have a lot of IT related crap to read. Why can't we go MIB style and have 36 hour days?

Hey, then AMD might have a chance of getting K8L out soon?

Good point, that also means that we might see 8 cores in the spring 8O

-Oggie

I don't know of any company that will OC your chip for you but there are a great many guides to OC'ing chips. As long as you exercise reasonable care you most likely won't break anything. I have broken 2 mobo's in the past 9 months OC'ing but that crap happens, that is why there are warranties.

As long as you don't just throw too much voltage at a CPU, it will lock up before you can fry it with heat. Mobos are generally stable provided you have moderate air circulation. RAM acts a lot like a CPU in terms of OC'ing: it will lock up and crash before it fries.

I just started OC'ing this past December and I am getting kinda good at it (not that good, but at least I am knowledgeable now). Just read up and watch these forums; people here are generally very helpful if you pose a decent question. If you post a problem like "Help it won't boot" of course you won't get much help, but if you describe what happened there are plenty of people to help.

Pippero · Sep 13, 2006

Well, yeah, as a matter of fact, 3dNow! was not commercially successful, hence all the software ended up using SSE (and, at best 3dNow!).
Enhanced 3dNow added a few functionalities which SSE had and were missing in the original AMD set (if i'm not mistaken, instructions used to accelerate video encoding/decoding and additional prefetching modes); but then AMD introduced 3dNow! Pro (which was just a reverse engineered SSE) and it was the end of the story.
What i originally meant is that 3dNow! and SSE were functionally equivalent (and back then, also kind of performance equivalent), but overall i agree that SSE was a better designed (and more future proof) instruction set (for example, if i remember correctly, the Pentium 3 was only doing round-to-nearest, and the other modes were just "hints" which could be adopted in future CPUs).
The reason for this is that AMD didn't have a market position strong enough to make important changes: for example, Intel introduced a new processor mode to enable the additional registers of SSE, and this required kernel support (OS), while 3dNow! would work out of the box, being designed as a kind of FP extension to MMX; SSE instead made "de facto" MMX redundant.
What i also meant is that AMD went a long way since then, being finally able to push AMD64 as an industry standard and having Intel reverse engineering it.
But you're right, we are absolutely off topic now

so apologies to everybody for polluting the thread.
Cheers

Pippero · Sep 13, 2006

Good respectable discussion though. My question is wth is with SSE/SSE2/SSE3/SSE4? Are they all the same or are they improvements on each previous generation with just more instruction sets?

They are extensions with new instructions each.
The most significant one, IMO, is SSE2, which was designed as a true replacement for the incredibly crappy x87 instruction set, and did a lot to help PC CPUs narrow the gap with RISC processors in FP code.
SSE2 introduced 64bit operation precision, which is absolutely required for most "serious" applications (3dNow! / SSE1 were just aimed at gaming/multimedia), and over x87 it offered a finally flat register file (x87 code used an archaic and performance crippling stack mode for accessing its meager 8 registers) and an overall increased register space (8x 128bit registers instead of 8x 64bit ones), which is however still not that great, compared with most RISCs which have at least 32x 64bit registers (AMD64 however has expanded this to 16x 128bit, when operating in 64bit mode).
EDIT: and unfortunately, SSEx compared to RISCs is also missing 3-operand operations.
Hence, Intel intentionally crippled the x87 FPU of the Pentium IV (which was slower than the P!!! one, and waaaaay slower compared to the Athlon) to force SW developers to use SSE2 instead, and i think this was a wise decision, even though they paid dearly for it in terms of image in the beginning.
Concerning SSE 3 & 4, i don't know the details, but i think they are very specific instructions with a limited scope of application.
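On the FP vs. integer question and the point about 64bit precision, a quick sketch may help. It rounds a double through 32-bit single precision using the standard `struct` module, so it shows the storage formats only and runs no SSE code; `to_single` is a made-up helper name.

```python
import struct


def to_single(x: float) -> float:
    """Round a 64-bit double to 32-bit single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


x = 2.332342523
print("integer part:    ", int(x))        # integer code would only see this
print("single precision:", to_single(x))  # about 7 significant decimal digits
print("double precision:", x)             # about 16 significant decimal digits

# Single precision cannot even hold every integer above 2**24 exactly.
print(to_single(16777217.0))
```

That loss of digits is why single precision (3dNow! / SSE1) is fine for games and multimedia, while the 64bit operations added by SSE2 matter for the "serious" applications mentioned above.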

clue69less · Sep 13, 2006

I don't know of any company that will OC your chip for you (snip)

Hey! Great new e-biz opportunity. We send experienced overclockers to YOUR HOME and overclock your PC. The bad part is that most of the people that might hire us are guys.

Whizzard9992 · Sep 13, 2006

SSE3 and SSSE3 (formerly SSE4) have a small write-up on wikipedia. They appear to be common arithmetic opcodes that perform functions that used to take multiple opcodes, such as

pmaddubsw : Packed Multiply-ADD with conversion from Unsigned Byte to Signed Word
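As a rough illustration of what that one opcode replaces, here is a scalar reference model of pmaddubsw as I understand the instruction: unsigned bytes from one operand are multiplied with signed bytes from the other, adjacent products are added, and each sum is saturated to a signed 16-bit word. It is a sketch for reading along, not a drop-in replacement for the instruction, and the list-based interface is just for the example.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def pmaddubsw(unsigned_bytes: Sequence[int], signed_bytes: Sequence[int]) -> List[int]:
    """Multiply unsigned by signed bytes, add adjacent pairs, saturate to int16."""
    if len(unsigned_bytes) != len(signed_bytes) or len(unsigned_bytes) % 2 != 0:
        raise ValueError("operands must have the same, even number of bytes")
    if any(not 0 <= u <= 255 for u in unsigned_bytes):
        raise ValueError("first operand must hold unsigned bytes (0..255)")
    if any(not -128 <= s <= 127 for s in signed_bytes):
        raise ValueError("second operand must hold signed bytes (-128..127)")

    words = []
    for i in range(0, len(unsigned_bytes), 2):
        total = (unsigned_bytes[i] * signed_bytes[i]
                 + unsigned_bytes[i + 1] * signed_bytes[i + 1])
        words.append(max(INT16_MIN, min(INT16_MAX, total)))  # signed saturation
    return words


print(pmaddubsw([1, 2, 3, 4], [10, -10, 5, 5]))
print(pmaddubsw([255, 255], [127, 127]))  # sum overflows and saturates
```

Without the instruction, each output word takes two multiplies, an add and a clamp, which is the "multiple opcodes" being folded into one.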

Whizzard9992 · Sep 13, 2006

Bad joke, but:

I hear Intel's introducing a new instruction for the EE set of processors, which includes:

DMPFMWIAW: Download my porn for me while I'm at work

CSWMBWB: Close solitaire when my boss walks by

Last but not least....

CTHSGECTNI: Correct Tom's hardware spelling and Grammar errors 'cuz they need it.

Things that used to take forever can now be completed in a single instruction cycle. Eat that AMD!
