Intel's Knights Corner: 50+ Core 22nm Co-processor

If you know CUDA or OpenCL, then you already know that you have a co-processor in your system, processing more than 1 TFLOPS.
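For anyone who wants to see that co-processor for themselves, here is a minimal sketch (assuming an OpenCL SDK and headers are installed; the buffer sizes are arbitrary) that lists the GPUs OpenCL already exposes as compute devices:

    /* List the GPU compute devices OpenCL can see in this system. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);

        for (cl_uint p = 0; p < num_platforms; ++p) {
            cl_device_id devices[8];
            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devices, &num_devices);

            for (cl_uint d = 0; d < num_devices; ++d) {
                char name[256];
                cl_uint units = 0;
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
                clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                                sizeof(units), &units, NULL);
                printf("GPU co-processor: %s (%u compute units)\n", name, units);
            }
        }
        return 0;
    }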
 
Ok, so start out with a top-shelf Xeon processor, or two of them. Add a ton of RAM. Throw one of these in a PCIe slot, and fill the rest with 6990s... = conquer any workload, Bitcoin mining, and win Folding@home? :)
 
[citation][nom]nottheking[/nom]If it's 8 cores, then each core has FPU power that badly embarasses Sandy Bridge... A single core's FPU capabilities of Sandy Bridge only allow for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.[/citation]

The FPU performance per core (and your statement may or may not be true as pointed out by g00ey) is not directly relevant to whether or not a module can be considered as two cores.

What it comes down to is deciding the best way to define the word "core" in the context of CPUs. The best way to do that is to start from the most general definition of a core and then apply that understanding to CPUs.

In its most fundamental sense, "core" means "the central, innermost, or most essential part of anything". In the Bulldozer architecture, the most central unit is the module. You cannot disable half of a module and still have a functioning core, because the two halves share too many resources.

I basically see Bulldozer's modules as similar to the idea of Hyper-Threading. They duplicate a portion of each core to speed up certain operations. It is done differently, so it has different performance implications than Hyper-Threading, but the modules do not fully duplicate each core. Thus, it makes more sense to define each module as one core with two threads.

AMD chose to define a core as anything with its own integer pipeline. They did so to be able to advertise Bulldozer as an 8 core CPU. This is somewhat of an exaggeration, because it is most accurately described as a 4 core CPU with 8 integer execution pipelines.
 
[citation][nom]otacon72[/nom]Let's see AMD compare their 16 core chip to that one. I mean they compared their 16 core CPU to the Xeon with 6 cores. Seems fair.[/citation]
Actually, they compared a 16-core CPU that costs half as much to a 6-core Intel. Does the price count? YES. Do you get better performance for less money? Yes. Should you care whether it has 16 cores and not 6? NO!
 
I resent any processor that is not made to benchmark Crysis.

I am telling you, some stress testers or a benchmark company will plug this little monster into a PCI Express slot just to check the physics boost in a 3D application.
There will be companies that just keep their Xeons and plug this in to boost performance, even though this is a co-processor = just math. Yeah, it's a lab "rat", but it does show the future of the PC if you think about it. A 24-core CPU at 1-2 GHz in a few years will be possible, I guess.
 


NV's top GPU does 515 peak GFLOPs: http://www.nvidia.com/object/personal-supercomputing.html

IIRC, sustained throughput is about 2/3 of peak. Since Knights Corner does over 1 TFLOPS sustained (according to the slide, anyway), that would be about 3X NV's best GPU on double precision.
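For what it's worth, here is the back-of-the-envelope math behind that 3X figure, using only the numbers quoted in this thread (515 GFLOPS peak, ~2/3 sustained, 1 TFLOPS sustained for Knights Corner); these are the thread's figures, not official benchmarks:

    /* Rough check of the ~3X claim using the numbers quoted above. */
    #include <stdio.h>

    int main(void) {
        double nv_peak       = 515.0;               /* NVIDIA peak DP GFLOPS (quoted above)     */
        double nv_sustained  = nv_peak * 2.0 / 3.0; /* ~343 GFLOPS if sustained is 2/3 of peak  */
        double knc_sustained = 1000.0;              /* 1 TFLOPS sustained, per Intel's slide    */

        printf("NV sustained estimate: %.0f GFLOPS\n", nv_sustained);
        printf("Knights Corner advantage: %.1fx\n", knc_sustained / nv_sustained);
        return 0;
    }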

 
"Such power of Intel's MIC (many integrated core) architecture won't be used to play Crysis..."

SAYS WHO?

I plan on running 10 INSTANCES of Crysis at the SAME TIME on that CPU!
 


Granted, a 256-bit-wide FPU would be better than using a 128-bit FP unit plus a 128-bit integer unit, especially if doing a lot of FP, since that leaves fewer resources available for integer work. But my guess is that Intel did a lot of code analysis (they have far more software-dev support than AMD does, and it's a two-way street: the devs get their code tuned for Intel CPUs, and Intel gains a lot of insight into current code, which helps with their best-in-class OoO and branch prediction). So maybe it's not as severe a handicap as one might initially think.
 
[citation][nom]mayankleoboy1[/nom]this chip does 1tflop of double precision floating point arithmetic.[/citation]

This chip does nothing. It's a paper launch and will not be on the market in any product for at least a year or more. In the meantime, both nVidia and AMD will release graphics cards that can do 2+ TFLOPS of double-precision DirectCompute, and you know what? They'll still work as graphics cards, unlike this piece of junk.
 
[citation][nom]nottheking[/nom]Yep, this looks like the current incarnation of Larrabee. Bets are open on whether this will actually make it to market; Intel's trying to push for outright dominance in a field they've been outside of the whole time, while nVidia and AMD have been working with years and years of experience under their respective belts.Since this is an external co-processor run through PCI-express, this makes it not different at all from nVidia or AMD's GPGPU solutions... And that means Intel's badly beat with only 50 cores. Even assuming that the 1 TFLOP number is actually double-precision, that makes this still-not-officially-benchmarked-or-released chip only perhaps in the same LEAGUE as existing, in-market stuff out now. How does Intel expect this to compete when AMD *already* has a 676 GigaFLOP (over 2/3 the power) card available, RIGHT NOW, for under $400US? By the time Knight's Corner could release, AMD will have their Tahiti 7970 out, which likely will rock in at 2+ TeraFLOPs for a single GPU, at the same sub-$400US price point.
If it's 8 cores, then each core has FPU power that badly embarasses Sandy Bridge... A single core's FPU capabilities of Sandy Bridge only allow for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.[/citation]
It's amazing how little people know about the HPC market. This is driven by the HPC community. The most important feature is code compatibility with the existing x86 platform, which means the HPC community does not need a separate development environment in order to build something. I was talking to some of the large HPC customers, and they were very excited about this card.
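To make the "no separate development environment" point concrete, here is a minimal sketch, assuming nothing beyond a standard C compiler with OpenMP: ordinary x86 code like this runs on a Xeon today, and the idea is that the same source simply gets recompiled for an x86-compatible co-processor instead of being rewritten in a GPU language (the array size and the daxpy-style loop are just illustrative):

    /* Plain x86 C with standard OpenMP -- no proprietary GPU language needed. */
    #include <stdio.h>
    #include <omp.h>

    #define N 10000000

    static double x[N], y[N];

    int main(void) {
        for (long i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }

        #pragma omp parallel for          /* spread the loop across all available cores */
        for (long i = 0; i < N; ++i)
            y[i] = 2.5 * x[i] + y[i];     /* daxpy: y = a*x + y */

        printf("y[0] = %.1f using up to %d threads\n", y[0], omp_get_max_threads());
        return 0;
    }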
 
[citation][nom]darthvidor[/nom]I wonder when will "Co" get separated from the Processor and get into motherboard sockets other than pci-express cards.[/citation]

Back in the day they were separate; now they are combined. But if they become as big as the CPU or bigger, then it's time for socket LGA 3011 or something else.
 
[citation][nom]iceman1992[/nom]If it's a co-processor, what's the other processor? An i7?[/citation]
A Xeon. This is server stuff; they don't use Core i7s, they use Xeons.
 
[citation][nom]oparadoxical_[/nom]Makes me wonder just what we will have in ten years from now... Especially for personal computers.[/citation]
...or 12 or 8 years, or what a cell phone will be like in 9.3 years, or the weather in 3 weeks, or my nagging knee pain in 16 months...
 
[citation][nom]otacon72[/nom]Let's see AMD compare their 16 core chip to that one. I mean they compared their 16 core CPU to the Xeon with 6 cores. Seems fair.[/citation]
This is a coprocessor, not a CPU. Please don't mix both.
 
[citation][nom]nottheking[/nom]... it just kind of emerged as an alternative to "extra CPU" around 2004 when the Pentium D and Athlon64 X2 came out... And there was bickering there, too. Yet no one deemed that a chip failed to be "dual-core," if it merged or removed redundant parts, such as the memory interface or I/O, or shared a pool of cache. Looking at that philosophy, Bulldozer's modules are the next logical step in the evolution of sets of 2 cores.[/citation]
I seriously don't understand why you bring up the Pentium D and the old X2. I cannot recall any "bickering" about their core count, and while the Pentium D had 4 ALUs (2 per core), everyone still agreed that it was a dual-core CPU.
 
+1. I resent any processor that cannot benchmark Crysis! Heck, maybe Crytek will be bribed to recode Crysis for such a setup, and Windows 8 in the near future for hundreds of cores...
 
[citation][nom]Stardude82[/nom]What about GPU processing? Isn't that what CUDA is for? After all, my 460 GTX has 336 cores.[/citation]
It's basically an alternative to using CUDA or OpenCL with a dedicated GPU. It doesn't need a proprietary language like CUDA, though.
 
[citation][nom]kronos_cornelius[/nom]I got my money on Cuda clusters. All the applications they mention as being highly parallel benefit from CUDA, so why get 50 cores when you can get 512[/citation]
Because the individual cores are in no way directly comparable, in terms of either architecture or performance.
 
[citation][nom]fazers_on_stun[/nom]NV's top GPU does 515 peak GFLOPs: http://www.nvidia.com/object/perso [...] uting.htmlIIRC, sustained throughput is about 2/3rds of peak. Since the Knight's Corner does over 1 TFLOP sustained (according to the slide anyway), that would be 3X NV's best GPU on double precision..[/citation]

NVIDIA has a card based on 512 CUDA cores (the M2090), which does 665 GFLOPS in DP. It's listed under the "Tesla Data Center" category. All of these numbers are posted by NVIDIA; we don't know if they are theoretical peak or sustained. If Intel's number is true, 1 TFLOPS sustained in DP is surely very impressive. People have to realize that, unlike the cores found in GPUs today, these Intel cores are compatible with the x86 ISA, so the chip programs just like a normal x86 CPU and can operate on its own. With a GPU, you need a CPU to accompany it. But again, without further information, we don't know the price and power consumption of this chip compared to GPU solutions.
 


Exactly, I thought people would know this by now. Just like you don't compare performance solely on the number of AMD's stream processors versus NVIDIA's CUDA cores. In terms of individual size and complexity, a CUDA core is bigger and more complex than an AMD SP, and these Intel cores are surely bigger and more complex than CUDA cores.
 
[citation][nom]otacon72[/nom]Let's see AMD compare their 16 core chip to that one. I mean they compared their 16 core CPU to the Xeon with 6 cores. Seems fair.[/citation]
It is specifically stated in the article that Knights Corner is not a CPU, rather, it is a mathematical co-processor. Once these hit the market, we will see, for sure, its place in computer architecture. A more correct AMD comparison, in my opinion, would be against workstation GPUs or consumer GPUs used for high-performance mathematical computing.

The architecture of this thing is highly parallel. Even if its primary use were as a CPU, you would still have all the problems associated with programs that are not highly threaded. It only derives its intense computational power from the fact that it can do many operations in parallel - just like GPUs.
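A quick, hedged illustration of that "not highly threaded" problem: Amdahl's law says a 50-core part only helps as much as the parallel fraction of the program allows (the fractions below are made-up examples, not measurements of any real workload):

    /* Amdahl's law: speedup = 1 / ((1 - p) + p / cores). */
    #include <stdio.h>

    static double amdahl_speedup(double parallel_fraction, int cores) {
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
    }

    int main(void) {
        const int cores = 50;                       /* "50+ cores" from the article */
        const double fractions[] = { 0.50, 0.90, 0.99 };

        for (int i = 0; i < 3; ++i)
            printf("%2.0f%% parallel code -> %.1fx speedup on %d cores\n",
                   fractions[i] * 100.0, amdahl_speedup(fractions[i], cores), cores);
        return 0;
    }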

I suggest maintaining an awareness of the almost verbatim marketing blather repeated in this, IMHO, somewhat dumbed down version of the release. In case anyone is interested, this is a somewhat more informative article which clearly states that the chip is not intended as a CPU.
 

Exactly. And each has its own niche where it outperforms the other. As a BOINC participant running MilkyWay@Home on an Nvidia GPU, I have noticed that for MW@H, in general, AMD GPUs are about 8 times faster than Nvidia. In other BOINC projects the performance is reversed between the two manufacturers' chips. It is not a matter of newer-generation cards from a specific manufacturer, either. The comparison I made is between an 8800 GT and AMD in general; however, I also have a GTX 460, which is only about 25 percent faster for MW@H than the 8800 GT.

 