Intel's Knights Corner: 50+ Core 22nm Co-processor

Guest
If you know CUDA or OpenCL, then you already know that you have a co-processor in your system processing more than 1 TFLOPS.
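For anyone who wants to check what's already in their box, here is a minimal OpenCL host-side sketch (assuming the OpenCL headers and a vendor runtime are installed; the 8-entry arrays are just an arbitrary cap for the example) that lists the compute devices present and their compute-unit counts and clocks:

#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms && p < 8; ++p) {
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices && d < 8; ++d) {
            char name[256] = "";
            cl_uint units = 0, mhz = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(units), &units, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(mhz), &mhz, NULL);
            /* Compute units and clock alone don't give FLOPS (that also depends on
               lanes per unit), but this shows which co-processors are in the system. */
            printf("%s: %u compute units @ %u MHz\n", name, units, mhz);
        }
    }
    return 0;
}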
 
OK, so start out with a top-shelf Xeon processor, or two of them. Add a ton of RAM. Throw one of these in a PCIe slot and fill the rest with 6990s... = conquer any workload, Bitcoin mining, and win Folding@home? :)
 

soccerdocks

Distinguished
May 24, 2011
175
0
18,710
[citation][nom]nottheking[/nom]If it's 8 cores, then each core has FPU power that badly embarasses Sandy Bridge... A single core's FPU capabilities of Sandy Bridge only allow for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.[/citation]

The FPU performance per core (and your statement may or may not be true as pointed out by g00ey) is not directly relevant to whether or not a module can be considered as two cores.

What it comes down to is deciding the best way to define the word "core" in the context of CPUs. The best approach is to start from the most general definition of a core and then apply that understanding to CPUs.

In its most fundamental sense, "core" means "the central, innermost, or most essential part of anything". In the Bulldozer architecture, the most central item is the module. You cannot disable half of a module and still have a functioning core, because the two halves share too many resources.

I basically see Bulldozer's modules as similar to the idea of Hyper-Threading. They duplicate a portion of each core to speed up certain operations. It is done differently, so it has different performance implications than Hyper-Threading, but the modules do not fully duplicate each core. Thus, it makes more sense to define each module as one core with two threads.

AMD chose to define a core as anything with its own integer pipeline. They did so to be able to advertise Bulldozer as an 8-core CPU. This is somewhat of an exaggeration, because it is most accurately described as a 4-core CPU with 8 integer execution pipelines.
 

ohim

Distinguished
Feb 10, 2009
1,195
0
19,360
[citation][nom]otacon72[/nom]Let's see AMD compare their 16 core chip to that one. I mean they compared their 16 core CPU to the Xeon with 6 cores. Seems fair.[/citation]
Actually, they compared a 16-core CPU that costs half as much to a 6-core Intel. Does the price count? YES. Do you get better performance for less money? Yes. Should you care that it has 16 cores rather than 6? NO!
 

beetlejuicegr

Distinguished
Jan 10, 2011
350
1
18,815
I resent any processor that is not made to benchmark Crysis.

I am telling you, some stress testers or a benchmark company will plug this little monster into a PCI Express slot just to check the physics boost in a 3D application.
There will also be companies that keep their Xeons and plug this one in to boost performance, although as a co-processor it's just math, right? Yes, it's a lab "rat", but it does show the future of PCs if you think about it. A 24-core CPU at 1-2 GHz will be possible in a few years, I guess.
 


NVIDIA's top GPU does 515 peak GFLOPS: http://www.nvidia.com/object/personal-supercomputing.html

IIRC, sustained throughput is about two-thirds of peak. Since Knights Corner does over 1 TFLOPS sustained (according to the slide, anyway), that would be 3X NVIDIA's best GPU on double precision.
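The rough arithmetic behind that, assuming both the two-thirds rule of thumb and Intel's slide hold up: 515 GFLOPS peak × 2/3 ≈ 343 GFLOPS sustained, and 1000 / 343 ≈ 2.9, which is where the roughly 3X figure comes from.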

 

Netherscourge

Distinguished
May 26, 2009
390
0
18,780
"Such power of Intel's MIC (many integrated core) architecture won't be used to play Crysis..."

SAYS WHO?

I plan on running 10 INSTANCES of Crysis at the SAME TIME on that CPU!
 


Granted, a 256-bit-wide FPU would be better than using a 128-bit FP unit plus 128-bit integer, especially when doing a lot of FP, since that leaves fewer resources available for integer. But my guess is that Intel did a lot of code analysis (they have far more software dev support than AMD does, and it's a two-way street: the devs get their code tuned for Intel CPUs, and Intel gains lots of insight into current code, which helps with their best-in-class OoO and branch prediction). So maybe it's not as severe a handicap as one might initially think.
 

saturnus

Distinguished
Aug 30, 2010
212
0
18,680
[citation][nom]mayankleoboy1[/nom]this chip does 1tflop of double precision floating point arithmetic.[/citation]

This chip does nothing. It's a paper launch and will not be on the market in any product for at least a year or more. In the meantime, both NVIDIA and AMD will release graphics cards that can do 2+ TFLOPS of double-precision DirectCompute, and you know what? They'll still work as graphics cards, unlike this piece of junk.
 

Area51

Distinguished
Jul 16, 2008
95
0
18,630
[citation][nom]nottheking[/nom]Yep, this looks like the current incarnation of Larrabee. Bets are open on whether this will actually make it to market; Intel's trying to push for outright dominance in a field they've been outside of the whole time, while nVidia and AMD have been working with years and years of experience under their respective belts.Since this is an external co-processor run through PCI-express, this makes it not different at all from nVidia or AMD's GPGPU solutions... And that means Intel's badly beat with only 50 cores. Even assuming that the 1 TFLOP number is actually double-precision, that makes this still-not-officially-benchmarked-or-released chip only perhaps in the same LEAGUE as existing, in-market stuff out now. How does Intel expect this to compete when AMD *already* has a 676 GigaFLOP (over 2/3 the power) card available, RIGHT NOW, for under $400US? By the time Knight's Corner could release, AMD will have their Tahiti 7970 out, which likely will rock in at 2+ TeraFLOPs for a single GPU, at the same sub-$400US price point.
If it's 8 cores, then each core has FPU power that badly embarasses Sandy Bridge... A single core's FPU capabilities of Sandy Bridge only allow for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.[/citation]
It's amazing how little people know about the HPC market. This product is driven by the HPC community, and the most important feature is code compatibility with the existing x86 platform. That means the HPC community does not need to use a separate development environment in order to build something. I was talking to some of the large HPC customers, and they were very excited about this card.
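To illustrate what "no separate development environment" means in practice (this is just a generic sketch of ordinary x86 OpenMP code, not anything Intel has published for Knights Corner): the appeal is that a standard loop like the one below, which already runs on a Xeon, would in principle only need a recompile for an x86-compatible many-core card rather than a rewrite in CUDA or OpenCL.

#include <omp.h>
#include <stdio.h>

#define N 10000000

static double x[N], y[N];

int main(void) {
    const double a = 2.5;

    /* Plain DAXPY-style loop: standard C plus an OpenMP pragma, nothing
       vendor-specific. This is the kind of code HPC shops already have. */
    #pragma omp parallel for
    for (long i = 0; i < N; ++i)
        y[i] = a * x[i] + y[i];

    printf("done using up to %d threads\n", omp_get_max_threads());
    return 0;
}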
 

jn77

Distinguished
Feb 14, 2007
587
0
18,990
[citation][nom]darthvidor[/nom]I wonder when will "Co" get separated from the Processor and get into motherboard sockets other than pci-express cards.[/citation]

Back in the day they were separate; now they are combined. But if co-processors become as big as the CPU or bigger, then it's time for socket LGA 3011 or something else.
 

Camikazi

Distinguished
Jul 20, 2008
1,405
2
19,315
[citation][nom]iceman1992[/nom]If it's a co-processor, what's the other processor? An i7?[/citation]
A Xeon. This is server stuff; they don't use Core chips, they use Xeons.
 

lamorpa

Distinguished
Apr 30, 2008
1,195
0
19,280
[citation][nom]oparadoxical_[/nom]Makes me wonder just what we will have in ten years from now... Especially for personal computers.[/citation]
...or 12 or 8 years, or what a cell phone will be like in 9.3 years, or the weather in 3 weeks, or my nagging knee pain in 16 months...
 

Vladislaus

Distinguished
Jul 29, 2010
1,290
0
19,280
[citation][nom]otacon72[/nom]Let's see AMD compare their 16 core chip to that one. I mean they compared their 16 core CPU to the Xeon with 6 cores. Seems fair.[/citation]
This is a co-processor, not a CPU. Please don't confuse the two.
 

g00ey

Distinguished
Aug 15, 2009
470
0
18,790
[citation][nom]nottheking[/nom]... it just kind of emerged as an alternative to "extra CPU" around 2004 when the Pentium D and Athlon64 X2 came out... And there was bickering there, too. Yet no one deemed that a chip failed to be "dual-core," if it merged or removed redundant parts, such as the memory interface or I/O, or shared a pool of cache. Looking at that philosophy, Bulldozer's modules are the next logical step in the evolution of sets of 2 cores.[/citation]
I seriously don't understand why you bring up the Pentium D and the old X2. I cannot recall any "bickering" about their core count, and while the Pentium D had 4 ALUs (2 per core), everyone still agreed that it was a dual-core CPU.
 
Guest
+1. I resent any processor that cannot benchmark Crysis! Heck, maybe Crytek will be bribed to recode Crysis for such a setup and Windows 8 in the near future, for hundreds of cores...
 
[citation][nom]Stardude82[/nom]What about GPU processing? Isn't that what CUDA is for? After all, my 460 GTX has 336 cores.[/citation]
It's basically an alternative to using CUDA or OpenCL with a dedicated GPU. It doesn't need a proprietary language like CUDA, though.
 

dragonsqrrl

Distinguished
Nov 19, 2009
1,280
0
19,290
[citation][nom]kronos_cornelius[/nom]I got my money on Cuda clusters. All the applications they mention as being highly parallel benefit from CUDA, so why get 50 cores when you can get 512[/citation]
Because the individual cores are in no way directly comparable, in terms of either architecture or performance.
 

Th-z

Distinguished
May 13, 2008
74
0
18,630
[citation][nom]fazers_on_stun[/nom]NV's top GPU does 515 peak GFLOPs: http://www.nvidia.com/object/perso [...] uting.htmlIIRC, sustained throughput is about 2/3rds of peak. Since the Knight's Corner does over 1 TFLOP sustained (according to the slide anyway), that would be 3X NV's best GPU on double precision..[/citation]

NVIDIA has a card based on 512 CUDA cores (the M2090), which does 665 GFLOPS in DP. It's listed under the "Tesla Data Center" category. All these numbers are posted by NVIDIA; we don't know whether they are theoretical peak or sustained. If Intel's number is true, 1 TFLOPS of sustained DP is surely very impressive. People have to realize that unlike the cores found in GPUs today, these Intel cores are compatible with the x86 ISA, so the chip programs just like a normal x86 CPU and can operate on its own. With a GPU, you need a CPU to accompany it. But again, without further information we don't know the price or power consumption of this chip compared to GPU solutions.
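For what it's worth, taking both vendors' numbers at face value, 1000 / 665 ≈ 1.5, so the claimed advantage over the M2090 would be roughly 1.5X, and that's before knowing whether either figure is peak or sustained.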
 

Th-z

Distinguished
May 13, 2008
74
0
18,630


Exactly, I thought people would know this by now. It's just like how you don't compare performance solely on the number of AMD Stream Processors versus NVIDIA CUDA cores. In terms of individual size and complexity, a CUDA core is bigger and more complex than an AMD SP, and these Intel cores are surely bigger and more complex than CUDA cores.
 

wiyosaya

Distinguished
Apr 12, 2006
915
1
18,990
[citation][nom]otacon72[/nom]Let's see AMD compare their 16 core chip to that one. I mean they compared their 16 core CPU to the Xeon with 6 cores. Seems fair.[/citation]
It is specifically stated in the article that Knights Corner is not a CPU; rather, it is a mathematical co-processor. Once these hit the market, we will see its place in computer architecture for sure. A more apt AMD comparison, in my opinion, would be against workstation GPUs or consumer GPUs used for high-performance mathematical computing.

The architecture of this thing is highly parallel. Even if its primary use were as a CPU, you would still have all the problems associated with programs that are not highly threaded. It derives its intense computational power purely from the fact that it can do many operations in parallel, just like GPUs.
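To put a number on that (this is Amdahl's law, my illustration rather than anything from the article): with 50 cores, speedup = 1 / ((1 - p) + p/50), where p is the parallel fraction of the program. Something that is only 50% parallel tops out at about 1.96X, while a 99% parallel workload reaches about 33.6X, which is why a part like this only pays off on highly threaded work.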

Be aware of the almost verbatim marketing blather repeated in this, IMHO, somewhat dumbed-down version of the release. In case anyone is interested, there is a somewhat more informative article which clearly states that the chip is not intended as a CPU.
 

wiyosaya

Distinguished
Apr 12, 2006
915
1
18,990

Exactly. And each has its own niche where it outperforms the other. As a BOINC participant running MilkyWay@Home on an NVIDIA GPU, I have noticed that for MW@H, AMD GPUs are in general about 8 times faster than NVIDIA. In other BOINC projects the performance is reversed between the two manufacturers' chips. It is not a matter of newer-generation cards from one specific manufacturer, either: the comparison I made is between an 8800 GT and AMD cards in general, but I also have a GTX 460, which is only about 25 percent faster at MW@H than the 8800 GT.

 