Intel's Knights Corner: 50+ Core 22nm Co-processor

nebun · Nov 16, 2011

[citation][nom]sinfulpotato[/nom]This isn't a CPU, it is more like the function of a GPU used in parallel processing.[/citation]
This is very true, but GPU processing is much more advanced. I am referring to CUDA processing, not STREAM processing.

southernshark · Nov 16, 2011

I remember when they added the math coprocessor to the 486...... Seems similar. 😀 😀 😀

PrvtChurch · Nov 16, 2011

22nm die. hmmmmm they seem to get smaller and smaller. just like children.

nieur · Nov 16, 2011

wondering which instruction set it will use?

ta152h · Nov 16, 2011

[citation][nom]acadia11[/nom]But Bulldozer hit 8.59 GHz with liquid helium cooling.[/citation]

Who cares? It's not like that's useful. More importantly, Bulldozer hit 8.83 GHz on Pluto. AMD is pretty sure they can beat that on the dark side of Mercury, so don't be surprised if you see a new record of 8.926 set soon, although that's just a rough estimate.

saturnus · Nov 16, 2011

This is what is left of Intel's bid to compete in the GPU market, Larrabee. A co-processor with very lackluster compute performance compared to today's GPU cards, but really it can do neither GPU work nor CPU work, so one has to wonder why Intel even bothers. Seems like a desperate, and failed, last attempt at trying to keep up with ARM.

nottheking · Nov 16, 2011

Yep, this looks like the current incarnation of Larrabee. Bets are open on whether this will actually make it to market; Intel's trying to push for outright dominance in a field they've been outside of the whole time, while nVidia and AMD have been working with years and years of experience under their respective belts.

Since this is an external co-processor run through PCI-express, this makes it not different at all from nVidia or AMD's GPGPU solutions... And that means Intel's badly beat with only 50 cores. Even assuming that the 1 TFLOP number is actually double-precision, that makes this still-not-officially-benchmarked-or-released chip only perhaps in the same LEAGUE as existing, in-market stuff out now. How does Intel expect this to compete when AMD *already* has a 676 GigaFLOP (over 2/3 the power) card available, RIGHT NOW, for under $400US? By the time Knights Corner could release, AMD will have their Tahiti 7970 out, which likely will rock in at 2+ TeraFLOPs for a single GPU, at the same sub-$400US price point.

[citation][nom]soccerdocks[/nom]Do not fall for AMD's marketing. They do not have a 16 core chip. They have an 8 "module" chip that has 16 integer processors, but does NOT have 16 full cores.[/citation]
If it's 8 cores, then each core has FPU power that badly embarrasses Sandy Bridge... A single core's FPU capabilities of Sandy Bridge only allow for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.

palladin9479 · Nov 16, 2011

Intel's version of a GPU without the frame buffer, translation units, and other visual components. Just the raw VPUs, local memory and some sort of I/O controller. Might be interesting as a "drop in" style card.

For instruction set, it'd have to be SSE / AVX style SIMD instructions.

iceman1992 · Nov 16, 2011

If it's a co-processor, what's the other processor? An i7?

Pherule · Nov 16, 2011

[citation][nom]oparadoxical_[/nom]Makes me wonder just what we will have in ten years from now... Especially for personal computers.[/citation]
I don't know about 10 years time, but I want 20+ cores in a desktop processor in 2013 or sooner. Ivy Bridge needs to support up to at least 8 cores (16 threads on i7 models) and before you say lol moar cores - I would utilize every single core rather heavily.

ohim · Nov 16, 2011

[citation][nom]gmcizzle[/nom]Lol wow 50 cores. Guess that makes AMD's 16-core reveal another flop.[/citation]
Sorry to say it but is this site full of retards? Unless Intel sells that 50 core CPU at the same price as AMD's 16 core (yeah right, in a parallel universe) you just look like a retard that bashes AMD for absolutely no reason at all.

JonnyDough · Nov 16, 2011

This is a parallel CPU core; it's more akin to a graphics processor, and those are already nearing the TFLOP mark. AMD already has that in their HD7000 GPU. Don't get excited kids, this is a specialized CPU made for special tasks. It is NOT for your desktop or mobile PCs.

nottheking · Nov 16, 2011

[citation][nom]Pherule[/nom]I don't know about 10 years time, but I want 20+ cores in a desktop processor in 2013 or sooner. Ivy Bridge needs to support up to at least 8 cores (16 threads on i7 models) and before you say lol moar cores - I would utilize every single core rather heavily.[/citation]
Not gonna happen. Following Intel's "Tick-Tock" strategy, 2012 brings our next die shrink (from 32nm to 22nm), then 2013 sees the next-generation architecture, Haswell, get produced. Intel only does "tick" die shrinks on even-numbered years.

Intel's 32nm process fits up to 6 cores (as seen on both Sandy Bridge-E and the prior-gen Gulftown). A die shrink will roughly double the effective usable die area... So at MOST you'd get 12 cores per die. However, Intel's strategy has focused a LOT more on the "uncore" part of the chip, including integrated graphics (which appear even in the i7s). Chances are good that more development will occur there, and along with intentions to yield still-better per-clock performance than Sandy Bridge, core size will go up as well. 10 cores is a POSSIBILITY for Ivy Bridge E; I think 8 is more likely. So they could do 16-core stuff, but dual-die CPUs are so large and unwieldy they only go in server sockets, so we're talking a Xeon, NOT an i7.

The reason AMD's CPUs have more cores is more due to design emphasis; Intel's focusing more on the uncore, as well as individually beefier cores, than core count. So I expect AMD to retain the most cores-per-die in this regard, provided they keep up in terms of manufacturing processes.
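
As a rough check on the die-shrink estimate above, here is a minimal sketch assuming ideal scaling, where transistor density grows with the square of the feature-size ratio. The function names are made up for illustration, and real processes rarely hit the ideal figure, so treat the result as an upper bound rather than a prediction.

```python
import math


def ideal_area_scaling(old_nm: float, new_nm: float) -> float:
    """Ideal density gain if features shrink linearly in both dimensions.

    This is an assumption: real nodes usually scale less than this.
    """
    return (old_nm / new_nm) ** 2


def max_cores_after_shrink(cores_before: int, old_nm: float, new_nm: float) -> int:
    """Upper bound on cores in the same die area, ignoring any uncore growth."""
    return math.floor(cores_before * ideal_area_scaling(old_nm, new_nm))


scaling = ideal_area_scaling(32, 22)
print(f"ideal density gain from 32nm to 22nm: {scaling:.2f}x")
print("upper bound on cores (starting from 6):", max_cores_after_shrink(6, 32, 22))
```

Under that idealized assumption the shrink gives slightly more than a doubling, which is consistent with the "at MOST 12 cores" figure before any die area goes to the uncore or to bigger cores.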

stinkyfax · Nov 16, 2011

Isn't it analogous to a GPU?

halcyon · Nov 16, 2011

Intel isn't being fair. They're just beating up on AMD to make a point.

g00ey · Nov 16, 2011

I'm not impressed at all. Don't GPUs perform over 2 Teraflops by now? Also the "ASCI Red" from 1997 was NOT the first system capable of over 1 TFLOP. The first known system was the Cray T3E that came out by the end of 1995 and it was advertised to deliver "over" 1.6 TFLOPS.

de5_Roy · Nov 16, 2011

a gpu version of this 50 core thingie might make a capable igp for haswell. just wondering.

g00ey · Nov 16, 2011

[citation][nom]nottheking[/nom]If it's 8 cores, then each core has FPU power that badly embarrasses Sandy Bridge... A single core's FPU capabilities of Sandy Bridge only allow for 128 bits of FPU data (either 1x128-bit x87, 4x32-bit SSE, or 2x64-bit AVX) per clock, while each Bulldozer module allows for double that: 2x128-bit under x87, and either 8x32-bit or 4x64-bit using AVX.[/citation]
Well then they have managed to bring a better CPU to the market which is what R&D is all about after all. But, alas, it's still a mere 8 cores (with an enhanced variant of hyperthreading/SMT) no matter how you paint it, and not the 16 as they are falsely advertising.

But comparing AMD's Bulldozer chip with the Knights Corner is like comparing apples and oranges. It would be a lot more fair to compare it to AMD's Southern or Northern Islands chips or nVidia's GPU chips.

nottheking · Nov 16, 2011

[citation][nom]g00ey[/nom]I'm not impressed at all. Don't GPUs perform over 2 Teraflops by now? Also the "ASCI Red" from 1997 was NOT the first system capable of over 1 TFLOP. The first known system was the Cray T3E that came out by the end of 1995 and it was advertised to deliver "over" 1.6 TFLOPS.[/citation]
Actually, as I mentioned, the highest-performing single-GPU card only does about 675 GigaFLOPS. Keep in mind that the numbers are different depending on the level of precision: supercomputers are measured using DOUBLE-precision floating-point, aka 64-bit FP. This level of precision is what's needed for scientific and engineering tasks. Meanwhile, standard 3D rendering, gaming, and media tasks are fine using 32-bit single-precision FP. Hence, a lot of consumer-targeted equipment is measured using single-precision; the teraflop figures from AMD (as well as the entirely made-up teraflop figures for the consoles) are referring to single-precision power.

Depending on the architecture, single-precision FP units can be used to produce double-precision results. Double-precision throughput can be as much as half the single-precision rate (some types of units, such as x86 FPUs that use AVX, namely Sandy Bridge and Bulldozer, as well as the PowerXCell 8i), a quarter (Radeon 6000-series GPUs), a fifth (Radeon 5000-series GPUs, newer nVidia GPUs), or as low as a tenth (older nVidia GPUs, the PS3's Cell).
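
To make the ratios above concrete, here is a small sketch that derives a double-precision figure from a single-precision one. The ratio table only restates the claims in this post and is not checked against vendor specifications; the names `DP_RATIO` and `dp_gflops` are hypothetical, and the 2700 GFLOPS input is an illustrative number chosen so that the quarter-rate case lands on the roughly 675 GFLOPS mentioned earlier.

```python
from fractions import Fraction

# Double-precision rate as a fraction of single-precision rate.
# These are the ratios claimed in the post above, not verified specs.
DP_RATIO = {
    "half": Fraction(1, 2),
    "quarter": Fraction(1, 4),
    "fifth": Fraction(1, 5),
    "tenth": Fraction(1, 10),
}


def dp_gflops(sp_gflops: int, ratio_name: str) -> float:
    """Estimate peak double-precision GFLOPS from a single-precision figure."""
    return float(sp_gflops * DP_RATIO[ratio_name])


# Illustrative input only: 2700 GFLOPS single-precision at quarter rate.
print(dp_gflops(2700, "quarter"))

# The same single-precision figure at each claimed ratio.
for name in DP_RATIO:
    print(name, dp_gflops(2700, name))
```

The point of the comparison is that a headline single-precision teraflop number can shrink by anywhere from 2x to 10x once the workload needs 64-bit floats.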

And no, ASCI Red was the first computer to actually pass 1 teraFLOP. Just because Cray advertised the performance doesn't mean everyone was getting it; supercomputer performance on the TOP 500 list isn't done off of theoretical peak numbers, but actual, real-world benchmark results. This allows for measurements of just how PRACTICAL those math units are, and what they ACTUALLY can achieve. (As a note, currently Intel CPUs tend to get slightly closer to their theoretical numbers than AMD CPUs do, and GPGPUs don't get anywhere near close.) The first Cray T3E that passed 1 TFLOP wasn't built until 1998.

[citation][nom]g00ey[/nom]Well then they have managed to bring a better CPU to the market which is what R&D is all about after all. But, alas, it's still a mere 8 cores (with an enhanced variant of hyperthreading/SMT) no matter how you paint it, and not the 16 as they are falsely advertising.[/citation]
It's hardly so clear-cut. Yes, it does blur the line on what level of parallelism is being done here, but they're very much cores given that they have complete hardware capability to run two threads per module, and not merely "virtualize" two threads akin to Hyperthreading. Each module has TWO sets of L1 data cache, and is capable of running two floating-point threads in hardware. (with the FPU being the contentious point here) The only exception, technically, is AVX, but Sandy Bridge can't truly support a full thread of AVX in a single core either.

And of course, keep in mind that the concept of a "core" isn't quite fully defined either; it just kind of emerged as an alternative to "extra CPU" around 2004 when the Pentium D and Athlon64 X2 came out... And there was bickering there, too. Yet no one deemed that a chip failed to be "dual-core," if it merged or removed redundant parts, such as the memory interface or I/O, or shared a pool of cache. Looking at that philosophy, Bulldozer's modules are the next logical step in the evolution of sets of 2 cores.

Guest · Nov 16, 2011

Tegra, Cell, ARM, OpenCL, CUDA... done right the Intel way. If I understand this correctly, programs can use this parallel computing power much more easily; it's seen as a network with render nodes??

g00ey · Nov 16, 2011

[citation][nom]nottheking[/nom]... It's hardly so clear-cut. ...[/citation]
Yes, it is, at least to me. Putting two ALUs into each core doesn't double the core count. Yes, the design of each core (which is now called a "module" by AMD) is improved, but the Bulldozer CPU which they claim is 16-core is in fact only 8-core.

I think they are really making a fool out of themselves by calling it a 16 core. It's like I would work at Harvard for a couple of years as a research assistant and then walk around saying that I've got a PhD at Stanford and worked there as a Professor.

I respect you for your post but when it comes to the core count, you don't win me over unfortunately.

In Short: AMD's marketing = EPIC FAIL! *lol*

aldaia · Nov 16, 2011

[citation][nom]nottheking[/nom]And no, ASCI Red was the first computer to actually pass 1 teraFLOP. Just because Cray advertised the performance doesn't mean everyone was getting it; supercomputer performance on the TOP 500 list isn't done off of theoretical peak numbers, but actual, real-world benchmark results. This allows for measurements of just how PRACTICAL those math units are, and what they ACTUALLY can achieve. (As a note, currently Intel CPUs tend to get slightly closer to their theoretical numbers than AMD CPUs do, and GPGPUs don't get anywhere near close.) The first Cray T3E that passed 1 TFLOP wasn't built until 1998.[/citation]
Exactly, the Fujitsu K supercomputer has made it to the top of the November 2011 TOP500 list (see: http://www.tomshardware.com/news/supercomputer-top500-power-k-computer-mainframe,13979.html) because the system was actually built and tested. The K supercomputer is the first to hit the 10 Petaflop barrier. Fujitsu already announced the PRIMEHPC FX10 Supercomputer that is scalable to 23.2 Petaflops; however, nobody has built or even ordered one of those yet.

fazers_on_stun · Nov 16, 2011

nottheking :

Not quite true. According to http://www.realworldtech.com/page.cfm?ArticleID=RWT091810191937&p=6:

The execution units in Sandy Bridge were reworked to double the FP performance for vectorizable workloads by efficiently executing 256-bit AVX instructions. Almost all 256-bit AVX instructions are decoded into and execute as a single uop – in contrast to AMD’s more cautious embrace of AVX, which will crack 256-bit instructions into two 128-bit operations on Bulldozer.

Sandy Bridge can sustain a full 16 single precision FLOP/cycle or 8 double precision FLOP/cycle – double the capabilities of Nehalem. This guarantees that software which uses AVX will actually see a substantial performance advantage on Sandy Bridge and should spur faster adoption.

Sandy Bridge can execute a 256-bit FP multiply, a 256-bit FP add and a 256-bit shuffle every cycle. However, the floating point data paths were not expanded and are still 128-bits wide; instead the SIMD integer data paths are enlisted to assist with AVX operations.

This is why for desktop apps, the 4-module Bulldozer 8150 gets beaten by the 4-core 2600K: http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-14.html

Integer and floating-point math are both improved in the Bulldozer architecture, allowing the FX-8150 to place second behind Intel’s Core i7-2600K.

Exceptional integer SSE2 performance catapults FX-8150 ahead of Intel’s lineup in Sandra’s Multimedia metric. Shared floating-point units aren’t able to achieve the same results, though FX-8150 nearly matches Intel’s Core i7-2600K.
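
The per-cycle figures in the RealWorldTech excerpt above translate into a theoretical peak through a simple product: cores times clock times FLOP per cycle. Here is a minimal sketch of that arithmetic. The 16 and 8 FLOP/cycle values come from the excerpt; the 4 cores at 3.4 GHz are an assumed example configuration, and `peak_gflops` is a hypothetical helper name. Sustained performance on real code will be lower than this peak.

```python
# FLOP per cycle per core for Sandy Bridge with AVX, as quoted above.
SP_FLOP_PER_CYCLE = 16
DP_FLOP_PER_CYCLE = 8


def peak_gflops(cores: int, clock_ghz: float, flop_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOP per cycle."""
    return cores * clock_ghz * flop_per_cycle


# Assumed example configuration (not taken from the thread): 4 cores at 3.4 GHz.
cores, clock_ghz = 4, 3.4
print(f"single precision peak: {peak_gflops(cores, clock_ghz, SP_FLOP_PER_CYCLE):.1f} GFLOPS")
print(f"double precision peak: {peak_gflops(cores, clock_ghz, DP_FLOP_PER_CYCLE):.1f} GFLOPS")
```

This is the same style of number as the 1 TFLOP quoted for Knights Corner: a ceiling derived from the hardware, not a measured benchmark result.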

nottheking · Nov 16, 2011

[citation][nom]g00ey[/nom]In Short: AMD's marketing = EPIC FAIL! *lol*[/citation]
I would've dignified you with more of a response (which would've just really been a rehash of what I've already told you twice) but this line here sorta indicated that it'd just go over your head, too.

[citation][nom]aldaia[/nom]Exactly, the Fujitsu K supercomputer has made it to the top of the November 2011 TOP500 list because the system was actually built and tested... The PRIMEHPC FX10 Supercomputer that is scalable to 23.2 Petaflops; however, nobody has built or even ordered one of those yet.[/citation]
Yep, you pretty much managed to sum it all up right there.

[citation][nom]fazers_on_stun[/nom]Not quite true. According to http://www.realworldtech.com/page. [...] 91937&p=6:This is why for desktop apps, the 4-module Bulldozer 8150 gets beaten by the 4-core 2600K: http://www.tomshardware.com/review [...] 43-14.html[/citation]
I could be mistaken then; I had recalled that Sandy Bridge's FPUs allowed it to have 256-bit wide AVX registers, but that actual execution/retirement of the instructions was basically "pipelined." This shows that instead it "steals" capability from the integer units to achieve the full width. Of course, this is probably a better solution, as it winds up taking resources that are less likely to be used (as opposed to what would basically be its twin's FP SSE unit).

g00ey · Nov 16, 2011

[citation][nom]nottheking[/nom]I would've dignified you with more of a response ...[/citation]
I take it that you are in some way working for AMD or even representing AMD. Stealth marketing, where people talk up their own products, is kind of common in forums such as these.

It's nothing personal I'm just being honest with my opinions. We have a market today where marketing buzzwords such as Ultra, Mega, Power, Super and all possible combinations thereof are received in strong disbelief. To pursue a dishonest marketing strategy where you state that a product has a certain feature that it actually has not will only backfire in the end.

Try selling say 1 kg of washing powder of a known brand as 2 kg in the supermarket and charge at the 2 kg price point. It doesn't even matter that the powder is twice as good as "normal" washing powder, people will get upset and the manufacturer will try to sue you for damaging their brand.

So I think that companies such as AMD and Intel will gain a lot more in the long run by sticking to fair and honest marketing. But it is the quality and performance the CPU delivers that matters in the end, so if AMD manages to impress on the benchmark tests and the consumers while selling at a competitive price point AMD may get away with their misleading advertising anyway.
