Xeon Phi: Intel's Larrabee-Derived Card In TACC's Supercomputer

pjkenned · Nov 13, 2012

After eight years of development, Intel is finally ready to announce its Xeon Phi Coprocessor, which is derived from the company's work with Larrabee. Although the architecture came up short as a 3D graphics card, it shows more promise in the HPC space.

Xeon Phi: Intel's Larrabee-Derived Card In TACC's Supercomputer : Read more

esrever · Nov 13, 2012

tacoslave · Nov 13, 2012

i wonder if they can mod this to run games...

mocchan · Nov 13, 2012

Articles like these is what makes me more and more interested in servers and super computers...Time to read up and learn more!

wannabepro · Nov 13, 2012

Highly interesting.
Great article.

I do hope they get these into the hands of students like myself though.

ddpruitt · Nov 13, 2012

Intriguing idea....

These X86 cores have the uumph to run something a little more complex than what a GPGPU can. But is it worth it and what kind of effort does it require. I'd have to disagree with Intel's assertion that you can get used to it by programming for an "i3". Anyone with a relatively modern graphics card can learn to program OpenCL or CUDA on there own system. But learning how to program 60 cores efficiently (or more) from an 8 core (optimistically) doesn't seem reasonable. And how much is one of these cards going to run? You might get more by stringing a few GPUs together for the same cost.

I'm wonder if this is going to turn into the same time of niche product that Intel's old math-coprocessors did.

CaedenV · Nov 13, 2012

man, I love these articles! Just the sheer amounts of stuffs that go into them... measuring ram in hundreds of TBs... HDD space in PBs... it is hard to wrap one's brain around!

I wonder what AMD is going to do... on the CPU side they have the cheaper (much cheaper) compute power for servers, but it is not slowing Intel sales down any. Then on the compute side Intel is making a big name for themselves with their new (but pricy) cards, and nVidia already has a handle on the 'budget' compute cards, while AMD does not have a product out yet to compete with PHI or Tesla.
On the processor side AMD really needs to look out for nVidia and their ARM chip prowess, which if focused on could very well eat into AMD's server chip market for the 'affordable' end of this professional market... It just seems like all the cards are stacked against AMD... rough times.

And then there is IBM. The company that has so much data center IP that they could stay comfortably afloat without having to make a single product. But the fact is that they have their own compelling products for this market, and when they get a client that needs intel or nvidia parts, they do not hesitate to build it for them. In some ways it amazes me that they are still around because you never hear about them... but they really are still the 'big boy' of the server world.

A Bad Day · Nov 13, 2012

[citation][nom]esrever[/nom]meh[/citation]

*Looks at the current selection of desktops, laptops and tablets, including custom built PCs*

*Looks at the major data server or massively complex physics tasks that need to be accomplished*

*Runs such tasks on baby computers, including ones with an i7 clocked to 6 GHz and quad SLI/CF, then watches them crash or lock up*

ENTIRE SELECTION IS BABIES!

[citation][nom]tacoslave[/nom]i wonder if they can mod this to run games...[/citation]

A four-core game that mainly relies on one or two cores, running on a thousand-core server. What are you thinking?

ThatsMyNameDude · Nov 13, 2012

Holy shit. Someone tell me if this will work. Maybe, if we pair this thing up with enough xeons and enough quadros and teslas, we can connect it with a gaming system and we could use the xeons to render low load games like cod mw3 and tf2 and feed it to the gaming system.

mayankleoboy1 · Nov 13, 2012

Main advantage of LRB over Tesla and AMD firepro S10000 :

A simple recompile is all thats needed to use PHI. Tesla/AMD needs a complete code re write. Which is very very expensive .
I see LRB being highly successful.

PudgyChicken · Nov 13, 2012

It'd be pretty neat to use a supercomputer like this to play a game like Metro 2033 at 4K, fully ray-traced.

I'm having nerdgasms just thinking about it.

palladin9479 · Nov 13, 2012

Each of these cards is their own computer and runs their own OS. The host system will need to manage the environment of the card and give it some form of permanent storage, most likely through abstraction.

I've worked with specialized cards like these, through they were for running non-native x86 code on a SPARC platform. They act like a totally separate system with their own IP / virtual frame buffer / IO space, the whole works. You treat them just like you would a dedicated server. Which makes their choice of linux pretty clear, you can easily cluster multiple non-uniform linux servers for distributed processing.

These are very interesting, won't be quite as powerful as a dedicated vector processor but handles general computing tasks well enough and can run code natively without needing to rewrite the program.

PreferLinux · Nov 13, 2012

[citation][nom]CaedenV[/nom]man, I love these articles! Just the sheer amounts of stuffs that go into them... measuring ram in hundreds of TBs... HDD space in PBs... it is hard to wrap one's brain around!I wonder what AMD is going to do... on the CPU side they have the cheaper (much cheaper) compute power for servers, but it is not slowing Intel sales down any. Then on the compute side Intel is making a big name for themselves with their new (but pricy) cards, and nVidia already has a handle on the 'budget' compute cards, while AMD does not have a product out yet to compete with PHI or Tesla.On the processor side AMD really needs to look out for nVidia and their ARM chip prowess, which if focused on could very well eat into AMD's server chip market for the 'affordable' end of this professional market... It just seems like all the cards are stacked against AMD... rough times.And then there is IBM. The company that has so much data center IP that they could stay comfortably afloat without having to make a single product. But the fact is that they have their own compelling products for this market, and when they get a client that needs intel or nvidia parts, they do not hesitate to build it for them. In some ways it amazes me that they are still around because you never hear about them... but they really are still the 'big boy' of the server world.[/citation]
1. Not really, their price/performance ratio is fairly similar. They're a lot cheaper, but also perform a lot worse. Add in the higher power consumption (and it surely matters in anything at this level), and if anything Intel is better.

2. AMD released the FirePro S10000 less than 24 hours ago. It is competing directly against this and Tesla.

-----------------------------------

On a completely separate note, I'm wondering what the price of these is. nVidia's latest K20 costs about $3200 and is rated at 1.17 TFLOPS peak and 225 W. This is rated at 1.01 TFLOPS peak, with the same power rating. It wouldn't be hard for Intel to beat nVidia on price...

Alphi · Nov 13, 2012

aye. they can accurately model the weather... but during downtime they cannot play 3D games at ridiculously high frame rates. what geek would recommend buying such a "work tool"?

blazorthon · Nov 13, 2012

[citation][nom]PreferLinux[/nom]1. Not really, their price/performance ratio is fairly similar. They're a lot cheaper, but also perform a lot worse. Add in the higher power consumption (and it surely matters in anything at this level), and if anything Intel is better.2. AMD released the FirePro S10000 less than 24 hours ago. It is competing directly against this and Tesla.-----------------------------------On a completely separate note, I'm wondering what the price of these is. nVidia's latest K20 costs about $3200 and is rated at 1.17 TFLOPS peak and 225 W. This is rated at 1.01 TFLOPS peak, with the same power rating. It wouldn't be hard for Intel to beat nVidia on price...[/citation]

Some of AMD's Opterons have far greater performance for the money than any Xeons do, so I'd have to disagree (at least partially) with you. The same is probably not true for all of AMD's Opterons, but it is true for at least some of them.

For example, here:
http://www.newegg.com/Product/Product.aspx?Item=N82E16819113038
$600 for a CPU that in fully threaded work can compete with Intel's six-core and eight-core Xeon models quite well, if not beat them, in work that scales across as many cores as you throw at it and it does so while using not too much power at all.

Other good examples:
http://www.newegg.com/Product/Product.aspx?Item=19-113-036

http://www.newegg.com/Product/Product.aspx?Item=N82E16819113030

Pretty much all of these models:
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100008494+600061216&QksAutoSuggestion=&ShowDeactivatedMark=False&Configurator=&IsNodeId=1&Subcategory=727&description=&hisInDesc=&Ntk=&CFG=&SpeTabStoreType=&AdvancedSearch=1&srchInDesc=

They all have better price/performance than the multi-socket Xeons. Lower power efficiency, definitely so in most cases, but not in all.

IndignantSkeptic · Nov 13, 2012

Please forgive my ignorance but why do we need x86 processors still when a compiler now is supposed to automatically generate all the necessary machine codes from the same high level source code?

Also will we ever be able to upgrade supercomputers simply by replacing small parts of it at a time with newer better components and basically having different parts of the supercomputer running at different speeds?

Also why the hell is CUDA and OpenCL advocated repeatedly in this article when OpenMP was supposed to replace them, as far as I understand, and is actually mentioned in one of the pictures?!

memadmax · Nov 13, 2012

Ok, the hardware looks decent.
But, there's one problem: Intel is relying on programmers being able to optimize to a high degree for this setup...

And that is the problem, programmers these days are lazy...
It used to be you would spend a whole day optimizing loops using counters and timers to squeeze every last drop out of that loop...

Now, programmers are like kids with legos, they just merge crap together that was written by someone else and as long as it runs decent, they call it good...

PreferLinux · Nov 13, 2012

[citation][nom]blazorthon[/nom]Some of AMD's Opterons have far greater performance for the money than any Xeons do, so I'd have to disagree (at least partially) with you. The same is probably not true for all of AMD's Opterons, but it is true for at least some of them.For example, here:http://www.newegg.com/Product/Prod [...] 6819113038$600 for a CPU that in fully threaded work can compete with Intel's six-core and eight-core Xeon models quite well, if not beat them, in work that scales across as many cores as you throw at it and it does so while using not too much power at all.Other good examples:http://www.newegg.com/Product/Prod [...] 19-113-036http://www.newegg.com/Product/Prod [...] 6819113030Pretty much all of these models:http://www.newegg.com/Product/Prod [...] rchInDesc=They all have better price/performance than the multi-socket Xeons. Lower power efficiency, definitely so in most cases, but not in all.[/citation]
Some... Good point.

[citation][nom]IndignantSkeptic[/nom]Please forgive my ignorance but why do we need x86 processors still when a compiler now is supposed to automatically generate all the necessary machine codes from the same high level source code?Also will we ever be able to upgrade supercomputers simply by replacing small parts of it at a time with newer better components and basically having different parts of the supercomputer running at different speeds?Also why the hell is CUDA and OpenCL advocated repeatedly in this article when OpenMP was supposed to replace them, as far as I understand, and is actually mentioned in one of the pictures?![/citation]
Because if you've got x86 code and you want to use OpenCL or CUDA you have to completely re-write it. Compilers can't just translate between them. Besides, a lot of HPC stuff would be highly-optimised, and that often means low-level stuff (Assembly, even).

http://en.wikipedia.org/wiki/OpenMP OpenMP is for multi-threading, generally using the CPU. It also works with clusters over a network, hence it is great for this (Xeon Phi). But it doesn't work with GPUs.

[citation][nom]memadmax[/nom]Ok, the hardware looks decent.But, there's one problem: Intel is relying on programmers being able to optimize to a high degree for this setup...And that is the problem, programmers these days are lazy...It used to be you would spend a whole day optimizing loops using counters and timers to squeeze every last drop out of that loop...Now, programmers are like kids with legos, they just merge crap together that was written by someone else and as long as it runs decent, they call it good...[/citation]
Um, I get the distinct impression it needs very little optimisation. I think programmers will find it far easier to optimise for this than for GPGPU which requires a complete re-write.

[citation][nom]PreferLinux[/nom]-On a completely separate note, I'm wondering what the price of these is. nVidia's latest K20 costs about $3200 and is rated at 1.17 TFLOPS peak and 225 W. This is rated at 1.01 TFLOPS peak, with the same power rating. It wouldn't be hard for Intel to beat nVidia on price...[/citation]
As a follow-up to this, (one of) the SemiAccurate articles gave the price, and it was mid-$2000s – much better than nVidia's K20, for a similar performance, and most likely a much better architecture to program for (or make use of).

cats_Paw · Nov 13, 2012

Is there any way to convert that into gaming power? I mean, 60 cpus for gaming is definitly overkill, but would it be posibl to use that and chanel all that raw power into really amazing games?

I ask cus, in reality, gaming industry does seem limited to me ...

IndignantSkeptic · Nov 13, 2012

@PreferLinux, I'm shocked programmers still program in low level. Also how can OpenMP not work on GPUs when a section of OpenMP is OpenACC?!!

blazorthon · Nov 13, 2012

[citation][nom]Cats_Paw[/nom]Is there any way to convert that into gaming power? I mean, 60 cpus for gaming is definitly overkill, but would it be posibl to use that and chanel all that raw power into really amazing games?I ask cus, in reality, gaming industry does seem limited to me ...[/citation]

I doubt that you could do a whole lot with that much performance even if we made games that could somehow utilize that many cores. You might be able to do something like one hell of a physics processing game with very high FPS that way, but otherwise, IDK how much you could actually do with it.

cats_Paw · Nov 13, 2012

[citation][nom]blazorthon[/nom]I doubt that you could do a whole lot with that much performance even if we made games that could somehow utilize that many cores. You might be able to do something like one hell of a physics processing game with very high FPS that way, but otherwise, IDK how much you could actually do with it.[/citation]

Calcultion of many diffrent light sources? Human Skin physx? Foliage movement with diffrent wind patterns?

I think you could do a lot with this 😀.

army_ant7 · Nov 13, 2012

ddpruitt :

I'm not sure if they're any better than GPGPU's in terms of what they can do. One of the big hurdles with GPGPU is making a programs run in a greatly threaded manner, aside from having to learn and code using API's and stuff like OpenCL.

The article seems to mention that you only need to run your normal code (I think just if it's already threaded enough) through Intel's compiler, or do a minimal amount of changes to your current code, for it to run on the Phi's. I think students can practice to do multi-threading with i3's. AFAIK, you can make your code very highly-threaded, and it will still run on hardware that can't run them all concurrently. What just happens is that they're dealt with one after another. I could be wrong with some of that info, but I think the main point still stands.

I think GPGPU-programming involves learning how to program for hundreds and even thousands of cores found on the GPU's. So running less cores may (or may not) be easier, aside from the fact that you don't have to learn how to use other API's and stuff.

It said that these Phi's can run 1TFLOPS each. (I could be wrong with that number. It's best to refer back the article.)

ThatsMyNameDude :

I think that's what OnLive and Gaikai do. Though I don't know if they use Teslas or Quadros. I remember that Nvidia was developing or has developed a product (maybe) specifically for that use.

Nvidia Pushes Kepler for Cloud Gaming With GeForce Grid

IndignantSkeptic :

I think I read from one of the Xeon reviews this year (either the one from March or the comparison with the i7-3970X just recently this November) that the CPU's are hot-swappable, though it would be best to verify that. I'm not sure how performance would be impacted by not having uniform hardware, but I'm guessing that generally, it would just run fine, though I can only speculate.

technoholic · Nov 13, 2012

Excuse me, but aren't GPUs more efficient/optimised for paralleled work loads already? Is GPUs' nature (more coars + higher memory bandwidth of graphics memory + their abily to act alongside a CPU) not more suitable for parallel computing? Also GPUs are relatively cheap and widely available even for a home user. Even a home user can do accelerated work using their GPU. IMO x86 has got its age. Intel doesn't want to adopt new tech

army_ant7 · Nov 13, 2012

It honestly feels somewhat weird/awkward seeing Intel's logo and their solid blue color on what looks like a graphics card. I don't mean anything really negative. Just voicing out my feelings. Hehehe...

Good article! Though I'd like to point out a thing or two. On the last paragraph of the 4th page, "Xeon Phi Hardware," I believe you meant "fins" not "fans." This one got (misled) me as I was actually trying to figure out what the description said about the "fans" and went looking around the pic for maybe almost a minute. :lol: Also, I'm wondering if you meant "properly-ventilated" not "-validated" somewhere on the same page.

Xeon Phi: Intel's Larrabee-Derived Card In TACC's Supercomputer

Distinguished

Splendid

Distinguished

Splendid

Distinguished

Honorable

Splendid

Distinguished

Honorable

Distinguished

Distinguished

Splendid

Distinguished

Distinguished

Glorious

Distinguished

Distinguished

Distinguished

Splendid

Distinguished

Glorious

Splendid

Distinguished

Distinguished

Distinguished

Share this page