IBM Sequoia to Blaze Past All Other Supercomputers

Status
Not open for further replies.

leo2kp

Distinguished
Oct 9, 2006
I don't think Crysis will work on that thing. It only uses 2 or 4 cores. It wouldn't run any better than a high-end gaming rig, and might run worse, IMO.
 

curnel_D

Distinguished
Jun 5, 2007
[citation][nom]Shadow703793[/nom]1. Why not use Cell and GPUs fore more acceleration.2. Why PowerPC CPUs? Cause of the already written code for PPC?[/citation]
The people writing the code are probably more experienced with writing Unix-based code for PPC than for x86. Just a guess.
 

daskrabbe

Distinguished
Feb 3, 2008
[citation][nom]leo2kp[/nom]I don't think Crysis will work on that thing. It only uses 2 or 4 cores. Wouldn't run any better, if not worse, than a high-end gaming rig IMO.[/citation]

I'm pretty sure the graphics use more than 4 pipelines.
 

FlayerSlayer

Distinguished
Jan 21, 2009
"With the Sequoia topping out at around 20Tflops, the bar has been raised tremendously in the span of only seven months."

Don't you mean 20 PFLOPS? And that's seven months from the CREATION of a 1-PFLOPS machine to the DESIGN of the 20-PFLOPS one; it will be about four years between their actual completions.
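
A quick sanity check on the units (a sketch; Roadrunner's June 2008 Linpack figure of roughly 1.026 PFLOPS is used as the reference point):

```python
# If "20 Tflops" were correct, Sequoia would be far slower than machines
# that already existed, so the intended unit has to be petaflops.
TERA, PETA = 1e12, 1e15

roadrunner = 1.026 * PETA     # approx. Linpack Rmax of Roadrunner, June 2008
as_written = 20 * TERA        # the article's "20 Tflops", taken literally
as_intended = 20 * PETA       # the figure the article surely means

assert as_written < roadrunner             # 20 TFLOPS is ~50x below Roadrunner
print(round(as_intended / roadrunner, 1))  # ~19.5x the first petaflop machine
```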
 

Tindytim

Distinguished
Sep 16, 2008
[citation][nom]leo2kp[/nom]I don't think Crysis will work on that thing. It only uses 2 or 4 cores. Wouldn't run any better, if not worse, than a high-end gaming rig IMO.[/citation]
It wouldn't work, but that has nothing to do with cores. It's PowerPC, not x86. Unless they got their hands on the source, or Crytek decided to release it specifically for that system, it's not going to work.

[citation][nom]Shadow703793[/nom]1. Why not use Cell and GPUs fore more acceleration.2. Why PowerPC CPUs? Cause of the already written code for PPC?[/citation]
You do know that the Cell is a PowerPC-based processor, right? And they don't need GPUs; this machine isn't going to do graphics processing. It's for number crunching, not HD video or gaming.
 

AbhiNambiar

Distinguished
Feb 5, 2009
Maybe the Buddhists in Tibet would like to borrow it for a couple days, perhaps? (reference - Arthur C. Clarke, The Nine Billion Names of God)
 

Mr_Man

Distinguished
Feb 17, 2008
I'm cool with this just as long as they stick with the name Sequoia. If they ponder naming it Skynet... I'll be moving to Mars.
 

resonance451

Distinguished
Feb 13, 2008
No, I think Crysis will slow it down. That miserable waste of resources can take a 20 lumaflop super super computer and put it to waste.
 

nottheking

Distinguished
Jan 5, 2006
[citation][nom]Shadow703793[/nom]1. Why not use Cell... ...fore more acceleration.[/citation]
Contrary to popular belief, the Cell actually sucks at general-purpose computing. Each of the SPEs is utterly "dumb" in that it cannot handle complex operations, and really only works well for streaming SIMD operations.

Additionally, the performance numbers claimed by Sony are utterly bogus, just like their claim that the PS2 would be export-restricted from Japan for being "too technologically advanced." The Cell is NOT a 1-2 TFLOPS chip; in the PS3 its theoretical peak is about 185.6 GigaFLOPs, and only for 32-bit operations. FP32 isn't much use for heavy-math applications; gaming, physics, and anything a supercomputer handles usually mean double-precision FP64, which is what the well-known LINPACK benchmark (the standard for rating supercomputers) uses, as well as popular distributed-computing applications like all those @home programs. On such a test, the Cell's design cripples it to a theoretical peak of around 51.2 GigaFLOPs, a number comparable to a readily affordable PC gaming CPU like the Core 2 Duo E8400, with the C2D having the advantage of being vastly more programmable and flexible, meaning it would most likely score much higher on LINPACK.

Even that peak assumes the 1/8th of the 64-bit power that lies in the PPE isn't being sapped by issuing commands to the SPEs, and that the 7/8ths that are the SPEs aren't being held up waiting for those commands, a situation that would be very hard to reach in the first place and impossible to sustain outside streaming SIMD for more than a couple of clock cycles, let alone the 3.2 billion cycles of a full second. And once you consider the Cell's extreme power consumption, it makes no sense for supercomputing applications; it winds up LESS efficient than other designs.
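
That 51.2 GigaFLOPs figure can be reproduced with back-of-the-envelope arithmetic (a sketch; the 2 FP64 ops per cycle per execution unit is an assumption chosen to match the figure above, and real workloads would fall far short of this peak):

```python
# Back-of-the-envelope peak FP64 throughput for the Cell in the PS3.
# Assumptions: 3.2 GHz clock, 8 usable execution units (1 PPE + 7 active
# SPEs), and 2 double-precision FLOPs per unit per cycle.
clock_hz = 3.2e9
units = 8                  # 1 PPE + 7 active SPEs
fp64_ops_per_cycle = 2     # assumed DP issue rate per unit

peak_fp64 = units * fp64_ops_per_cycle * clock_hz
print(peak_fp64 / 1e9)     # 51.2 (GigaFLOPs)
```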

Basically, the Cell was designed to be really good at one thing: handling streaming media. Encoding/decoding is an application that's very intense on low-precision math, and not very intense at all when it comes to instructions. Hence, the Cell is a perfect fit; it is the only chip out there (CPU, GPU, whatever) that's capable of handling SEVERAL fully high-definition video streams, complete with HDCP and even in exotic formats like DivX/XviD, all simultaneously, an application that is largely impossible for most PC CPUs without the aid of a graphics card alongside it.

[citation][nom]Shadow703793[/nom]Why not use... ...GPUs fore more acceleration.[/citation]
GPUs aren't left out of supercomputers for the same reason the Cell is, but there are nonetheless technical reasons that keep them from being used. Three come to mind.

The first is memory latency. GPUs tolerate very high memory latency: they handle relatively linear tasks, and when working through textures and shaders they always fetch very large, sequential blocks of memory, so a latency of 30, 40, or more clock cycles doesn't really matter; the GPU knows well in advance what it will need next around 99% of the time. The same benefit applies to decoding media, a streaming application that latency doesn't hurt. Scientific applications are another matter: there the predominant bottleneck invariably winds up being data and instruction latency, which is also hurt heavily by GPUs' extremely skewed ratio of processing units to cache, vastly different from what's found in general-purpose CPUs.

The second reason is the lack of a standard multi-GPU architecture that could support a large number of GPUs, even just for mathematical operations; the current limit for ANY design appears to be 4 GPUs, from either nVidia or ATi/AMD. So while in theory you could match Sequoia's floating-point capacity with only 1/7.5th the number of RV770s (i.e., 13.3% as many chips), there is as yet no way to actually build that assembly, so in practice it's a moot point.

The final reason is power and heat. GPUs may deliver a very high degree of performance-per-watt on math, but they STILL have a very high TDP per chip. The actual chips are usually one of the minor costs of a supercomputer; far more care goes into powering and cooling thousands upon thousands of nodes stably, each with multiple CPUs plus all the other components. With GPUs, heat production is concentrated in a far smaller number of chips, so you'd need more intensive cooling and likely greater spacing between GPUs; you can't just blow hot air out the back of the case when there are more nodes in every direction. There's a good chance you'd actually have to construct a LARGER facility to house an equally powerful supercomputer built from GPUs than one built from multi-core general-purpose CPUs.
 
Guest
I think they won't need GPGPU.
There are about 4,000 cores in that baby; probably enough to fully simulate each thread per CPU and play it at 100 viewpoints with graphics maxed out and limitless viewing distance @ 500fps or higher!
It can be rerouted to select one processor by itself, or a cluster of processors, calculating only the stencil buffer, while another calculates light paths and shadows.
Unlike a GPGPU, which needs to run everything through one core and benefits from smaller threads than a CPU does.
In this case it probably wouldn't matter, even if it only had 5 CPUs, each a quad-core, and no graphics card.

As far as naming goes, they'd probably rather call it Mammoth than Baby.
 
