IBM z Processors Climb to 5.5 GHz

Apple switching to Intel was a business move. The 4th generation of PPC or POWER (anything with AltiVec, Velocity Engine or VMX) was the last generation of PPC or POWER CPUs that could be used in mobile computers without needing huge batteries. Keep in mind that the development costs of Intel's and ATI's chips are made back relatively fast, and they have a relatively low price, mainly because of the numbers shipped. A die shrink for the G5 would have meant lower power usage at a higher clock speed and thus made it viable, but neither IBM nor Motorola was capable of doing this on a large scale while keeping the price per SKU low.

A 5th-generation POWER or PPC would in most cases be a lot faster than an x86 CPU of the same generation, and on top of that RISC is a cleaner and therefore faster architecture, but if you only sell a couple million of them it becomes too expensive, and the switch in platform simply means more profit for Apple.

I used a lot of PPC systems, including Apples and, for example, the Pegasos, and I prefer them not because one might be an Apple but because I love the PPC architecture. Right now there is little to no choice in platform (as in architecture), so yes, I wish Apple had never gone to Intel.
 
[citation][nom]Dangi[/nom]Though more complicated to implement, one FPGA would destroy CUDA performance. CUDA is very, very good for a very few things; one FPGA is incredibly awesome for one thing and sucks for the rest. For everything else you have a CPU.[/citation]I am willing to accept any information presented to me, albeit after I verify it. CUDA is a PROGRAMMING language for controlling the streaming abilities of NVIDIA's streaming processor cores. It is used to OFFLOAD work that an x86 or ANY OTHER CPU could only attempt by itself, thanks to the GPU's massive thread capability. That is the reason it was developed in the first place. To say that the streaming processors on NVIDIA cards are CPUs is ridiculous. They are back-end co-processors that process information the main CPU has offloaded to them. Take ANY FPGA (if you even KNOW what that is), put 3,000 of them in your room, and you wouldn't need to worry about heating your apartment in Antarctica. Yes, a programmed FPGA chip designed for a specific task would SMOKE a streaming processor from an NVIDIA chip in the real world, but the fact that you cannot have 20,000+ of them in one TOWER PC beats them to a bloody pulp. Take 1,000 x86 towers with 8 Tesla cards in each one, tell them to do what this IBM stack could do in a similar setup, and you will find that the IBM setup would put you in debt till Christ comes back. Let's see (give or take): 1,000 x 20,000 = 20,000,000 cores of offloaded work, PLUS let's say a 6-core, 12-thread, dual-CPU Xeon board (24 threads per board), for a total of 20,024,000 capable threads. Do the math.
 
[citation][nom]hemelskonijn[/nom]Apple switching to Intel was a business move. The 4th generation of PPC or POWER (anything with AltiVec, Velocity Engine or VMX) was the last generation of PPC or POWER CPUs that could be used in mobile computers without needing huge batteries. Keep in mind that the development costs of Intel's and ATI's chips are made back relatively fast, and they have a relatively low price, mainly because of the numbers shipped. A die shrink for the G5 would have meant lower power usage at a higher clock speed and thus made it viable, but neither IBM nor Motorola was capable of doing this on a large scale while keeping the price per SKU low. A 5th-generation POWER or PPC would in most cases be a lot faster than an x86 CPU of the same generation, and on top of that RISC is a cleaner and therefore faster architecture, but if you only sell a couple million of them it becomes too expensive, and the switch in platform simply means more profit for Apple. I used a lot of PPC systems, including Apples and, for example, the Pegasos, and I prefer them not because one might be an Apple but because I love the PPC architecture. Right now there is little to no choice in platform (as in architecture), so yes, I wish Apple had never gone to Intel.[/citation]Huge batteries??? The chips were inferior. Period.
 
The chips used at end of life, sure... but the architecture and platform? You would really be a moron to even suggest that. You do know that Intel CPUs these days are almost PPC CPUs with x86 bolted on? The only reason Intel keeps pushing x86 is that without it we would have better, faster and cheaper systems from competitors, because Intel would no longer have a monopoly.
 
[citation][nom]mlopinto2k1[/nom]I am willing to accept any information presented to me, albeit after I verify it. CUDA is a PROGRAMMING language for controlling the streaming abilities of NVIDIA's streaming processor cores. It is used to OFFLOAD work that an x86 or ANY OTHER CPU could only attempt by itself, thanks to the GPU's massive thread capability. That is the reason it was developed in the first place. To say that the streaming processors on NVIDIA cards are CPUs is ridiculous. They are back-end co-processors that process information the main CPU has offloaded to them. Take ANY FPGA (if you even KNOW what that is), put 3,000 of them in your room, and you wouldn't need to worry about heating your apartment in Antarctica. Yes, a programmed FPGA chip designed for a specific task would SMOKE a streaming processor from an NVIDIA chip in the real world, but the fact that you cannot have 20,000+ of them in one TOWER PC beats them to a bloody pulp. Take 1,000 x86 towers with 8 Tesla cards in each one, tell them to do what this IBM stack could do in a similar setup, and you will find that the IBM setup would put you in debt till Christ comes back. Let's see (give or take): 1,000 x 20,000 = 20,000,000 cores of offloaded work, PLUS let's say a 6-core, 12-thread, dual-CPU Xeon board (24 threads per board), for a total of 20,024,000 capable threads. Do the math.[/citation]

Dude, did you even read my post?
The guy I quoted said that CUDA is better, but as I said, CUDA is awesome in some concrete cases and can offload the CPU, as you said. Yes, CUDA is a programming language, but it is also a HARDWARE part found in GPUs; that's why Nvidia says, for example, that the GTX 670 has 1344 CUDA cores.
I never said CUDA is a CPU; where did you read that? (And yes, I know what an FPGA is, and have used one; that's why I said it is complicated to implement.)

And as I said, FPGAs are awesome for ONE thing and suck at everything else.

http://www.computerworlduk.com/news/it-business/3290494/jp-morgan-supercomputer-offers-risk-analysis-in-near-real-time/

And yes, you can stack FPGAs.

Here are some examples:
http://www.fhpca.org/press_maxwell.html
http://www.timelogic.com/

And yes, servers are better in general; CPUs can do anything, but because of that they can be outperformed by systems that are highly specialized.

For example, this FPGA built to crack MD5:
http://infinityexists.com/2009/06/16/fpga-md5-cracker/
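To see why that kind of job suits an FPGA, here is a minimal brute-force MD5 search sketched in Python (the alphabet, target and function names are my own toy example). Every candidate hash is independent of every other, which is exactly the shape of work an FPGA or GPU can evaluate thousands at a time:

```python
import hashlib
from itertools import product

def md5_hex(s: str) -> str:
    """Hex MD5 digest of a candidate password."""
    return hashlib.md5(s.encode()).hexdigest()

def crack_md5(target_hex: str, alphabet: str, max_len: int):
    """Exhaustively search the keyspace for a preimage of target_hex.

    Each candidate is independent of the others, which is why this
    search maps so well onto FPGA fabric or GPU cores: the hardware
    simply evaluates many candidates simultaneously.
    """
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if md5_hex(candidate) == target_hex:
                return candidate
    return None  # keyspace exhausted

# Toy demo: recover a 3-character lowercase "password".
target = md5_hex("abc")
print(crack_md5(target, "abcdefghijklmnopqrstuvwxyz", 3))  # abc
```

A CPU walks this loop one candidate at a time; a purpose-built FPGA pipeline hashes a new candidate every clock cycle, which is where the claimed speedups come from.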
 
[citation][nom]Dangi[/nom]Though more complicated to implement, one FPGA would destroy CUDA performance. CUDA is very, very good for a very few things; one FPGA is incredibly awesome for one thing and sucks for the rest. For everything else you have a CPU.[/citation]

FPGAs suffer from extremely high inter-element latency that is inherent to their design, which severely limits what the chips can do.
 
For all of you geeking out over the hardware, remember that these mainframes are a platform, meaning hardware + OS together.

IBM is not just selling you the big iron; it's also selling you z/OS so you can run z/OS applications.

Sure, have fun with your stacks of GPUs or stacks of PCs, but they won't easily give you z/OS's mainframe capabilities (the software side).
 
[citation][nom]raytseng[/nom]For all of you geeking out over the hardware, remember that these mainframes are a platform, meaning hardware + OS together. IBM is not just selling you the big iron; it's also selling you z/OS so you can run z/OS applications. Sure, have fun with your stacks of GPUs or stacks of PCs, but they won't easily give you z/OS's mainframe capabilities (the software side).[/citation]

Not necessarily. It is possible to run SUSE Linux and RHEL (derivatives included) on them.
 
[citation][nom]nevertell[/nom]Not X86, so don't really jump to conclusions about the performance. Nor can you natively run Crysis...[/citation]

Build an x86 emulator.

Get to play Crysis.

Then cry over the FPS due to the massive emulation performance hit...
 
[citation][nom]A Bad Day[/nom]Build an x86 emulator. Get to play Crysis. Then cry over the FPS due to the massive emulation performance hit...[/citation]

I think that Power could run x86 code with some pretty decent performance if you use a good emulator.
 
[citation][nom]mlopinto2k1[/nom]I am willing to accept any information presented to me, albeit after I verify it. CUDA is a PROGRAMMING language for controlling the streaming abilities of NVIDIA's streaming processor cores. It is used to OFFLOAD work that an x86 or ANY OTHER CPU could only attempt by itself, thanks to the GPU's massive thread capability. That is the reason it was developed in the first place. To say that the streaming processors on NVIDIA cards are CPUs is ridiculous. They are back-end co-processors that process information the main CPU has offloaded to them. Take ANY FPGA (if you even KNOW what that is), put 3,000 of them in your room, and you wouldn't need to worry about heating your apartment in Antarctica. Yes, a programmed FPGA chip designed for a specific task would SMOKE a streaming processor from an NVIDIA chip in the real world, but the fact that you cannot have 20,000+ of them in one TOWER PC beats them to a bloody pulp. Take 1,000 x86 towers with 8 Tesla cards in each one, tell them to do what this IBM stack could do in a similar setup, and you will find that the IBM setup would put you in debt till Christ comes back. Let's see (give or take): 1,000 x 20,000 = 20,000,000 cores of offloaded work, PLUS let's say a 6-core, 12-thread, dual-CPU Xeon board (24 threads per board), for a total of 20,024,000 capable threads. Do the math.[/citation]

Not all software can be made to run in parallel. No parallelism means performance would suck on the Teslas compared to a good CPU when running single-threaded tasks. For example, any task where each instruction depends on data from the previous instruction can't be run in parallel, because everything relies on the previous instruction having already executed. Also, you're acting as if threads on a CPU are no different from threads on a GPU. They're not even doing the same type of math: CPUs have floating-point capabilities, but most of their work is still integer math, while GPUs, last I checked, are geared almost entirely toward floating point.
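The dependency point can be made concrete with a toy Python sketch (my own example, nothing GPU-specific). The first function's iterations are all independent and could fan out across thousands of GPU threads; the second is a serial recurrence where step i needs step i-1's result, so extra threads buy nothing:

```python
# Independent work: each element is computed from its input alone,
# so the iterations could be split across thousands of GPU threads.
def scale_all(xs, k):
    return [x * k for x in xs]

# Dependent work: a running recurrence. Each step consumes the
# previous step's output, so the chain must be walked one link
# at a time, no matter how many cores are available.
def recurrence(xs):
    acc = 0
    out = []
    for x in xs:
        acc = acc * 2 + x  # depends on the previous acc
        out.append(acc)
    return out

print(scale_all([1, 2, 3], 10))  # [10, 20, 30]
print(recurrence([1, 2, 3]))     # [1, 4, 11]
```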

Heck, compare the IGP of an A8-3850 to its CPU cores for Bitcoin mining; the GPU obviously wins substantially. However, some tasks that are thrown at the GPU are still better done on a CPU. Compare WinZip's compression times on any AMD GPU to those of a very high-performance CPU: a good CPU can win every time, despite GPUs obviously having exponentially greater aggregate throughput.
 
@raytseng: the mainframes run Linux as well. In fact, you can run Linux and z/OS on one mainframe under different LPARs. But it's not the hardware that costs the most, or anything about performance across platforms; it's the software that drives a lot of purchase decisions. Try to run an Oracle DB across all of those servers you would need to cluster, and tell Oracle how many cores you would like to license. I can see them rubbing their hands with glee! But tell them you are running Oracle in a z LPAR with 5 cores on the above z platform and they don't get so excited. (Personal experience there!)
 
I'm going to preface my post for those who don't know me: I'm a systems engineer whose primary field of expertise is Solaris and SPARC (though I've worked with HP-UX and AIX before).

You cannot directly compare the performance of these kinds of RISC microarchitectures with something like x86; they serve different purposes. Though of all the RISC big-iron systems, POWER is the closest to x86 in its performance profile (very deep and narrow).

The #1 reason you go with big iron isn't cost per CPU cycle but the radically higher I/O capability. These platforms have several different I/O buses and bridges that allow insane amounts of data processing to happen. Those levels of I/O would crush just about any current x86 implementation on the planet, mostly due to the clunky way x86 handles parallel work. Another thing to mention is that these systems are fully redundant and everything is field-replaceable while the system is powered on. That means hot-swappable CPUs, memory and various I/O cards: everything can be changed without stopping processing or shutting down the system. Most customers see that capability as more than worth the premium price tag.

IBM's machines are predominantly used for financial databases and scientific calculations. Their primary competitor isn't Intel but Oracle (previously Sun). A decently loaded kit will cost you upwards of $100K USD per box, which will handle most tasks thrown at it; if you need some serious computing, you're talking $500K+ USD for servers built into a couple of racks. As I haven't actually built an IBM system yet, I can't speak from experience on their average socket count or memory load, but I can't see it being too different from Sun's offerings: four sockets and 512GB of memory for a commodity-level server, eight or more sockets and 1TB+ for something with more demanding muscle.

Finally, the software on these costs as much as, if not more than, the hardware; licensing tends to be per socket and can run a few hundred grand for rack-sized systems.
 
Not sure why people are comparing a mainframe to a UNIX platform.

Granted, a mainframe can run all your UNIX-type workloads, so you can repurpose the equipment; but if you really need the extra features of an honest-to-god mainframe, then you need a mainframe (and not just a high-speed UNIX box).

Since computer gaming is so prevalent here, I'd offer the analogy of desktop gaming versus console gaming.

If you want to compare against *NIX, then you should be looking at IBM's AIX systems or Linux-based "iron", which are Unix-like.

z is a whole different beast in itself. Imagine The Matrix; that's the type of job actually run on mainframes.

Even Watson wasn't run on z by IBM; they ran it on Linux boxes and used higher-level programming (Java, etc.).
 
[citation][nom]mlopinto2k1[/nom]Wouldn't CUDA destroy the performance of these outdated racks? That is, if you replaced every tray with 4 high end streaming cards. Just sayin. Even the PS3 supercomputer the military made would probably rape this thing.[/citation]

Half of the credit card swipes on the planet can be handled by a SINGLE z server. "Wouldn't CUDA destroy the performance of these outdated racks?" LOL. It depends on the application. http://investor.visa.com/phoenix.zhtml?c=215693&p=irol-newsArticle_print&ID=1355716&highlight=
 
[citation][nom]blazorthon[/nom]You people don't seem to realize that there is no indication of performance in this article, only clock frequency. For all we know it might have the performance of an i3 or the performance of a six-core i7. Also, Bulldozer can clock a lot higher at the same power consumption. Whether or not it would be similar performance, we don't know. If I had to guess, no, but it doesn't matter what we guess if we don't know for sure. Also, guys, IBM probably uses Power and such because they don't have an x86 license, and even if they did, they probably wouldn't want to change architectures like that. I'm also quite sure that back when Apple switched, their Motorola Power CPUs were weaker than the x86 CPUs of the time, not stronger.[/citation]
This performs far better than any x86 CPU ever released. Each core can run at least four threads, and it would probably compete well with an i5, if not beat it by a long way.

Actually, IBM does have an x86 license.
 
[citation][nom]PreferLinux[/nom]This performs far better than any x86 CPU ever released. Each core can run at least four threads, and it would probably compete well with an i5, if not beat it by a long way. Actually, IBM does have an x86 license.[/citation]

Not "far better" at all; comparable at best. x86 still wins on instructions per clock by a very healthy margin. The POWER7 architecture sacrificed out-of-order execution on a per-thread basis and made up for it with substantial SMT (think Hyper-Threading, but with 4 threads rather than 2). Each quad-core processor may run 16 threads at once, but it still has the execution resources of four cores.

It's foolish to try to compare them on anything other than a Dhrystone/Whetstone basis. Given that IBM doesn't publish benchmarks and prohibits its customers from doing so until the mainframes enter the secondary market, this is very difficult to do.

On PCs, the software is designed around the capabilities of the hardware. Outside the big software giants such as Google, Microsoft, EMC (VMware), Oracle, SAP and Red Hat, most developers have no influence on the design of the hardware their software runs on. Mainframes, on the other hand, have hardware that is designed around the requirements of the software which runs on them. Companies such as Red Hat (again), IBM, HP and CA collaborate to provide their customers with both the hardware and the software support necessary to run their business.

Mainframes have dedicated processors handling dedicated tasks. A processor may have reprogrammable microcode that lets it act as a general-purpose CPU one moment, then a database accelerator, then a cryptographic engine. Processors perform dedicated tasks in a pipelined fashion, facilitated by extremely high-speed interconnects.

This design allows mainframes to offload system management to dedicated service processors, so the other processors can spend 100% of their time on the tasks they are set up for and nothing else. Thus a mainframe can sustain nearly 100% load and still maintain high availability while being completely redundant internally.

Individually the processors aren't all that great, but they're flexible, and it's this flexibility that allows them to be combined in ways that streamline their purpose.
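As a loose software analogy for that offload model (the names and the SHA-256 stand-in below are mine, nothing IBM-specific), a dedicated worker can own one kind of job while the main path keeps doing its own work:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a dedicated engine (think of a crypto assist
# processor): it owns one kind of job and nothing else.
crypto_engine = ThreadPoolExecutor(max_workers=1, thread_name_prefix="crypto")

def checksum(payload: bytes) -> str:
    """The specialized job the dedicated worker performs."""
    return hashlib.sha256(payload).hexdigest()

def handle_transaction(payload: bytes) -> dict:
    # Offload the hashing, then keep doing "business logic"
    # on the main path while the engine works.
    future = crypto_engine.submit(checksum, payload)
    record = {"size": len(payload)}      # main-path work
    record["digest"] = future.result()   # join with the offloaded result
    return record

print(handle_transaction(b"card-swipe-0001"))
```

The point of the sketch is the division of labor, not the threading: the general-purpose path never burns cycles on the specialized task, which is the property the post attributes to mainframe service processors.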
 
[citation][nom]blazorthon[/nom]Compare WinZip's compression times on any AMD GPU to say a very high-performance CPU. A good CPU can win every time despite GPUs obviously having exponentially greater aggregate performance.[/citation]

Last time I checked, WinZip doesn't actually use the GPU. It only checks for the presence of an AMD GPU or CPU; if one is found, it uses the OpenCL code path, otherwise the standard code. But both code paths run only on the CPU; GPU use is 0%.
 
Thing is, a decently binned Sandy Bridge can hit 5.0 GHz as well, though probably under water and only a few chips at safe voltage, but still.

Bulldozer can easily reach 5 GHz, especially if you get an 8-core and disable half of it; then it stomps on i7s.
 
To expound on what Pinheld said, these CPUs are not for running a single task very fast; they're for running hundreds of things simultaneously, and rarely with just one CPU.

Ex: I present the SPARC T4 (since I'm a Sun guy, I'll speak about what I know).

40nm, 2.85–3.0 GHz (slower than POWER, speed-wise)
Eight cores with 128KB of L2 cache per core and 4MB of shared L3 cache per chip
Eight threads per core
One 10GbE controller per chip
Two DDR3 memory buses per chip

You get 64 threads per chip, which is pretty nice, but combined with other chips you're looking at 128 to 256 threads per commodity box. Go with Fujitsu's SPARC64 VII+, where you can get 32 to 64 sockets, and it gets stupidly wide.
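The thread counts in that spec list are just multiplication; a quick sanity check (the 2- and 4-socket configurations are my own assumption for a "commodity box"):

```python
# Per-chip thread count for the SPARC T4 described above.
CORES_PER_CHIP = 8
THREADS_PER_CORE = 8

threads_per_chip = CORES_PER_CHIP * THREADS_PER_CORE
print(threads_per_chip)  # 64

# Assumed commodity-box socket counts, matching the quoted
# 128-256 thread range.
for sockets in (2, 4):
    print(sockets, "sockets ->", sockets * threads_per_chip, "threads")
```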

Yet running a single-threaded task would be comparatively slow on this and on POWER versus something like a Sandy Bridge (assuming you could create a benchmark that was microarchitecture-neutral, like what SPEC does). Different hardware platforms cater to different processing loads; you pick and choose based on what you need done.
 