hcl123 :
juanrga :
Certain problems require exascale-level compute. The problem is that current petascale supercomputers cannot be scaled up 1000x due to their immense power consumption. Therefore research groups concluded that ARM was the kind of efficient architecture needed to build supercomputers at that scale.
The research group has shown that ancient Tegra chips are more efficient than the i7 in single-core performance, and have the same efficiency as the i7 in multicore at the same frequency.
Ivy Bridge is about 10-15% better than Sandy, and Haswell is (being generous here) about 10% better than Ivy. I don't know how Broadwell and Skylake will perform, but I doubt they will add more than 20%.
However, the current Tegra 4 is 10x/5x better than the Tegra 2/3 they tested. And Parker will be about 100x better and comes in 2015, probably on 14nm. Intel Broadwell also comes on 14nm, but it won't make it to desktops:
http://www.fudzilla.com/home/item/32524-broadwell-won%E2%80%99t-make-it-to-desktop
For the sake of comparison, 100x is a 9900% improvement: 9900% >>> 20%.
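Spelling the arithmetic out, a speedup factor $s$ converts to a percentage improvement as

$$\text{improvement} = (s - 1) \times 100\%, \qquad s = 100 \Rightarrow 9900\%, \qquad s = 1.2 \Rightarrow 20\%.$$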
Ummm... I think the key to exascale is heterogeneous computing, a kind of APU, though not necessarily one with a full GPU for the heterogeneous part. Perhaps that is why ARM is in HSA; HSA is not about iGPUs, as many seem to reason... it's about everything heterogeneous... yes, HSA in an exascale supercomputer (OpenMPI is already foreseen).
All this ARM vs i7 is awkward. Even the biased and blind can see that an A57 has much more perf/watt... it's not only 28nm vs 22nm FinFET (theoretically double the price), it's also power management and turbo, so it's not just 1.3GHz vs 1.49GHz. ARM has rather rudimentary power management and no turbo, while Intel has the best power management of them all (this should please hafijur)...
An A57 on 22nm FinFET with Intel-like power management (turbo) would deliver a severe beating not only in perf/watt, but also in pure performance (where it can already win in most benches).
Yes, we are talking about heterogeneous supercomputers: the Mont Blanc project uses ARM+CUDA.
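To make the division of labour concrete, here is a minimal, hypothetical CUDA sketch (my own illustration, not Mont Blanc project code): the host CPU, which could just as well be an ARM core, only allocates, copies, and launches, while the GPU executes the bulk floating-point work.

```
// Hypothetical ARM+CUDA sketch: the host CPU only orchestrates,
// while the GPU executes the heavy math. Illustration only.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// GPU kernel: each thread computes one element of y = a*x + y
__global__ void axpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) side: prepare the data
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device (GPU) side: allocate and copy in
    float *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // The CPU merely launches the kernel; the FLOPs happen on the GPU
    axpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);  // expect 5.0 = 3*1 + 2

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

Since the CPU's role in this model is mostly coordination, its power draw matters more than its raw throughput, which is the whole ARM argument.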
The CPU in current x86 + GPU/accelerator supercomputers accounts for up to 40% of the total power consumption. Substituting more efficient ARM CPUs for the x86 CPUs is a necessary step to scale up.
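As a rough worked number, assume (purely for illustration) a 4x CPU efficiency gain on that 40% share:

$$P_{\text{new}} = 0.60\,P + \frac{0.40\,P}{4} = 0.70\,P,$$

i.e. the CPU swap alone trims about 30% off total system power, before any gains on the accelerator side.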
A note: the Nvidia Parker APU will use custom cores more advanced than the A57, and it will be made on a FinFET process. The node size is not known, probably 14nm, but some rumours say 16nm whereas others say 10nm.
Cazalan :
I didn't ignore it. It is less efficient at single-core performance at the lower clock speeds, but when you use it as it's designed, with 4 cores and higher clock speeds, Intel becomes more efficient. Chop 2 legs off a horse and yes, a human will outrun that horse. In one of the same slides they overclocked an ARM chip and saw the performance/energy actually go down.
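That drop is exactly what first-order CMOS scaling predicts, since supply voltage has to rise along with frequency. Ignoring leakage:

$$P \approx C V^2 f \quad\Rightarrow\quad E_{\text{op}} = \frac{P}{f} \approx C V^2,$$

so overclocking raises the energy per operation even as it raises performance.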
You're taking Nvidia marketing terms at face value and comparing them against CPU-only Intel advancements. Those Nvidia numbers are HEAVILY weighted towards the GPU capability they're adding to the device, not the CPU capability. It's apples and oranges.
Now if you take Sandy GPU benchmarks compared to Haswell GPU benchmarks, what do you see?
http://www.sisoftware.net/?d=qa&f=cpu_intel_sb
http://www.sisoftware.co.uk/?d=qa&f=gpu_hsw
Sisoftware Float Vectorized compute shader - Sandy 19.44MPix/s
Sisoftware Float Vectorized compute shader - Haswell 537MPix/s
A 27x improvement in 2 generations. Yes, that was cherry-picked, but the point is that Nvidia just needs 1 benchmark showing a 100x improvement for that marketing slide to be true. That's not very hard to do with benchmark magic.
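For example, take three made-up per-benchmark speedups of 1.1x, 1.2x, and 100x. The slide quotes the 100x, while the geometric mean across the suite is only

$$(1.1 \times 1.2 \times 100)^{1/3} \approx 5.1\times.$$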
You can't take ANY marketing slide at face value. That goes for EVERY company.
PS: MIPS is considered more energy-efficient than ARM; it just didn't have a champion until Imagination bought it last year. Now they have their own energy-efficient 64-bit core to bundle with their PowerVR Rogue (used in the iPhone 5S).
It says: "ARM multicores as efficient as Intel at the same frequency"... and that using the ancient Tegra 3 (the supercomputer will use Tegra 6).
The Tegra 3 phone chip offers the same efficiency, or better, up to its maximum frequency of 1.3GHz, whereas the Intel chip (which is not a phone chip) can continue up to 2.4GHz, among other reasons because its cores are fed by about 4x more memory bandwidth and 7x more cache. There is no reason why you cannot clock an ARM core up to 3.5GHz; this is unrelated to x86 vs ARM, it is simply that the rest of the chip has to be designed accordingly.
Look at the single-core efficiency: "ARM platforms more energy-efficient than Intel platform". There, a single core is being fed adequately.
If the tests were repeated with the i7's extra 6MB of cache disabled and the chip limited to a single memory channel, the i7 would perform much worse while consuming about the same energy. I think this would offer a better perspective on the efficiency of ARM vs x86.
About overclocking: again, this is unrelated to ARM vs x86. Pay attention also to the note "Thermal package not designed for sustained full-power operation".
You are right, I was comparing CPU improvements with CPU+GPU improvements, my mistake. Still, my point holds: the CPU improvement for Intel was about 10-15% between Sandy and Ivy, and then dropped to about 5% for Haswell.
The Tegra 3 CPU is about 2x faster than the Tegra 2 CPU. Now look at how the Exynos (dual A15) offers the same performance as the Tegra 3 (quad A9) clock for clock: two A15 cores doing the work of four A9 cores implies the A15 is about 2x faster than the A9. Look at the new Apple A7:
"Apple doesn't quite hit the 2x increase in CPU performance here, but it's very close at a 75% perf increase compared to the iPhone 5."
About MIPS, I think what you say is not correct, but the question here is not one RISC vs another RISC; it is RISC vs x86 (CISC).