AMD CPU speculation... and expert conjecture

esrever · Apr 2, 2013

I was expecting more from Steamroller, but who knows, AMD might pull off something amazing. At least the shrink would mean more die area for the GPU in Kaveri. At this point I'm more interested in seeing Jaguar than Steamroller.

griptwister · Apr 2, 2013

What AMD needs to do is create a quad core that performs on par with or slightly below the i5 4570K at a far better price point, and an 8-core that performs about the same as two i5s put together (lol, and yes, I know it doesn't work that way). Another thing: if they are able to create an APU that has cores on par with an i5 2500K and strap on decent graphics at a better price point, they would have a hot seller. I think I would probably buy one. I'm curious to see the power consumption on these APUs.

8350rocks · Apr 2, 2013

hcl123 :

8350rocks :

yes yes... but IPC for x86 is 1 to 1.1 or up to 1.2 "instructions per clock" for a little more optimized code. "full fury" optimized code could be much more, yet staying clearly below the 2 instructions per clock per thread [makes one wonder why have CPU cores with more than 2 ALU pipes ... right ?] . And NO ONE in current software offerings, that everybody uses, has "full fury"... not even benchmarks that can be more optimized for a particular uarch than another...

[EDIT: of course in this we are talking about NON VECTOR integer code, which is the traditional and logical way of counting IPC, since architecturally x86 is NOT a "vector" uarch... though good vectorized code could in many cases pass the 2 IPC barrier... ]

Since IPC is talked about so much these days, it would be nice to have a "benchmark" with code written specifically to indicate those numbers (1 to 1.2 or so IPC)... so the results would be 0.x to 1.z, or something like that.

Performance depends on so many factors that it is hard to begin with, but with those numbers in mind it is easy to understand that on x86, no CPU today is "technically or potentially" anything like 150 to 170% more performant IPC-wise than another. Not even Haswell compared with Jaguar: 0.9 compared with 1.2 is around ~35% in "potential"... the real gap is much more than that, because of many other factors, including instructions in the super bloated pointless x86 logical extensions (like vector, from SSE to AVX) that one chip has natively while another must execute them in "microcode" (painfully slow)... which might indicate that the "spree to bloat" toward newer instruction extensions was/is only to gain "competitive advantage", counting on this factor (seems obvious, no?)...
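To make the arithmetic behind that "potential" figure concrete, here is a minimal sketch. The function name is hypothetical, and the 0.9 and 1.2 values are simply the illustrative sustained-IPC figures quoted in the post, not measurements.

```python
def relative_gain(ipc_slow: float, ipc_fast: float) -> float:
    """Fractional advantage of the faster chip over the slower one."""
    if ipc_slow <= 0:
        raise ValueError("IPC must be positive")
    return ipc_fast / ipc_slow - 1.0


# Illustrative figures from the post: 0.9 vs 1.2 sustained IPC.
gain = relative_gain(0.9, 1.2)
print(f"IPC-only advantage: {gain:.1%}")
```

On these numbers the IPC-only gap works out to roughly a third, close to the ~35% quoted, which is the point being made: a 150% to 170% difference cannot come from IPC alone.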

High degrees of advantage/speedup only come with highly parallel data code and highly parallel execution models (cache and memory and clocks and even logical instructions could never reach those >100% speedups)... but here we are NOT talking about CPU but more like GPGPU models...

That is where AMD has an advantage IMO, and it spells APU not CPU. But the apps for this are as yet too few and too far between to make a decisive advantage... "never settle programs" for multimedia apps could be the salvation of AMD... but I doubt those kinds will appear as "common benchmark" apps anytime soon...

Well, if you have 4 cores that can run 32 IPC versus 8 cores that can run 44 IPC, I would think, even with less efficiency of code processing, you'll flat out get more operations out of the AMD chip, versus the intel.

palladin9479 · Apr 2, 2013

Well, if you have 4 cores that can run 32 IPC versus 8 cores that can run 44 IPC, I would think, even with less efficiency of code processing, you'll flat out get more operations out of the AMD chip, versus the intel.

He's right that everyone here has been using "IPC" incorrectly and out of context. IPC is not how fast a CPU is; it's an engineering term used to measure the efficiency of the integer processor and used almost exclusively in RISC uArchs. In x86 it has no meaning for one very simple reason: x86 instructions have variable execution time, with most requiring more than one clock cycle to actually execute. RISC designs, on the other hand, almost always have 1-cycle execution times for all instructions; that is why they use explicit load / store instructions rather than combined instructions. This limit is actually what had many of the original RISC designs beating out the x86 ones for performance (SPARC / PPC / MIPS) until Intel made the Pentium. That CPU did something unique in that it wasn't an x86 CPU, it only emulated one. Internally it was a RISC processor that had a front end x86 decoder / scheduler bolted on to trick the system into thinking it was an x86 CPU. Using this technique Intel was able to put the functional equivalent of multiple processors inside a single logical CPU that could then rapidly execute multiple instructions. The last true x86 CPUs were those of the i486 era.

That is all important to keep in mind because when you're trying to count instructions you need to specify which instructions you're wanting to count, as they're not all the same. AMD and Intel both have different internal microprocessor languages; the real instructions being executed are masked from the OS and so we have no idea what exactly is going on inside (not without access to confidential engineering information). One particular instruction may take longer on one CPU than on another. AMD's BD in particular has a very sensitive cache access method that could easily destroy performance if not handled properly.

Anyhow the take away from all this is to stop using the word "IPC". You can simply say processing efficiency, or single core capacity. Capacity would be the proper term to use as it describes the maximum amount of work able to be done, either by a single unit or by the whole entity. Also do not use clock as a measurement of anything involving two different CPU designs. For the above reasons clock rates have dramatically different impacts depending on the design. The two best metrics are per cost and per energy used as they describe what the user / integrator is paying for that capacity.

mayankleoboy1 · Apr 2, 2013

Or just use times in iTunes as the definition of performance.

palladin9479 · Apr 2, 2013

mayankleoboy1 :

If iTunes is all you did and what you bought the system for, then sure.

If your system has other assigned functions then iTunes would not be a reliable indicator of delivered performance for your money / energy use.

mayankleoboy1 · Apr 2, 2013

^ Read my comment in context.
"iTunes" means a single threaded program, which is at best SSE2 enabled. Any advance in single core performance (due to arch improvement and clock) will directly affect it.

mayankleoboy1 · Apr 2, 2013

:lol:

Trust Charlie to bust bubbles:

http://semiaccurate.com/2013/04/02/arm-and-tsmc-tape-out-20nm-finfet-coretex-a57-cpu/

How does the headline, “TSMC announces 20nm tapeout of rev3 of their 6th test structure die, the 19th and final test chip before 20nm is finalized!” sound?

while the echo chamber of the "tech press" harps on about TSMC releasing "16nm FinFET chips" .

JAYDEEJOHN · Apr 2, 2013

http://blogs.barrons.com/techtraderdaily/2013/04/01/intel-jmp-cuts-to-hold-on-rumored-haswell-power-circuitry-issues/
Hmmm, I hope this isn't true

mayankleoboy1 · Apr 2, 2013

^
I wouldn't be surprised if this was true. It could explain the one-quarter delay in the launch of Haswell.
Regarding the mobile segment, despite excessive hyperbole by Intel, their 32nm on mobile still sucks quite badly. And no OEM is willing to commit to the supposed 22nm goodness.

Truth be told, Haswell doesn't interest me much. I am much more eager to see Richland and Temash products.

JAYDEEJOHN · Apr 2, 2013

But, if it's true, is it design or process here?
If it's pure design, then OK, still bad, but at least it won't affect others down the road.
If it's process.....

mayankleoboy1 · Apr 2, 2013

I think it would be a design fault. The article says that the integrated VRM is the culprit. What is quite interesting is that due to the faulty VRMs, the Ultrabooks and ultraportables will still suck a lot, making Ultrabooks not much better. Which will directly impact the prestige of Intel and call its market leadership into question.

truegenius · Apr 2, 2013

not without access to confidential engineering information

there are many books regarding 8085/8086.......etc cpu (old) arch 😛

Anyhow the take away from all this is to stop using the word "IPC". You can simply say processing efficiency, or single core capacity.

Performance Per Clock sounds better and sounds same as instruction per clock

Also do not use clock as a measurement of anything involving two different CPU designs.

PPC per GHz, i.e. PPCPG, would be better for comparing performance of different archs
(PPCPG = performance / clock speed) (a term coined by me 😛 no need to thank me 😛 )
it will be like DMIPS/MHz in arm's arch comparison, to give a general idea of what kind of performance difference we can expect at same speed and core count
example : 3.5 for arm cortex-a15, 3.3 for qualcomm krait, 2.5 for cortex-a9 etc
then we only need to multiply this PPCPG by clock speed and core count to get the full capacity of a cpu
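A rough sketch of that multiplication, assuming performance scales perfectly with clock speed and core count (real workloads rarely do). The function name, the 1.5 GHz clock and the 4-core count are hypothetical values chosen for illustration; only the per-clock ratings come from the post.

```python
def estimated_capacity(per_clock_rating: float, clock_ghz: float, cores: int) -> float:
    """Naive full-chip capacity: per-clock rating x clock speed x core count.

    Assumes linear scaling with both clock and cores, which is an
    idealisation and not a property of any real CPU.
    """
    return per_clock_rating * clock_ghz * cores


# Per-clock ratings quoted in the post; clock and core count are hypothetical.
ratings = {"cortex-a15": 3.5, "krait": 3.3, "cortex-a9": 2.5}
for name, rating in ratings.items():
    capacity = estimated_capacity(rating, clock_ghz=1.5, cores=4)
    print(f"{name}: {capacity:.1f}")
```

Because clock and core count are held equal here, the ranking is just the ranking of the per-clock ratings; the estimate only becomes interesting when the chips differ in clock or core count.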

The two best metrics are per cost and per energy used as they describe what the user / integrator is paying for that capacity.

PPC per watt
i.e, PPCPW (another term coined by truegenius 😀 no need to thank me again 😛)

Another thing: if they are able to create an APU that has cores on par with an i5 2500K and strap on decent graphics at a better price point, they would have a hot seller.

it is too much of an expectation
there is no way that steamy can match sb performance
if steamy can match nehalem in PPCPG (performance per ghz) for equal number of threads then steamy will rock

i expect performance (PPCPG) in range of wolfdale to nehalem

JAYDEEJOHN · Apr 2, 2013

You sound like the old K8 system metric

gamerk316 · Apr 2, 2013

8350rocks :

Are we talking "per core", or cumulative performance? This matters.

If you have a 4 core chip with 32 IPC [total], you get 8 IPC per core.
If you have a 8 core chip with 44 IPC [total], you get 5.5 IPC per core.

If both are clocked at the same speed, the 4-core chip would be faster until 6 or more cores were being used, as its higher IPC per core would pull it ahead performance wise.
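The break-even point can be checked with a short sketch. It assumes equal clocks, one thread per core, perfectly even per-core throughput and no scaling losses; the function names are hypothetical and the 32 and 44 totals are the made-up figures from the example above.

```python
def throughput(total_ipc: float, cores: int, threads: int) -> float:
    """Work per clock when `threads` threads run on a chip, one per core."""
    per_core = total_ipc / cores
    return per_core * min(threads, cores)


def break_even_threads(ipc_a, cores_a, ipc_b, cores_b):
    """Smallest thread count at which chip B out-produces chip A, else None."""
    for threads in range(1, max(cores_a, cores_b) + 1):
        if throughput(ipc_b, cores_b, threads) > throughput(ipc_a, cores_a, threads):
            return threads
    return None


# 4 cores at 32 total (8 per core) vs 8 cores at 44 total (5.5 per core).
print(break_even_threads(32, 4, 44, 8))
```

With five threads the 8-core chip delivers 27.5 against the 4-core chip's 32; at six threads it reaches 33 and moves ahead, which matches the "6 or more cores" figure.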

And yes Palladin, IPC isn't a flat value; it depends on workload, and so on. But at the end of the day, it's an approximate measure of how much work the CPU does per tick of the clock. Given enough benchmarks that compare two processors, you can approximate the difference between them.

JAYDEEJOHN · Apr 2, 2013

But benchmarks often somewhat skew the real-time/real-usage results

sarinaide · Apr 2, 2013

I am very interested to see Kaveri's controller interfaces; the iGPU would not be strangled by system memory bandwidth and will likely see much higher bandwidth despite using slower clocked GDDR5. I have invested a lot of time into the APU series, and to overlook them, as often happens, on the stoic sentiment that they are weak processors is simplifying the truth unnecessarily. The APU is premised on cost to performance: achieve the best results at the lowest price. What is certain is:

1) RAM is a factor: CPU/Mobo + DDR3 1866/2133 is still cheaper than any other combination with a very competent graphics solution.

2) HD7790/7850 and 660 offered up some seriously good performance so it is not limited to iGPU operations.

3) Dual Graphics is improving; it's just that some games don't utilise CFX or SLI, so not much can be done about that. At $60 on top of the CPU/Mobo you get around 2-2.5x the iGPU gains, worst case around 1.5x the iGPU gains.

Kaveri is still tickling my fascination; I just want to see how these memory controllers work. Offloading the iGPU to GDDR5 only will likely see at least 2-2.5x the bandwidth gains rather than being limited by the x86 IMC, and since bandwidth is king it could see the monumental gains needed.

As for Steamroller, nobody really knows yet, so it's basically FUD right now.

truegenius · Apr 2, 2013

^ does this new amd apu support asymmetric cfx ?

if yes then does it support combining different gpus of different arch too
like combining hd6770 with igpu of a12-7800k (whatever the top chip will be), does it support this type of cfx ?

if it does then i can sell my hd6770 at higher price 😀

Given enough benchmarks that compare two processors, you can approximate the difference between them.

cinebench r11.5 is good to get that idea of ipc aka ppc 😗
it does show approx expected performance difference between majority (>50%) of applications/games when switching to a different cpu
and combine this with some other bench to have a more pure performance factor (ppc)

using a phenom x2 cpu's cinebench r11.5 result, you can approximate the expected score or full performance of an x6 (same arch), and the same with clock/overclock, and this score is comparable to other archs to get an idea of performance difference

You sound like the old K8 system metric

me ?

sarinaide · Apr 2, 2013

truegenius :

Ironic that you brought it up; the last news out is yes, it supports dual graphics with GCN (or GCN 2) and VLIW parts, so yes, the Kaveri APUs can operate with existing Dual Graphics support, DDR3 and DDR5 compatible parts. Not only socket stability but GPU support stability, which is great for users who may not be able to pay $60-80 for a HD8600 discrete part.

The HD6770 is not a supported GPU for DG; you need Turks based GPUs and the not yet revealed Sea Islands and Solar System parts.

mayankleoboy1 · Apr 2, 2013

JAYDEEJOHN :

But don't all benchmarks do so? Isn't that why they are called benchmarks?
You have to draw a line somewhere, and say that so and so usage is as good as real world.

And really, we don't consider synthetics very important. Generally, real programs like rendering or compression or video conversion give a better idea of the performance.

sarinaide · Apr 2, 2013

mayankleoboy1 :

It mentions a 5-10% per-core scheduling efficiency increase, along with 30+% iCache miss reductions, 25% fewer mispredicts, 30% faster thread dispatch, 30% more OPS and a faster reworked IMC. Somewhere in the quantum of things lies the true potential number.

gamerk316 · Apr 2, 2013

3) Dual Graphics is improving just some games dont utilise CFX or SLI so not much can be done about that, at $60 on top of the CPU/Mobo you get around 2-2.5x the iGPU gains worst case around 1.5x the iGPU gains.

http://www.pcper.com/reviews/Graphics-Cards/Frame-Rating-Dissected-Full-Details-Capture-based-Graphics-Performance-Testin

AMD CrossFire configurations have a tendency to produce a lot of runt frames, and in many cases nearly perfectly in an alternating pattern. Not only does this mean that frame time variance will be high, but it also tells me that the value of performance gained by of adding a second GPU is completely useless in this case.

I've never viewed dual-weak-GPU solutions as viable due to latency concerns. It's nice to be proven right almost 5 years later...

pcper has already said (via comments) they will look at hybrid CF at some point in the future. My suspicion, based on how the iGPU/APU will be the one that does the final output, is that latency would be even worse than in the traditional CF case. In any case, I'd wait for those tests before anyone jumps on the hybrid SLI/CF bandwagon.
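To illustrate why alternating runt frames can erase the benefit of a second GPU, here is a rough sketch. The function name, the synthetic frame times and the `runt_fraction` threshold (a frame counts as a runt if it lasts less than a quarter of the average frame time) are all assumptions made for illustration; this is not pcper's capture method or their runt definition.

```python
from statistics import mean


def frame_stats(frame_times_ms, runt_fraction=0.25):
    """Compare the reported frame rate with the rate once runts are discarded.

    runt_fraction is a hypothetical threshold: a frame is treated as a runt
    when it lasts less than this fraction of the average frame time.
    """
    avg_ms = mean(frame_times_ms)
    runts = sum(1 for t in frame_times_ms if t < runt_fraction * avg_ms)
    total_s = sum(frame_times_ms) / 1000.0
    frames = len(frame_times_ms)
    return {
        "reported_fps": frames / total_s,
        "effective_fps": (frames - runts) / total_s,
        "runts": runts,
    }


# Synthetic trace: a 30 ms frame followed by a 2 ms runt, repeated 50 times.
alternating = [30.0, 2.0] * 50
print(frame_stats(alternating))
```

On this synthetic trace the frame counter reports twice the rate that is left after the runts are dropped, which is the pattern the quoted passage describes.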

Cazalan · Apr 2, 2013

Blandge :

Seems this is still not 100% known. AMD said back in the 2012 analyst call that the 28nm chips would be at GF. Both fabs have had their troubles, but apparently GF is now ahead of TSMC for 20nm FinFET. Their working closely with IBM has probably helped.

"AMD obliged during the Q1 2012 analyst call. They said that 28nm APUs would be starting production at Global Foundries this year, so that means product on the shelves in early 2013."

Now this year we know at least Kabini/Temash are being made at TSMC. I thought Kaveri was as well, but now it looks like GF will still be doing their higher clock speed chips.

sarinaide · Apr 2, 2013

That is bringing in a complexity that is not at issue here. The issue is whether dual graphics enhances the gaming experience, and it is a resounding yes where the game scales to dual graphics support.

The HD7660D playing BF3 @ 1080 Low settings achieves on average around 28-32 FPS; bear in mind that while FRAPS will be superseded at some point, it still remains a viable test until it is replaced by something better. Going to a HD6670 @ 1080 Low settings, it gets around 42-47 FPS, and in Dual Graphics mode you can get as high as 70 FPS. On med-high settings with low AA and AF you can get a very playable 40+ FPS on Dual Graphics. Now the advantage here is APU+MOBO+HD6670, and I will be fair and base this on the higher end APU specs available: that is a 5800K + Extreme 6 + HD6670 1GB GDDR5 for anywhere from $240-300. Getting more than double your frame rates at low cost is still a fantastic feature.
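For a quick sanity check of the "more than double" claim, the quoted ranges can be reduced to ratios. This is only a sketch: the function names are hypothetical, the midpoints stand in for the quoted ranges, and the 70 FPS figure is a peak ("as high as"), so the ratios flatter Dual Graphics relative to a like-for-like average.

```python
def midpoint(low: float, high: float) -> float:
    """Midpoint of a quoted FPS range."""
    return (low + high) / 2.0


def speedup(new_fps: float, base_fps: float) -> float:
    """How many times faster new_fps is than base_fps."""
    return new_fps / base_fps


igpu_fps = midpoint(28, 32)      # HD7660D alone, 1080 Low
discrete_fps = midpoint(42, 47)  # HD6670 alone, 1080 Low
dual_fps_peak = 70.0             # Dual Graphics, best case quoted

print(f"vs iGPU:   {speedup(dual_fps_peak, igpu_fps):.2f}x")
print(f"vs HD6670: {speedup(dual_fps_peak, discrete_fps):.2f}x")
```

On these figures the best case is a little over twice the iGPU and roughly one and a half times the HD6670 on its own.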

Sure there will be issues, i.e. some games don't scale more than 5-10% higher than the 6670 on its own, but the APU and the way AMD views it is not just life in a fish bowl; this is progressive technology, and as it evolves so will its scaling and performance. If, as I hope, Kaveri hits the 60+% gains over Trinity, it will represent a truly sophisticated integrated solution and a very impressive one to boot.

As above, the APU represents balance: it allows a system that is flexible and adequate for casual and entry level gamer experiences by balancing price to performance and the relevant part tweaking to assemble a very competent multi-purpose solution. Here is the flexibility.

1) iGPU only. As we have seen from Xbit, every Intel iGPU solution, including what will come from Haswell, is grossly inefficient, and since an i7 is not in the same price group we have limited this to i3s, and scaling down sees somewhat dramatic fall-offs. So is Intel's solution adequate enough to take this? It depends on your needs. What the APU can achieve is more flexibility, in that it's not only an x86 part with just enough multimedia and HTPC/gaming support; this part can game and game well, and AMD Quick Stream, the other A series features and the fact that it's on SFF make it good at low cost.

2) Discrete, keeping things realistic in the price bracket we ran it with the new 7790 and a 660 and it got Crysis 3 to High Settings at 1080 and the experience was very fluid, as a gaming part it has game despite not being the best option for discrete only.

3) Dual Graphics, a low cost feature which delivers sizeable gains on a modest budget. Factoring in Lucid support, you get a pretty amazing experience, resting at night knowing you don't have to eat baked beans for a month because you overspent.

de5_Roy · Apr 2, 2013

gamerk316 :

this is the latest bench on amd dual gfx that i could find:
http://www.tomshardware.com/reviews/silent-pc-gaming-performance,3435-14.html
these were done with silent cards, so i guess there might have been some thermal limitations. but the performance regressions can be entirely attributed to drivers/catalyst profiles. driver performance does not trickle all the way down to entry level.
starting with a dual gfx setup is a pretty bad idea unless you own a dual gfx compatible discrete card while building an entry level apu-based pc and cannot buy a radeon hd 7750. that is a very specific scenario. dual gfx is more suitable for laptops (especially alongside good dynamic gfx switching), not desktops.
i heard claims about trinity being able to dual gfx with radeon hd 77xx cards and perform on 7850 levels and such. now i'm hearing similar claims about richland and kaveri performing even higher. time will tell.

sarinaide :

then how about this one:
http://www.xbitlabs.com/images/cpu/trinity-vs-ivy-bridge/f12012-2.png
if core i3 3225 can do that, who knows how hard haswell will hit... i hear there's a new driver out boosting 10% moar... ;D
don't blame me, i see techspot's bf3 bench being posted here all the time. :ange:

😗
