AMD CPU speculation... and expert conjecture

griptwister · Aug 19, 2013

So wait... you mean to tell me 8350rocks, on multiple threads, was right about a FX 6300/6350 being near i5 level performance? Well, shoot then. (Lol, This guy seems credible to me)

Also, if you look at the benchmarks that have been posted with the FX 6350 in it, I think we see how close this CPU game is about to get, Intel should be slightly worried on the enthusiast side of things. Esp if SteamRoller OC's good.

And some of you might want to check this out...

http://www.forbes.com/sites/jasonevangelho/2013/08/13/can-amds-newest-apu-play-your-favorite-games-without-a-dedicated-graphics-card-part-1/

juanrga · Aug 19, 2013

hcl123 :

No. 8 FLOP (Double precision) per module using FMA4. Single precision is 16 FLOP per module = 8 FLOP per core.

hcl123 :

Good to know that I only needed to repeat you three or four times that 1050 GFLOP is single precision. Any bet I will need to say this to you again? LOL

Yes, I predicted the frequencies of 4Ghz for the CPU and 0.9Ghz for the GPU. If you have other frequencies that fit with the data we know, share them with us. No nonsense please.

hcl123 :

At contrary, we have concrete info about Berlin, including total gflop per-watt compared to Opteron.

hcl123 :

No. 4SP can be obtained with both FMA4 or SSE2. The difference is that one set provides 2 ops and the other only provides 1 op. It is evident what set is using the above computation. unless you confound both.

You are again confounding modules with cores. Kaveri APU comes with 2C and 4C configurations. It is the possibility of 6C which was droped.

hcl123 :

Yes, Kaveri maximum is 2M with its 2 FlexFPU. No, the rate is 128 GFLOP. It is one half in your fantasy. In fact trinity already have 121.6 GFLOP because has a freq. of 3.8Ghz instead of 4.0Ghz

i__src66e13e53113e26c120439c9dd826eabd_paraf0d99c20bd457d46a92c72841873c47.jpeg

hcl123 :

LOL Integer cores doing FP!!!

hcl123 :

You are trying hard to mess it all. This is a typical tactic to hide errors, but is not working.

hcl123 :

No. In fact the 2 ops provided by FMA4 are of the kind a*b+c.

hcl123 :

It was already remarked before that the FP capabilities are not changed in Steamroller.

hcl123 :

See, my bet I would repeat to you again that 1050 GFLOP is Single Precision is confirmed.

AMD said to us that 1050 GFLOP is for the whole APU. Everyone at Earth knows that, except you. You continue living in a parallel universe or alternative reality where 1050 is only for the GPU.

hcl123 :

See, another example of that you don't read. The 448 GFLOP are for the CPU. I have said that are for the CPU. I have shown you the math. And still you pretend to blur all by pretending that 448 is for the GPU Moreover, there is not Iris in the 4770k. LOL

Ags1 · Aug 19, 2013

griptwister :

Well, I hope AMD bring out 6/8 core parts in their next generation (and not just 4-core APUs)... Notably, the FX6350 scored best in the most recent games like Crysis 3 - games are defintely heading in the AMD more-cores direction. So if AMD stick with the architecture, and improve efficiency a bit, they could clean up with their next iteration.

noob2222 · Aug 19, 2013

rmpumper :

really? your arguement is a locked multiplier low 2.7ghz 1045T ... good luck getting that anywhere close to 4.0 ghz.

gamerk316 · Aug 19, 2013

Ummm...wowzers if true:

http://www.extremetech.com/computing/164209-windows-8-banned-by-worlds-top-benchmarking-and-overclocking-site

tl;dr: MSFT messed up the Real Time Clock in Win 8; if you adjust BLCK in software, Windows can't properly keep time. And since benchmarks use the RTC to figure out how long they ran, HWBot is disqualifying Win 8 from all its benchmarks.

This is significant in impact: Any benchmark taken under Windows 8 is unreliable if this is true.

So...back to trusty Win 7 x64 for benchmarking I guess...

hcl123 · Aug 19, 2013

GOM3RPLY3R :

Terrific!... then there is no need to upgrade, on what we can call a real benchmark, Hasfail ... err Haswell.. is only 5 % to 7 % better, and this with FMA( of AVX 2), without it it "should" be notoriously less. FMA(3) of AVX 2 brings barely any noticeable improvement.

Funny i've seen benchmarks of AMD with FMA4 over 20% improvement :sarcastic:

... the "should" above ? ummm ...or perhaps not funny neither sarcastic, does intel have any real FMAC pipe ? ... is the design really prepared or obsolete for those real performance booster kind of code ( the same IB and past SNB and earlier) ?

http://www.realworldtech.com/forum/?threadid=135386&curpostid=135386
[deserves a big LOL]

The Q6660 Inside · Aug 19, 2013

gamerk316 :

Well, doesn't everybody OC via BIOS these days? Oh well...

griptwister · Aug 19, 2013

gamerk316 :

Win7 ftw! I just disabled Areo, and I'm loving it.

I recently saw a vid from Linus where he benched Crysis 3 in win 8 with a FX 8350 and said the minimum on the FX 8350 was better (31 FPS or something like that) compared to windows 7 (21 FPS) and the FX 8350 pretty much matched the i5.

@hcl123, LOL!

The Q6660 Inside · Aug 19, 2013

griptwister :

gamerk316 :

Win7 ftw! I just disabled Areo, and I'm loving it.

I recently saw a vid from Linus where he benched Crysis 3 in win 8 with a FX 8350 and said the minimum on the FX 8350 was better (31 FPS or something like that) compared to windows 7 (21 FPS) and the FX 8350 pretty much matched the i5.

@hcl123, LOL!

+1, still rocking Windows 7 Pro with Ubuntu 13.04 dual booted.

hcl123 · Aug 19, 2013

juanrga :

No SSE like AVX1 has up to 2 operands in the same vector instruction, AVX 2 have up to 3 (FMA3) and XOP have up to 4 (FMA4)... the "numbers" that those operations operate in the context of SIMD, is dependent of the "wide" of the vector, but be assured the number of "numbers" is not the most useful ( SSE instructions can have up to 16 byte numbers or 8 16bit short integers "numbers" per vector), but what those operations can do is ... wikipedia it...

You are confounding modules with cores is you...

juanrga :

Idiot use your brain. That "CHART" is for a Llano. Integer cluster/cores on AMD BD don't do FLOPS, or are you not sure ?

Here is the diagram of K10

https://pt.wikipedia.org/wiki/AMD_K10
https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/AMD_K10_Arch.svg/300px-AMD_K10_Arch.svg.png

And here are the instructions

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

Llano have 2 128bit pipes capable of SSE (up to 4.1) per core, while BD only have 2 128bit pipes per 2 cluster/cores... this in the context of FLOPS or FP ops ... And while a SSE instruction can have up to 4x 32bit FP "numbers" its only on the context of SIMD, same instruction doing the same simple operations on those numbers, its not that terrible useful. SSE doesn't have multiply accumulate, in terms of pure logic operations is only one at a time, a mul or an add. And though theoretically Llano has double the FLOP rate of BD, because of 2 128pipes per core against 2 per 2 cores (module) of BD, is not because of that performance is better than BD. The contrary, most most of times is worst. ( the FMISC in Llano K10 is identical to one of the MMX pipes in BD, and the other MMX pipe in BD also doesn't do FP ops )

So theoretically peak rates most of times can be quite deceiving... only to entertain morons.

SSE floating point

Floating point instructions

* Memory-to-register/register-to-memory/register-to-register data movement
--- Scalar– MOVSS
--- Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS
* Arithmetic
--- Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
--- Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
* Compare
--- Scalar – CMPSS, COMISS, UCOMISS
--- Packed – CMPPS
* Data shuffle and unpacking
--- Packed – SHUFPS, UNPCKHPS, UNPCKLPS
* Data-type conversion
--- Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
--- Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
* Bitwise logical operations
--- Packed – ANDPS, ORPS, XORPS, ANDNPS (edt)

So has you can see (i hope) 128 GFLOPS at 4 Ghz is only true for a Llano and in the context of not very useful "single precision"... BD rate is half per core... but i wouldn't count to much on them(SSE), FMA4 of BD can provide quite better performance though the theoretical peak is half, more so because BD has the same number of L/S engines of Llano , but quite more advanced, and in real world performance counting real sustained data rates for those SIMDs, BD delivers a beating. Got it now ?

[ i like more to count only real logic ops per vector, so SSE 2, AVX is 2 or 3(FMA) and XOP is 4(FMA)... and count the wide of the issue ports... in that case BD can do 4 ops per 128bit pipe but only with 128bit FMA4... and in this context that Llano chart is very wrong ( the "numbers" in those vectors is not quite that useful)... but believe me THERE IS MORE THAN ONE WAY TO EXTRAPOLATE PEAK FLOP RATES, different vendors use slightly different methods with what they count... YOU ARE WASTING YOUR TIME... and mine... (edt)]

Sorry to be aggressive but this is the last time... don't bother anymore.

juanrga · Aug 19, 2013

8350rocks :

This is all entirely feasible, but there is an alternative explanation.

1) Yes, Kaveri is ready.

2) According to some sources, GloFo has improved the bulk process and the differences with SOI are not so large now. AMD adds:

There are always tradeoff decisions. But the flexibility that we gain moving in that direction... the flexibility across foundry partners, across design tools outhweigh that by far, the benefits of SOI.

3) I am not so sure about timing. According to Glofo, volume production for 28nm SOI starts in 1H 2014. According to AMD, Kaveri will be shipped in 2013 to OEMs and to end-users in 1Q 2014.

AMD vicepresident claims that Kaveri was ready for its original early 2013 launch, and that the delay is due to "HSA's marketing not still ready". I interpret this phrase as saying us that the reason for the delay is that AMD want more HSA software ready at launch.

griptwister :

According to my own estimations the Kaveri CPU (4 Steamroller cores) will be near the performance of the FX-6300. More concretely, I obtain a 97% of the performance 🙂

Ags1 :

Both Berlin and Warsaw probably confirm that there is no 6/8 Steamroller CPUs/APUs. Kaveri comes only as dual/quad.

juanrga · Aug 19, 2013

hcl123 :

No. "op" and "ops" do not mean operands, but operations. Learn the terminology before posting more nonsense.

hcl123 :

LOL. You are embarrassing yourself more and more....

Good attempt to eliminate the image tags from my message to try to hide your ignorance, but I reintroduced the image tags. The A10-5800k is not Llano, it is Trinity, which is based in Piledriver.

The integer cores don't do FLOPs, Floating Operations are made in the FPU. You were corrected about this before.

Trinity A10 (Piledriver) has 121.6 GFLOP (SP) for the CPU.
Kaveri A10 (Steamrolloer) has 128 GFLOP (SP) for the CPU.

Congrats!! You achieved hafidup levels of insanity. Now go away with your nonsense, trolling, and insults.

Krnt · Aug 19, 2013

I bet Kaveri is going to be bulk since its not being focused for High end, and we will only see SR with over 2 modules at the end of 2014 maybe in SOI for better performance and clockrates.

8350rocks · Aug 19, 2013

I don't see any way that clockspeeds will hold at Vishera/Richland levels going to bulk from PD-SOI. The thermal issues alone are going to make it suffer...we'll see Intel-esque clock speeds if that's actually the case...as Intel has a massive process advantage on bulk substrate. Going away from SOI means giving up any hope they had of Steamroller actually surpassing IB.

FD-SOI is the *only* way that SR will catch IB performance levels...because they could maintain the clock speeds. Look at Jaguar architecture on bulk...anything over 3.0 GHz is a massive headache to engineer, so they simply don't do it. I simply don't see bulk substrate being in the best interests of AMD, and I would be extremely shocked if kaveri launches on bulk. The only way they could even get *close* to the performance would be to go to a FinFET on bulk, and that would be *costlier* than planar FD-SOI for them.

8350rocks · Aug 19, 2013

gamerk316 :

Wow, that's interesting...I used a NB OC on my 8350 to hit 4.4 GHz...guess my PC would be way off if I ran win8.

I hated that OS anyway...still on Win7 64

Cazalan · Aug 19, 2013

juanrga :

That's pure marketing BS. Chicken-n-egg conundrum if you delay hardware because of software. Everyone knows you have to have the hardware out well in advance for the software to be written to take advantage of it. Look how long it took AVX/SSE/FMA to actually be used beyond a handful of applications.

I could see them delaying to allow inventories to sell off, like they've done with GPUs and Llano in the past. And possibly delaying if Kaveri requires some new Windows 7/8 patch like Bulldozer did, but purely because of HSA I don't buy it.

Krnt · Aug 19, 2013

8350rocks :

I don't think NB OC will affect the time on W8, the article states that software overclock to the BLCK, wich I don't think many people will use, except for quick testing.

FD-SOI is the *only* way that SR will catch IB performance levels...because they could maintain the clock speeds. Look at Jaguar architecture on bulk...anything over 3.0 GHz is a massive headache to engineer, so they simply don't do it. I simply don't see bulk substrate being in the best interests of AMD, and I would be extremely shocked if kaveri launches on bulk. The only way they could even get *close* to the performance would be to go to a FinFET on bulk, and that would be *costlier* than planar FD-SOI for them.

Well maybe you are aright but for mobile and mid-range it doesn't seem like a bad idea if it can achieve K10 clocks and its IPC matches or even is a little better than K10, it can be good enough for the job.

I don't think we are not going to see any AMD good performer until late 2014 when its more possible that SR on FD-SOI arrives.

rmpumper · Aug 20, 2013

juanrga :

That's nice. One thing though:

8350rocks :

Even 8350rocks agrees that PII has better IPC than Piledriver - I guess he is Intel shill ranting on AMD as well.

de5_Roy · Aug 20, 2013

8350rocks :

oh-kaay... it's bad when amd favorers start posting depressing stuff. kaveri-on-bulk is beneficial for mobile, laptops - where the apus have been aiming from the very begining. i dunno much about bulk vs soi, never paid attention before. but going by the pros and cons discussed by the more knowledgeable ones here, i'd say amd is making the right move. only people miffed by this would be amd dt enthusiasts. who, btw, drive less than 5% of the market.
to repeat, there is no giving up, no dire need to catch up to intel high end, no high perf race.
you all seem to conveniently forget amd's success comes from cheap apus, not FX. making them on bulk helps amd (according to what you guys have discussed, i don't have personal opinion in this).

to experts: afaik, jaguar is designed to be process-portable. is it possible to fab kabini apus on SOI or FD-SOI? if yes, isn't it moot to lament over apus being made on bulk substrate? assuming future apus will be process-portable and anyone from common platform alliance can make them on any of their process and silicon.
who knows, may be this is amd's new mantra: 'since we couldn't beat intel, may as well join them in making apus on bulk silicon. we still got better igpus, woohoo!' 😛

Cazalan :

may be the reality is that glofo has screwed up yet again. kaveri moved from 2012 to q1 2014 in the name of 'getting it right'... could be anything from amd clearing unsold pd/trinity/richland to glofo's failure to glofo allocating and prioritizing fab resources for arm socs to amd not being able to prepare software support for hsa and kaveri in time for launch. or all of the above.

noob2222 · Aug 20, 2013

Something to think about, when all this talk of AMD abandoning SOI in 2012, GF wasn't planning on doing a 28nm SOI node.

Now they are, so who are they making the fab for? some unkown customer that may want to use some SOI silicon someday?

or a proven customer that knows how to use it already and quite possibly requested it, a customer that paid to get out of a "contract" when you told them you didn't have plans to make 28nm SOI. IE "If your not doing SOI, then what reason do we have to use your fabs?"

originally GF was going to re-do SOI at 20 and 14nm, skipping 28nm. So what happened to make them change their mind?

Here is some of what we actually know:

BSN article publiched in may 2012 stated no SOI.
~ June 2012, STMicro liscenced FD-SOI tech to GF.
jan 2013 ST-Ericsson successfully tested their "prototype" 28nm FD-SOI phone chip @ 2.5ghz.
Aug 2013, ST-Ericsson split up and go their seperate ways.

So ... GF has a 28nm FD-SOI tech node ... and ??? customer(s). This is what speculation is about, seeing what has changed since some "AMD IS ABANDONING SOI" article.

Who wants to bet that AMD would rather have FD-SOI over bulk if its available?

juanrga · Aug 20, 2013

8350rocks :

I think that 4-core Kaveri will be at i5 SB performance level. In some specific benchmark, it could surpass an i5 IB (#), and with HSA software it could surpass an i7 HW, but, in general, the CPU will be at i5 SB level.

# According to my simulations the 4C Kaveri would outperform an i5-3470 in the test John The Ripper.

Cazalan :

Disagree. Consoles are an example that show how developers can have the software ready before the hardware starts to ship to final consumers. In fact, AMD is copying this partnership model for the PC.

In my opinion, AMD is trying to correct the mistakes that did when released AMD64. This time, AMD has created a foundation for HSA with the biggest members in the industry (except Intel) joined, and are developing HSA software before the hardware is shipped to final consumers.

This is one example

http://www.pcper.com/news/General-Tech/AMD-HSA-LibreOffice-and-you

http://blog.documentfoundation.org/2013/07/03/amd-joins-the-document-foundation-advisory-board-to-accelerate-libreoffice/

“It is great to work on LibreOffice with The Document Foundation to expose the raw power of AMD GPUs and APUs, initially to spreadsheet users,” said Manju Hegde, corporate vice president, Heterogeneous Solutions at AMD. “Bringing the parallelism and performance of our technology to traditional, mainstream business software users will be a welcome innovation for heavy duty spreadsheet users, particularly when combined with the compute capabilities of the upcoming generation of AMD Heterogeneous System Architecture (HSA) based products.”

“It is exciting to work together with AMD and their ecosystem to take advantage of AMD’s cutting edge innovation right inside LibreOffice,” said Michael Meeks, SUSE Distinguished Engineer and TDF Board Member, “The growth in performance and parallelism available in the GPUs of today, and particularly with AMD’s revolutionary APUs of tomorrow, is something we’re eager to expose to LibreOffice users.”

I bolded the relevant parts that prove my point that AMD and developers are preparing HSA software before Kaveri is in customers hands.

juanrga · Aug 20, 2013

rmpumper :

No, in the part that you bolded he refers to hypothetical maximum IPC per core. I don't. My point must be similar to the part that you didn't bold, when he writes about the "reality".

Once again, if the Piledriver-based FX, at same clock and number of cores, gives the same performance than Phenom II, and if Piledriver has a 20% penalty due to the shared front-end, then the rest of the chip in the FX is offering about a 20% more performance to compensate.

In any case, Steamroller eliminates the 20% penalty with the improved front-end and this discussion is useless: there is no way that AMD will abandon the good modules design and will release new Phenoms; it doesn't make any sense.

etayorius · Aug 20, 2013

There is yet another rumor stating that AMD MAY have abandoned AM3+ and release a High End CPU in FM2+ socket, if this is true i will surely get and FM2+ board:

http://www.bitsandchips.it/9-hardware/3234-il-socket-am3-sembra-sia-giunto-al-capolinea

This is just another rumor, but GOD i hope it is true!

sarinaide · Aug 20, 2013

I thought we are in the know that AMD are not making an enthusiast level chip ala FX or Phenom again, put simply there is no market for it and if Kaveri is strong enough you don't need more. Looks like end of the line for AM3+ and about time, 4-5 year old tech expect 4-5 year old performance.

8350rocks · Aug 20, 2013

sarinaide :

Well, the platform is officially supported through 2015...so they must be doing something with it...

AMD CPU speculation... and expert conjecture

Distinguished

Distinguished

Honorable

Distinguished

Glorious

Honorable

Honorable

Distinguished

Honorable

Honorable

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Splendid

Distinguished

Distinguished

Distinguished

Honorable

Splendid

Distinguished

Share this page