AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

rmpumper

Distinguished
Apr 17, 2009
459
0
18,810


Yes. Also, 6350 "completely obliterates" 4350 in Crysis 3. How about we compare 4 core Piledriver to 4 core Phenom and 6 core Piledriver to 6 core Phenom?

4.7GHz 4350 is only a few % faster than 4GHz X4 and it beats 4.2GHz 4350. Piledriver does not look that good now, does it?

 
Hmm the article isn't entirely accurate. The Athlon IIs are 45nm with only 512KB of L2 cache vs Llano's 32nm and 1MB of L2; it's that additional 512KB that's hurting its performance vs something like the Phenom II X4 with its shared L3. Honestly the Athlon IIs really aren't worth the money; they're just re-boxed binned Phenom IIs. Better to get an FX-4 or 6 series for value based systems. I'm partial to the 6 and 8 but only because I never just do one thing on my system. Also has anyone done benchmarks on Skyrim with mods loaded vs vanilla Skyrim? I've been playing it lately and not experiencing any of the issues people claimed would happen on AMD systems. FX-8350 @4.7 with 2 x 580 Hydros and it's been smooth sailing. On the 2nd monitor I typically see four cores crunching away so there is a strong possibility that additional mods can change the performance footprint.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Double precision per BD module.



Did you read my posts? I already gave FLOP values for both SSE2 and FMA4 instructions. I also discussed both 32bit and 64bit.






YES, for instance in the Japanese article (July this year) cited by 8350rocks. Besides that, AMD already gave (at Computex this year) the GFLOP for Kaveri and the value coincides with that resulting from analysing the Steamroller module that everyone knows.



No. AMD only releases a single Steamroller module. Kaveri APU is 'derived' from Berlin APU. There is no steamroller server/FX chip in the roadmaps.



No. In the first place, nothing in that manual mentions Steamroller. Some people speculate that the entry is for a future Excavator module. I disagree. It seems related to this

bulldozer-fp-unit.jpg


In fact your hypothetical FPU with 256bit FMAC units gives GFLOP values that disagree with those given by AMD.

The correct values are:

- Steamroller module has 2 x 128bit FMAC units.

- Each unit can do 2 DP or 4 SP.

- SSE2 => 1 op; FMA4 => 2 op

- Steamroller Kaveri performance is 4C x (4SP x 2 FMA) x 4GHz = 4C x 8 FLOPs x 4GHz = 128 GFLOP

- The GPU performance is 512C x 2 FLOPs x 0.9GHz = 922 GFLOP (SP)

- Total APU performance: 922 + 128 = 1050 GFLOP. This is the value claimed by AMD officially. Moreover AMD labs has confirmed that Steamroller gives "8 FLOPs per core".
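
The arithmetic in the list above can be checked with a few lines of Python. This is only a sketch: the helper name `peak_gflops` and its parameters are made up for illustration, and the inputs (4 cores, 8 SP FLOPs per core per cycle, 4GHz, 512 shaders at 0.9GHz) are simply the figures quoted in the post, not independently verified specifications.

```python
def peak_gflops(units, flops_per_cycle, ghz):
    """Theoretical peak = units x FLOPs per unit per cycle x clock in GHz."""
    return units * flops_per_cycle * ghz

# Inputs taken from the post (single precision); the helper is hypothetical.
cpu = peak_gflops(units=4, flops_per_cycle=4 * 2, ghz=4.0)  # 4 SP lanes x 2 ops per FMA
gpu = peak_gflops(units=512, flops_per_cycle=2, ghz=0.9)    # 512 shaders x 2 ops per MAD

print(f"CPU:   {cpu:.1f} GFLOP")
print(f"GPU:   {gpu:.1f} GFLOP")
print(f"Total: {cpu + gpu:.1f} GFLOP")
```

The GPU term comes to 921.6 and the total to 1049.6, which the post rounds to 922 and 1050. These are theoretical peaks only; they say nothing about sustained throughput.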

Everything else is your own misunderstanding or pure fantasy.



And this proves that you don't read my posts. As quoted above by yourself, I am saying that those GFLOPs are for the CPU alone. I am also saying, in the same quote, which figures are double precision (DP) and which figures are single precision (SP).

Haswell is 3.5GHz, not 3.6GHz; besides that mistake, the above GFLOP values are given officially by Intel. It is very easy to obtain them

haswell-3.png


Sandy Bridge doubles Nehalem's FP capabilities with double-width units. Ivy Bridge maintains the same architecture. From the diagram:

- ( 8SP x 1MUL ) + ( 8SP x 1ADD ) = 16 FLOP

- i7-3770k performance is 4C x 16 FLOPs x 3.5GHz = 224 GFLOP (SP). For DP the value is one-half: 112 GFLOP (DP). This is the value claimed by Intel officially in their technical datasheets.

As observed in the above diagram Haswell introduces FMA support. Therefore:

- ( 16SP x 1FMA ) = 32 FLOP

- i7-4770k performance is 4C x 32 FLOPs x 3.5GHz = 448 GFLOP (SP). For DP the value is one-half: 224 GFLOP (DP). Again this is the value claimed by Intel officially.

Now the total performance (CPU + GPU):

- i7-3770k: 224 + 294 = 518 GFLOP. This is the value claimed by Intel officially.

- i7-4770k: 448 + 400 = 848 GFLOP. This is the value claimed by Intel officially.

- Kaveri A10 APU: 922 + 128 = 1050 GFLOP. This is the value claimed by AMD officially.
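
The Intel figures above follow the same cores x FLOPs per cycle x clock pattern and can be reproduced with a short Python sketch. The function and dictionary names are assumptions made for illustration; the per-cycle FLOP counts, the 3.5GHz clock, and the GPU figures (294 and 400 GFLOP) are taken from the post as given.

```python
def cpu_peak_gflops(cores, sp_flops_per_cycle, ghz):
    """Theoretical single-precision peak for the CPU cores alone."""
    return cores * sp_flops_per_cycle * ghz

# Values as quoted in the post; the GPU numbers are not derived here.
chips = {
    "i7-3770k": {"sp_flops_per_cycle": 8 + 8, "gpu_gflops": 294},   # 8 SP MUL + 8 SP ADD
    "i7-4770k": {"sp_flops_per_cycle": 16 * 2, "gpu_gflops": 400},  # 16 SP lanes x 2 ops per FMA
}

totals = {}
for name, chip in chips.items():
    sp = cpu_peak_gflops(4, chip["sp_flops_per_cycle"], 3.5)
    dp = sp / 2  # the post takes DP as one-half of SP
    totals[name] = sp + chip["gpu_gflops"]
    print(f"{name}: {sp:.0f} SP / {dp:.0f} DP, CPU + GPU = {totals[name]:.0f} GFLOP")
```

As with any peak figure, this assumes every FP port issues a full-width instruction every cycle.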

Now please, stop ignoring what has been said. Stop ignoring what both Intel and AMD officially claim about their products, and stop fantasizing about imaginary Steamroller modules that exist only in your head.
 

have you read this thread? fantasising and/or imagining about steamroller/excavator modules is what we do here. or.. at least that's what i do....
... did i read the thread title and OP right? is my dementia acting up again? susan? :pt1cable:
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Discussed here many times by several posters: 20% IPC over Piledriver (30% over Bulldozer). With HSA-enabled software, a performance boost of up to 500%.



AMD made a mistake with Bulldozer; everything else coming after (Piledriver, Trinity, Richland, Jaguar...) was just as AMD claimed... but you are very cautious.

Just a question: what are you about Intel's new stuff? Supermegahypercautious? I ask because Intel has a long history of disappointing products: buggy Pentiums, NetBurst, Larrabee, Atom, Haswell, Iris...



LOL



In the first place, Steamroller is Bulldozer made right, but for the economic/legal reasons mentioned before, AMD released Bulldozer.

In the second place, as discussed before, the modular design saves die space, reduces power consumption, reduces costs, and lays the groundwork for AMD's big plan of moving the FPU to the GPU.

In the third place, AMD recognizes that the clustered design in BD/PD offers only 80% of the performance of the traditional non-clustered design in Phenom. Therefore, the Crysis 3 benchmark (assuming the review was made right and there are no extra penalties for Piledriver coming from schedulers, buggy hotfixes, underclocked memory...) shows that the rest of the architecture has been optimized near a 20% IPC, compared to the Phenom one.

Note that 20% IPC is about what Intel offers from Sandy Bridge to Haswell.

Suddenly your anti-AMD rant does not look that good now...
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


This thread is about "speculation... and expert conjecture". One is allowed to fill in the holes of what is not known about Steamroller, one is also allowed to make honest mistakes, but not to spread misinformation in systematic ways, especially when AMD says the contrary.

By spreading false ideas about what Steamroller will be (false 45% IPC, false 256-bit FMAC units, false four threads per module,...) people will be disappointed at launch.
 

rmpumper

Distinguished
Apr 17, 2009
459
0
18,810


1. No one is even talking about Phenom I.
2. No one cares about Phenom I.
3. Phenom I is not in the article.
4. Piledriver IPC is lower than Phenom II.



Saying that Phenom II is better than Piledriver is an anti-AMD rant? That's nice. Go home, troll, you're drunk.

 

i am no expert, but i do speculate and er.. conject.. the problem is that amd adds a disclaimer to every promotion and makes sure they are able to disavow any leaks. so... even when amd says something, they can change what they said very easily. it helps keep them safe from any misinformation or false hype. 'blame them fanboys' and whatnot.

before, i questioned everything that seemed like those. later i realized that amd makes sure that they never have to take responsibility for anything like that. that makes arguing info like these sorta moot. in the end, if people believe false info, it's their fault - this is what amd will say to you and us.
although, i am pretty sure the 4T/module was pure speculation. if someone has said otherwise i.e. his claim is real, then it should be verified upon launch. do you think he's gonna get away with spreading falsehood for all this time? if you check, you'll notice that quite a lot of bd-hypers are absent now or are using different id or hang out in different forums.
 

8350rocks

Distinguished


Because of the modular design of BD/PD, they are setup entirely differently.

The STARS architecture is not really on the same playing field. Though you might have a higher maximum IPC per core on STARS (1 FPU per 2 cores with PD and 1 FPU per core with STARS, plus 3 ALUs per core STARS and 2 per core PD)...the reality is the process tweaks on PD allow that architecture to actually accomplish more in the same time frame.

It's like comparing Intel and AMD architecture right now...it's not really apples to apples because of the drastic difference in approach.

The one thing that is a TREMENDOUS advantage of the modular design is that once they get feeding the cores streamlined, it should work very fast because of the process tweaks.

The other issue as well, is die space. The maximum die size on STARS architecture was quite a bit larger than the maximum die space on the 8350. I cannot recall specifically what the die size was on the 1100T, though the 8350 is 315mm^2
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


the 4300 is only half a cpu. 2 modules, when running 4 threads runs at 80% efficiency. PII runs at 86% multicore efficiency. I have never cared for these 4xxx chips, but they do work, sometimes better than I expect.

F1-2012-High-Frame-Rate.png


Sure, the Phenom II barely beat out the 4350 in C3.

4. Piledriver IPC is lower than Phenom II.

in a few select cases if you're comparing 4 "cores" to 2 "modules", but because people believe this theory, the PII x6, if you can find one, is 250+, making it considerably more expensive than the 8350.
 

rmpumper

Distinguished
Apr 17, 2009
459
0
18,810


I bet I would not get anything higher than ~$70 for my used 1055T :) And a brand new 1045T is ~$130 in Lithuania.
 

8350rocks

Distinguished
In the K10/PD comparison, after they factored in the Intel offerings, I found this to be a particularly well summed conclusion:

The shining star in today’s comparison is AMD’s FX-6350, which delivers solid performance in games, while besting Intel's Core i5 in a number of our other benchmark workloads. The cheaper FX-6300 is an even more attractive bargain, so long as you're willing to overclock it.

I have been saying that the 6300/6350 offer i5-ish performance for a long time, and so many intel fan boys claim that it's just not there...but even Tom's Hardware admitted that the 6300/6350 are ridiculously good performance, and at $60-80 cheaper than the 3570k it's extremely hard to pass up the value there.
 

GOM3RPLY3R

Honorable
Mar 16, 2013
658
0
11,010


Even though you may think it is, there is no direct link (other than your own opinion) saying that it's SOI. Every other side says it's Bulk. I think it'll be bulk...

However, you never know. They could make a 2nd generation of Kaveri with SOI, or the 3rd gen.

I'm not saying you are wrong completely, however, there is no definite source other than opinion saying that it'll be SOI. Even though I hope it is, it just isn't. :(
 

8350rocks

Distinguished
Here is the issue with believing it's on bulk:

1.) Kaveri ES's have been on record at Cosmic Labs since February this year.

2.) Bulk process will significantly degrade performance, heat, and voltage properties.

3.) If it isn't on SOI, why are they timing it *perfectly* to coincide with when the 28nm FD-SOI production at GF is ramping up?

It doesn't make sense at all...if they were going bulk, we could have Kaveri already in our hands...

My guesstimate here is that, after they tested Kaveri ES's on bulk process, they realized how degraded the performance would be, and how bad it would underperform. So they didn't change their official announcement *just in case* but they are likely planning kaveri to be on FD-SOI going forward because the performance drop off on bulk would be far more drastic than they initially expected.

EDIT: The Opteron roadmap earlier this year did not state that Kaveri was coming Q1 2014, it said 2H 2013 specifically...this, in my mind, means that while there has been a shift of paradigm, the delay is slight, so AMD maintains the perception of "we meant a paper launch all along".
 

GOM3RPLY3R

Honorable
Mar 16, 2013
658
0
11,010


Not bad, I never even thought about the 6350 for gaming or anything. :p

Here: http://www.cpu-world.com/Compare/442/AMD_FX-Series_FX-6350_vs_Intel_Core_i5_i5-3570K.html

Each processor has its drawbacks, but wow, the 6350 isn't bad. The only thing though is it requires (stock) 125W rather than 77W. Yeah I know, "power doesn't matter on a desktop," but the thing to look for is how much power it needs to do a certain amount of work. It may not matter much, but historically (light bulbs are a good example), the more power (electricity) the hotter it gets. The more power you put through something, usually (for really anything), the faster it degrades and breaks down. So in terms of wear and tear, the i5 will have the benefit of lasting longer.

Otherwise, the 6350 is totally an insane bargain. Get that and a 760 for $250, you'd be gaming at about high settings @ 1080p with great frames (remember, no AA, lol).

EDIT:

Also here on CPU Boss, look at this!:
http://cpuboss.com/cpus/Intel-Core-i5-3570K-vs-AMD-FX-6350

The i5 won, especially in single threaded (lol), but it also beat it in overclocking by .4 points.
In the end though, there is much more props for the 6350. :mdr:
 


You must have been missing the last few months of the mid range market and a magical tool called the FX6300 (AKA i3-3225 just got :fuck: up) :whistle:. The GTX 760 is superb with 2x MSAA and SweetFX on BF3 and Skyrim, at least from my results. Heat actually tends to favor AMD, which has much lower maximum temps, since there is not everything jammed on the CPU. Plenty of Pentium Ds and Athlon 64s from a long time ago that can still overclock fine. It is obvious that the 6300\6350 is a better value than the 3570K, not to say it is not worth it.

@rmpumper: Look, the FX4350 is far cheaper than the Phenom II X4 965BE was when it was first released, and the whole modular Bulldozer uArch practically relies on moar coars for good performance.

Realistic comparison for the 6350 :p: http://cpuboss.com/cpus/Intel-Core-i3-3225-vs-AMD-FX-6350
 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


Which is wrong: 8 flops per FPU can only be achieved with 128bit FMA4, and those are single precision, not double. The main problem is the "issue port": though those are SIMD instructions, to claim a sustainable peak flop rate you have to count the width of the issue port.



I'm reading and answering now... what do you think ? And i don't care where you picked those values to parrot them, just give the math leading to them, don't pretend to have the authority just because you picked something from somewhere which you think is authoritative. There could be mistakes and there could be very wrong interpretations, which would make you automatically wrong not right, even if those are not your mistakes.



Here we go again, sounds exactly like a parrot



AMD "supposedly" gave the FLOP number for Kaveri, counting the GPU, which will be "single precision". You went along and extrapolated the frequencies and DP based on your "no math", very likely wrong assumptions.



Nobody really knows anything concrete about Berlin yet. AMD gave an interview where it stated the intention of continuing with FX... the rest are assumptions that could be wrong (even mine).

At least i take care to point out that 2 FlexFPU could be technically possible, not that they will have it.

With you, it is how you say and that's the end of it LOL (presumptions are the mother of all F ups lol)



What an arrogant little... quite an imagination lol... but you are right (*some*) in the simple math; i only mention that 4 SP per 128bit pipe is reached only with 128bit FMA4 instructions, no other instructions could sustain that rate... and there isn't a 4C APU version according to what was revealed; the possibility of 3C was raised, but has been discarded by now.

So Kaveri will be only 2 modules, 2 FlexFPU, and the rate is half of what you claim on the CPU side... unless there are 2 FlexFPU per module LOL ... otherwise you have to re-evaluate your fantasy.

If the 4C means "4 cores" as in Integer Clusters/Cores, then you are more F up than i imagined; those Integer Clusters/Cores don't do any FP calculations, and that is exactly the strong point of the design.

Those vector/FP instructions are SIMD in nature: the same instruction can run several times with different data. 1 single core of a module could be enough to fill a 2 FMAC FPU (which again leads to the possibility of 2 FlexFPUs), but obviously it could tend to leave much performance on the floor; it would have to be implemented wisely. Mixing in Integer cores for the FP calculations is completely F up; GPUs don't have Integer cores per se, yet that doesn't mean you can't calculate FLOP rates.

And no to * - SSE2 => 1 op; FMA4 => 2 op *... SSE as well as FMA4 are vector instructions; SSE has 2 or 3 operands and FMA4 has 4 operands, and of those only very few correspond to an actual 1 Add + 1 Mul, or 2 Adds + 2 Muls in the case of FMA4; usually 1 or more operands are destination registers (mem ops).

And no, the * IMAGE * doesn't present anything new; that is exactly how the FlexFPU deals with 256bit instructions now: it uses both FMAC pipes of the FPU... the scheduler is/was unified since BDver1.

For FP256 as revealed, the issue port must be 256bit wide, and the addressing is 256bit wide in nature (not 128bit in halves) [ EDIT : and AMD could do this because they have a "load buffer" before the FP pipes; the load buffer could be 256bit wide while the rest of the data paths remain 128bit, so as not to penalize clock ability]. For 128bit instructions on 256bit pipes, those could be packed beforehand, interacting with the scheduler *at runtime*, as in the "Uops fusion" schemes of either Intel or the AMD K10 ALU+AGU, or *at compile time*, so that the code is transformed into AVX 256 or FMA4 256 and there are no more 128bit vectors. If those are not packed or compiled for 256bit, a 128bit instruction can only be issued one at a time per pipe, wasting half the capacity of a 256bit pipe.

The same happens today for 64bit or 32bit FP ops on 128bit pipes, and since there isn't any *runtime* packing so far, 8 FP ops per FlexFPU are possible only if the code is compiled for 128bit FMA4 instructions, which is single precision.



What a confusion lol ... And because it is (could be) 1050 GFLOPS, it doesn't mean it is divided like you say... it could be GPU only (more likely), and single precision... how can one not ignore you when you don't really know anything concrete !? lol

And RWT is wrong, they went along to be part of the propaganda machine.

In that * IMAGE *, Port 0, Port 1 and Port 5 are 128bit wide... just tell me how you fit 8 SP (8 x 32bit ops) through a 128bit port !!?? ... they are counting only on the SIMD nature of those instructions: once issued they can run several times. But even this is quite "borged", because even RWT, with a little effort (LOL), mentions that loading 256bit data through 128bit data paths, employing a single L/S engine for the purpose, can take up to 5 cycles for some instructions. big LOL...

No.. in spite of all the flash and the potential of those exec clusters, the reality is that the 256bit rate on Intel is ~ the same as BD, which uses halves and 2 L/S engines. Intel is dropping more potential performance on the floor than AMD, because they have 3 exec pipes (3 issue ports) for FP calculation while AMD only has 2.

That is reality, and a point in favor of AMD's design of separated FPUs and modules; the rest is propaganda. Intel's design on the CPU side is not really prone to "simple math" for FLOP calculations, no matter how much one forgets the issue ports and pulls *theoretical* peak flop rates out of the arse. And worse, they don't have FMA4, which makes those *theoretical* numbers very hard to swallow, especially concerning single precision.

Worse, the GPU side is not simple to extrapolate either.

The 448 GFLOP, big LOL... if the GPU corresponds to 20 EU, 80 MADs

From
http://translate.googleusercontent.com/translate_c?depth=1&hl=pt-BR&rurl=translate.google.com&sandbox=0&sl=ja&tl=en&u=http://pc.watch.impress.co.jp/docs/column/kaigai/20130602_601851.html&usg=ALkJrhhPzMDgw2L2K6-JhljRXPUDnNtkBA

Chart images of different GPUs
http://translate.googleusercontent.com/translate_c?depth=1&hl=pt-BR&rurl=translate.google.com&sandbox=0&sl=ja&tl=en&u=http://pc.watch.impress.co.jp/img/pcw/docs/601/851/html/20.jpg.html&usg=ALkJrhiciI2SqQZM48SAolYQbJi2E0FtbQ

Now if 80 MADs correspond to 160 ops at 1.2GHz, that means 160 x 1.2 = 192 GFLOPS, meaning the CPU cores would have to have 256 GFLOPS LOL, or 64 GFLOPS per core, double AMD's, and that is not what benchmarks tell...

If 448 GFLOPS is Iris... then it is double (160 MADs), or 384 GFLOPS for the GPU alone and 64 GFLOPS for the CPU (more like reality)... but then that is not the 4770k, is it ? ... too much confusion in all of intel; i gave up long ago, you can entertain yourself with these completely futile, worthless, pointless academic exercises.
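
The two scenarios in this post can be laid out in a small Python sketch. It assumes, as the post does, that 448 GFLOPS is a combined CPU + GPU figure, that each MAD counts as 2 ops, and that the GPU runs at 1.2GHz; the function name `gpu_gflops` is hypothetical.

```python
def gpu_gflops(mads, ghz):
    """Each MAD (multiply-add) is counted as 2 floating-point ops per cycle."""
    return mads * 2 * ghz

TOTAL_GFLOPS = 448  # treated here as CPU + GPU combined, per the post
CPU_CORES = 4

results = {}
for mads in (80, 160):
    gpu = gpu_gflops(mads, 1.2)
    cpu_remainder = TOTAL_GFLOPS - gpu
    results[mads] = (gpu, cpu_remainder)
    print(f"{mads} MADs: GPU {gpu:.0f}, CPU remainder {cpu_remainder:.0f} "
          f"({cpu_remainder / CPU_CORES:.0f} per core)")
```

With 80 MADs the CPU would have to supply the remaining 256 GFLOPS (64 per core); with 160 MADs only 64 GFLOPS are left for the whole CPU. Both splits follow from the post's assumptions and nothing more.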

 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


ehh! quite a wrong assumption. Actually i'm quite convinced it is exactly the opposite: AMD chips tend to last longer and withstand higher power because they are SOI.

So that simplistic "assumption" is right in a simplistic way, but when you start to consider details, like materials... high strength steel is not the same as forged iron, though all is based on iron... then all simple assumptions can be wrong. The quantities and qualities of materials used to make a SOI chip are what make it withstand higher power and last longer. Intel is and was about high volumes; they tend to fab cheaper and sell expensive, that is why profits are so high.

To validate my POV, IBM chips are SOI and are rated at ~250W... and those go into super expensive machines that should work flawlessly at full load and max clocks, which with the z196's successor goes to 5.5GHz for a 600mm² chip, 24/7 for many years... there isn't turbo, this is not an OC exercise... and IBM would be bankrupt if a chip in those machines failed every year...

No... i think AMD gets more "material" quality... so far...

 


From my history with processors, if you're going to have a failure it's likely to be an Intel; in fact I have gone through basically every CPU and never had an AMD fail under day-to-day or extreme conditions. Intel chips degrade fast and also suffer from cold bug syndrome.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


LOL. I removed the bold from your quote.

Now pay attention: one != I

Now I will rewrite the above phrase:

shows that the rest of the architecture has been optimized near a 20% IPC, compared to the Phenom architecture.

Driven by curiosity, I just now used an automatic translator and it was able to translate the original phrase with "one", giving me the exact meaning that I intended.

Yes, the first time that you said that Phenom II is better than Piledriver was an anti-AMD rant. You repeating it, after being corrected and educated, shows that you are here for trolling against AMD.



What? The i5 costs nearly double and only offers 33% more single threaded performance?

Moreover, who (beyond intel fanboys) purchases a quad for running single threaded software? The extra two cores of the FX-6350 compensate very much, offering nearly the same performance but at lower cost.
 

Krnt

Distinguished
Dec 31, 2009
173
0
18,760

+1 to that.
SOI quality is always superior to bulk. Also, AMD CPUs have always been able to withstand more voltage and heavy overclocking for longer periods, and I've never seen an AMD fail on me, whilst I've seen a lot of Intel CPUs die under normal/office conditions.
 