AMD CPU speculation... and expert conjecture

Page 530
Status
Not open for further replies.

vmN

Honorable
Oct 27, 2013
1,666
0
12,160
Is there any proof that CMT handles the ALUs better than SMT? Because that is what your statement implies: "2 cores > 1 core + HT"

A Piledriver CMT core has the same number of ALUs as a Haswell core. The difference could be due to the clock speed differences.

There is a lot more to factor in.

UPDATE: I saw you compared it to IB; in that case it could be because Piledriver had more ALU pipelines.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
Perf = ipc * clock * cores * scaling %

Phenom scaled around 87%. The BD arch scales around 75% overall, but 90+% when loading separate cores rather than CMT threads. Intel CPUs scale around 92% on physical cores and about 20% on HTT threads.

Dividing the multithreaded score your way keeps those percentages baked into the IPC.
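As a rough illustration of the formula above, here is a minimal Python sketch. The function name and the sample IPC, clock, and core-count values are hypothetical placeholders; only the scaling percentages are taken from the post.

```python
def estimated_perf(ipc, clock_ghz, cores, scaling):
    """Perf = ipc * clock * cores * scaling, as written in the post."""
    return ipc * clock_ghz * cores * scaling


# Hypothetical chip: IPC 1.0 at 4.0 GHz (placeholder values, not measurements).
# Loading all 8 CMT threads with the ~75% overall scaling quoted for BD:
all_threads = estimated_perf(1.0, 4.0, 8, 0.75)
# Loading 4 cores (one per module) with the ~90% scaling quoted for that case:
one_per_module = estimated_perf(1.0, 4.0, 4, 0.90)

print(all_threads, one_per_module)
```

The point of the sketch is only that the scaling factor is a separate term here, so the IPC value itself stays the same no matter how many cores are loaded.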

And trying to claim jaguar = puma is like saying bd = pd = sr. It's an improved core, period. End of discussion.
 


% Increase != % Difference
 


Don't forget core loading!

But yeah, you quickly get something that looks like this (we'll use a 2600k to drive a point home):

Perf = [Physical_Cores*(IPC*Clock)] + [Logical_cores*(IPC*Clock)]

Since you have to factor in HTT. And if you want to factor in core loading, then you have to do the math on a per-core basis. Gets REALLY messy really quickly.
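A minimal sketch of that two-term formula, assuming the physical and logical cores are each given their own IPC figure. The function name and the IPC numbers are hypothetical; the 4 physical + 4 logical core layout and the 3.4 GHz base clock are those of a 2600K.

```python
def perf_physical_plus_logical(physical_cores, logical_cores,
                               ipc_physical, ipc_logical, clock_ghz):
    """Perf = [Physical_Cores*(IPC*Clock)] + [Logical_cores*(IPC*Clock)]."""
    physical_term = physical_cores * (ipc_physical * clock_ghz)
    logical_term = logical_cores * (ipc_logical * clock_ghz)
    return physical_term + logical_term


# Assumption for illustration only: a logical (HTT) core contributes
# 20% of the IPC of a physical core.
perf_2600k = perf_physical_plus_logical(
    physical_cores=4, logical_cores=4,
    ipc_physical=1.0, ipc_logical=0.2, clock_ghz=3.4)

print(perf_2600k)
```

Doing this per core with different load levels means one term per core instead of two terms in total, which is where it gets messy.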
 
Sony says PS4, unlike their previous consoles, is "already contributing profit"
http://www.gamespot.com/articles/sony-says-ps4-unlike-their-previous-consoles-is-already-contributing-profit/1100-6419822/
so soon... parts and deals musta been cheap.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Yeah, a total of 16 ALUs for the FX-8350 and 12 ALUs for the i7-3770k. The ratio is 1.33. Here is one integer benchmark where the FX-8350 is about 29% faster:

http://openbenchmarking.org/embed.php?i=1210227-RA-AMDFX835085&sha=293f200&p=2

Of course, the FX loses in floating point because each module has one shared FPU

http://openbenchmarking.org/embed.php?i=1210227-RA-AMDFX835085&sha=f7bfb85&p=2

The loss of performance, from ~30% faster in integer (thanks to the 2 integer cores per module: "2 cores > 1 core + HT") to ~30% slower in floating point (due to the shared FPU of the CMT architecture: "1 PD FPU < 1 IB FPU"), is close to the ratio between integer cores and FPUs of the CMT architecture: 2.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Your "scaling %" is unneeded, because it is measured by the IPC (the formula given by gamerk is even more regrettable).

This is why we say that Steamroller brings ~30% IPC gains over Bulldozer. ~10% is from core improvements and remaining ~20% is from improvements at module level, more concretely from eliminating the scaling penalty due to shared front-end/decoder.

[Image: AMD_steamroller_feed_cores.jpg]


Finally, I said that Puma+ is based on the same microarchitecture as Jaguar and thus both have the same IPC. This was shown using your own CB benchmark numbers from AT. The rest ("jaguar = puma", "bd = pd = sr") is only in your imagination.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Pay attention to the bold-underlined parts. What you said is incorrect.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The problem with your formula is that it assumes that logical and physical cores are independent cores and always add to the performance, when in fact both share resources. Your formula also treats real and virtual cores symmetrically. I prefer to write "1 core + HT" as I did before. In some cases enabling HT results in a loss of performance due to overloading the shared resources. Your formula could only explain this case by using a negative IPC for the virtual cores, which makes little sense.

Once again, the expression is

Perf = Physical_Cores * IPC * Clock

where IPC (Instructions Per Cycle) accounts for any muarch improvement, including SMT. The improved IPC is why a quad-core i7 can be faster than an i5 despite both having four cores.

A more general equation used in HPC is

Performance = (CPU speed in GHz) x (number of CPU cores) x (CPU instruction per cycle) x (number of CPUs per node)

http://www.novatte.com/our-blog/197-how-to-calculate-peak-theoretical-performance-of-a-cpu-based-hpc-system

For desktops, the "number of CPUs per node" = 1 (unless some of you have a dual socket mobo or a cluster in the home :sarcastic:) and the general HPC formula reduces to the one I gave above:

Performance = GHz x cores x instruction per cycle.
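For illustration, the HPC formula and its desktop reduction can be sketched in Python as below. The function name and the sample inputs are hypothetical placeholders, not figures for any real processor.

```python
def peak_performance(clock_ghz, cores, instructions_per_cycle, cpus_per_node=1):
    """Performance = GHz x cores x instructions per cycle x CPUs per node.

    With cpus_per_node left at 1 (a single-socket desktop) this reduces to
    Performance = GHz x cores x instructions per cycle.
    """
    return clock_ghz * cores * instructions_per_cycle * cpus_per_node


# Placeholder inputs: 3.5 GHz, 4 physical cores, 8 instructions per cycle.
desktop = peak_performance(3.5, 4, 8)
dual_socket = peak_performance(3.5, 4, 8, cpus_per_node=2)

print(desktop, dual_socket)
```

Note that only physical cores are counted; in this bookkeeping any SMT gain or loss has to show up as a higher or lower per-core instructions-per-cycle figure.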
 


You are essentially trying to ignore SMT concerns. While you can get away with this for HTT, other forms of SMT, such as AMD's CMT, start causing issues, due to the overhead involved in using the second core of a BD module. Hence why I tried to keep the comparisons between just quads, because you start making the formula a LOT more complicated.

And if you REALLY want to be technical, all this math assumes the processor is doing 100% work on the program in question. If any CPU core isn't stuck at 100% load, you grossly overestimate IPC, so you need to account for core loading on a per-core basis and then figure out how much each core affected performance. That is why physical and logical cores have to be considered separately, factoring the "average" performance benefit/loss of various core loading profiles into the formula, which goes well outside what we are trying to discuss at this point.

Hence, the IPC numbers given by my formulas are grossly inflated. But for comparing two processors where the app in question is the only major program running, the numbers are good enough to be able to calculate the relative difference in performance between two chips, since the numbers would be inflated equally for both.

Also, Juan, learn some math:

http://mathforum.org/library/drmath/view/58083.html
http://www.mathsisfun.com/percentage-difference.html

You are trying to use % Change, which is not valid when comparing two significantly different items. You have to use % difference instead, which is defined as:

| (V1 - V2) / ((V1 + V2) / 2) | * 100%

Hence my numbers.
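A small sketch of the percentage-difference definition from the mathsisfun page linked above: the absolute difference divided by the average of the two values. The function name and the sample values are hypothetical.

```python
def percent_difference(v1, v2):
    """Symmetric percentage difference: |v1 - v2| relative to the average of the two."""
    average = (v1 + v2) / 2
    return abs(v1 - v2) / average * 100


# Placeholder values, not benchmark results.
a, b = 100.0, 125.0
print(percent_difference(a, b))
print(percent_difference(b, a))  # same result: the order of the arguments does not matter
```

Because the denominator is the average rather than one of the two values, neither chip is treated as the baseline.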
 
AMD and Microsoft to hold DirectX 12 conference
http://www.fudzilla.com/home/item/34845-amd-and-microsoft-to-hold-directx-12-conference
FinalWire Introduces AIDA64 v4.50
http://www.techpowerup.com/201275/finalwire-introduces-aida64-v4-50.html

Intel inks partnership with Rockchip
http://www.fudzilla.com/home/item/34843-intel-inks-partnership-with-rockchip
this will end up as intel competing with amd's consumer arm and x86 socs by proxy and directly. i just can't ..... shrug off the feeling that rockchip will get screwed somewhere down the line....
edit: mediatek wasn't a candidate because of their close ties with amd and hsa foundation. hmm.....
 

i think that neither will ever confirm which came first. it'll always be up to speculation and it is in amd's best interest that it stays that way. chances are, amd, sony, ms all started from the same place - a gfx api for the gcn uarch in the console socs. afterwards, amd could have put together mantle from that. this is also speculation, since the software and hardware developments for each console is supposed to be confidential and hidden from the competitor.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
@ juan

The only imagination around here is your hatred for AMD and anything x86. You're the one posting misinformation that ARM > anything AMD makes. The only one here who can't see past your lies is you. You get away with insulting people without warning, but when someone calls you out, you continue to belittle them with your huge ego.

ARM is designed for low power. Just because it wins one specially designed benchmark doesn't mean ARM > all.

Learn to read past all the articles posting AMD hate, such as your constant linking of extremetech, who didn't even disclose what benchmarks were used or how they monitored actual clock speed. If you had your own computer to run benchmarks you would realize that turbo core is not flawless. Do some investigation of your own.

IPC is not a variable number that changes according to how many cores are being used. Figure that out before you try to insult me again.
 

jdwii

Splendid


Let's also not forget about that Intel iGPU that will knock the 295X off the charts in terms of performance. Should only be 2 years now.
 

UnrelatedTopic

Honorable
Nov 4, 2013
22
0
10,510


To be fair he did post that link. Not trying to suck his wang but you guys need to stop going after the man. It's getting more annoying than the man himself.
 


Software can be quite easily recompiled to run on a different uArch when needed. A large section of typically x86 code has already been moved over (primarily relating to servers / Linux). Windows RT also shares a code base with Windows 8.x for x86; the problem is legacy software support. That, however, is going to go away over time imo: how many of you use software from the 90's any more? There are still some large organisations using custom software that is pretty old, but I think eventually it will get updated, at which point the underlying platform becomes irrelevant.

I don't expect ARM to replace x86; however, I think the 2 are going to co-exist in the same spaces a lot more moving forward, and I think AMD have been pretty shrewd lately. In the right circumstances an ARM design might be a better option than x86 (obviously not for everything). The market might be fairly small (small enough for Intel to ignore) yet be profitable enough to really help a smaller (comparatively at least) outfit like AMD.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


First, SMT != CMT.

Second, the above formula for performance also works for SMT. If you check the above HPC link you can find the formula applied to Intel Xeon processors. Of course, one uses physical cores, not virtual cores.

Third, yes, the processors have to be loaded in order to measure performance. If the processors are idle, then we aren't measuring their potential. I believed this kind of thing was self-evident. I try to avoid self-evident stuff in my posts. Should I also mention that the computer has to be turned on?

Fourth, you continue making the same mistake about percentages.

If Phenom II scores 13.51 and Kaveri scores 16.77 then Kaveri is 24% faster than Phenom II. It is not 21% faster as you claim. I already gave you the formula. I repeat it:

(( 16.77 - 13.51 ) / 13.51 ) * 100 = 24.13

You can find this formula in the same site that you mention

http://www.mathsisfun.com/numbers/percentage-change.html
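The calculation above, as a minimal Python sketch. The function name is hypothetical; the two scores are the ones quoted in this post.

```python
def percent_change(old, new):
    """Percentage change of `new` relative to the baseline `old`."""
    return (new - old) / old * 100


phenom_ii = 13.51  # score quoted above
kaveri = 16.77     # score quoted above

print(round(percent_change(phenom_ii, kaveri), 2))
```

Here the Phenom II score is the baseline, which is what "X% faster than Phenom II" means. A symmetric percentage difference, which divides by the average of the two scores instead, gives about 21.5% for the same inputs, which is where the two figures in this argument come from.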
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


You already tried an unfair post before. I ignored it, whereas some people replied to you. Now that you insist on being unfair, you deserve a reply.

If he had simply replied to the post with the link, I would have remained silent. However, he couldn't resist the temptation of writing an ironic "Oooh! Look everyone! AMD news to discuss..." followed by a more than evident allusion to certain people here. So I replied to him, also ironically, reminding him that his supposedly new link had been given before in the thread, and that he had ignored the earlier post and the news.

The reply was made to ensure that next time he tries to make insinuations about others he checks twice before posting.

What is really annoying is that mentioning that one link was given before causes you big trouble, but the whole army of guys who are here posting misinformation, lies, and personal attacks daily does not cause you any trouble. In response to your first post several people gave you several names of posters. Amazing how your hypersensitivity only goes in one direction.
 