AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


50% over 290x would imply:

~10% faster than 295x2 @ 1080p
~20% slower than 295x2 @ 4K

How did you obtain the 50%?
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
OK, so everything on Wikipedia is true, according to a poster.

 

IPC is instructions per clock, right? Meaning binary instructions, right? It also means how fast a specific CPU runs, correct?

 

One single instruction is not a program task.  Let me explain.  

 

Say you have 2 programmers that have the exact same school education. Programmer A has 30 years of experience, programmer B was just hired. They are given the exact same task: write a program to draw this one picture, let's call it "frame".

 

Both programmers do their job and it works flawlessly; however, programmer A was more efficient and has 50% fewer lines of code. Both programs are run on an Intel 5960X, and programmer A's program completes faster than programmer B's program does. Does this mean that programmer A has a faster CPU than programmer B, so that you could go out and claim that the 5960X is faster than the 5960X? Or did program A have fewer "instructions" to complete the given task? This is a concept known as "program optimizing". It reduces the number of instructions; it does not increase the IPC.

 

This is the exact same concept as Intel's compiler: give AMD CPUs more instructions per task.

 

Let's call this "instructions per task" or IPT. Aren't all programs written in tasks, with the exception of assembly code? That doesn't really matter anyway, because who programs a full game in assembly?

 

There are, however, programs specifically written to do away with any biasing. These are commonly called "synthetic programs" or "synthetic benchmarks". They are designed so that nothing biases the results: if a CPU is capable of taking shortcuts via SSE 4.2, the program won't fall back to SSE3 just because AMD is detected. These are the only programs capable of giving an accurate "IPC" for a given processor. So with this exception, the wiki is correct: it takes a program to calculate IPC, but not every program is written to test IPC.

 

In summary, you can change the number of instructions in a program for a specific task, but you can't physically change how fast a CPU handles "instructions" by writing a program. Sure, you can optimize said program, but isn't that still changing the number of instructions?
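To put numbers on the "instructions per task" point, here is a minimal sketch assuming the textbook relation time = instructions / (IPC x clock). The instruction counts, IPC and clock below are made-up values for illustration, and holding IPC constant across both programs is a simplification that other posters in this thread dispute.

```python
def run_time_seconds(instruction_count, ipc, clock_hz):
    """Textbook CPU performance relation: time = instructions / (IPC * clock)."""
    return instruction_count / (ipc * clock_hz)


# Hypothetical values: one CPU, so IPC and clock are the same for both programs.
CLOCK_HZ = 3.0e9
IPC = 2.0

# Program A (experienced programmer) needs half the instructions for the same "frame".
program_a = run_time_seconds(1_000_000, IPC, CLOCK_HZ)
program_b = run_time_seconds(2_000_000, IPC, CLOCK_HZ)

# Ratio of run times: B takes twice as long although the CPU did not change.
print(program_b / program_a)
```

The hardware terms are identical in both calls; only the instruction count differs, which is the distinction the post is drawing.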
 
Well, noob, the idea is to use the exact same program (patch version included) to measure performance, for the reasons you're describing. It's very dumb to measure the performance of 2 different CPUs using different program versions. Now, I think your argument could go back to the compiler thing and the code paths taken, but why go back there, right? haha.

Cheers!
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
True, Yuka, but how many programs can you guarantee aren't "genuine Intel" biased?

But the point I'm making is that it takes multiple instructions per task, and that's the variable, not the hardware.
 
Well, yeah, it could be. Like any Turing machine, you go through the states depending on your input, so that could be translated as the same program running different "code" each time. That will depend on the coding itself and the OS. It's really weird to say it like that, but since most benchmarking programs don't have variable inputs, it's hard to imagine they'd run different "code" each time, even on the same CPU. I'm thinking of them being isolated and everything else 90% equal (kernel code has to be different, so...).

Cheers!
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
@noob2222, as explained to you, IPC depends on software. Not only does the IPC depend on the application used, it also varies within the same application. This is an example, except that it measures the CPI (the inverse of IPC). The blue line represents the minimum CPI of the architecture (the maximum IPC of the architecture), whereas the red graph represents the measured instantaneous CPI (instantaneous IPC).
[Image: bench1_100.g4.cpi.dat.png]

The IPC that we associate with a given application is in reality an average value, obtained by averaging the instantaneous IPC over a long run. In fact, we can complicate it further and say that the average IPC obtained from different runs of the same application will vary. This is why, if we want to be serious, we need to run the same program/benchmark several times and take the average over all the runs. That is why in more statistically serious reviews you see benchmark scores such as 12046 ±5, where the "±5" is a measure of the statistical deviation between different runs of the same benchmark. I will not insist further on this kind of stuff, since you have a particular concept of IPC different from everyone else's.
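As a rough illustration of the "score ± deviation" reporting described above, here is a small sketch using Python's standard `statistics` module. The five scores are invented for the example, and using the sample standard deviation as the "±" figure is my assumption; reviews may use a different spread measure.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) for repeated runs of one benchmark."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical scores from five runs of the same benchmark on the same machine.
scores = [12041, 12046, 12051, 12043, 12049]

mean, spread = summarize_runs(scores)
print(f"{mean:.0f} ± {spread:.0f}")
```

A single run would report only one of these numbers, which is why the post argues for repeating the benchmark before comparing CPUs.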

Not sure what you mean by "the htt flag is intel's patented tech thats not part of the cross license agreement". What does Hyper-Threading, which is a particular implementation of SMT, have to do with the ISA?

Most Windows programs use the Microsoft compiler and most Linux programs use GCC. Thus the number of what you call "genuine Intel biased" programs is small: 5%? 1%?
 


Given the same inputs, the same program should run the same each time.

And again, different instructions execute at different rates on different processors under differing circumstances.
 

jdwii

Splendid


That's why I actually like the term "performance per clock".
Also, it doesn't matter whether the program is biased or not: if you will be using that program, it just doesn't matter. Why buy a CPU that performs badly in one program, just because the program is biased? Again, why buy a CPU at all that offers inconsistent performance? Heck, I know server guys who turn off turbo mode over that.
 
Most Windows programs use the Microsoft compiler and most Linux programs use GCC. Thus the number of what you call "genuine Intel biased" programs is small: 5%? 1%?

If that.

Which ironically hurts AMD, since Intel's compiler actually gives AMD the largest performance boost, at least the last time I saw a comprehensive performance review of GCC, ICC, and MSVC.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


I am arguing that the only way to judge which CPU you should buy is to run the programs you will be running in the environment you will be using.

You are confused. The MIPS number is generally derived from a formula to calculate a theoretical maximum, just like when you calculate graphics card memory bandwidth. And you're trying to take this formula and apply it in ways that just don't make sense.

You are trying to extrapolate raw performance in a general sense by using a small sample size with a ton of variables. That is exactly my problem with your way of doing things. And we've already seen in this thread, you can cherry pick to show whatever hypothesis you set forth.

Go back in time and cherry pick some examples to show that Pentium 4 is better than anything AMD can offer. You can do it. I've tried it before.

IPC is a meaningless buzzword. What do you mean by it, anyways? If I have two CPUs, one has FMA, and the other doesn't, and I run a benchmark that's nothing but adding and multiplying, does the IPC even matter? You don't ever address any of these issues.

In my theoretical situation, you have a CPU A with twice the IPC of the other one, CPU B. CPU A has no FMA and CPU B has FMA, so CPU A must run two instructions to do the FMA equivalent, while CPU B can just run FMA as one instruction.
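The arithmetic behind that hypothetical can be sketched as follows. The operation count and IPC values are invented, and both CPUs are assumed to run at the same clock, which the post does not state.

```python
def cycles_needed(operations, instructions_per_operation, ipc):
    """Cycles to finish a workload: total instructions divided by instructions per cycle."""
    return operations * instructions_per_operation / ipc


OPS = 1_000_000  # hypothetical number of multiply-add operations in the benchmark

# CPU A: no FMA, so a separate multiply and add (2 instructions), but twice the IPC.
cpu_a_cycles = cycles_needed(OPS, 2, ipc=2.0)
# CPU B: FMA, so one fused instruction per operation, at half of CPU A's IPC.
cpu_b_cycles = cycles_needed(OPS, 1, ipc=1.0)

print(cpu_a_cycles, cpu_b_cycles)
```

Under these assumptions both CPUs need the same number of cycles, so the 2x IPC advantage does not show up as a performance advantage on this workload.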

Do you see why? We have extensive proof of compilers not playing fair with software. Intel lost a court case for fixing benchmarks in their favor. Nvidia and AMD both left SYSMARK because Intel was manipulating things.

You have absolutely zero right to make the assumption that a benchmark or program is treating two CPUs equally when we have 10+ years of benchmarks and programs doing exactly not that. And your entire argument revolves around the assumption that all CPUs are running the same instructions.

You are muddling marketing speak and science so much. Of course a startup is going to throw the term IPC around to pitch their own product. Being able to discern between marketing and science is a valuable skill, and it improves people's argument quality drastically.

As far as deriving performance from formulas, that's exactly what Nvidia did for the 4GB of VRAM on GTX 970. Ask them how that turned out in the real world! The old frequency x (busWidth / 8) * 2 / 1000 formula worked awfully poorly for that last 512MB.
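For reference, the formula quoted above can be written out as a short sketch. The 3500 MHz clock and the 224-bit / 32-bit split are illustrative inputs I am assuming, not figures given in this thread, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(frequency_mhz, bus_width_bits):
    """Theoretical peak: frequency x (busWidth / 8) x 2 / 1000, in GB/s."""
    return frequency_mhz * (bus_width_bits / 8) * 2 / 1000


# Assumed inputs for illustration only.
FREQUENCY_MHZ = 3500

full_bus = peak_bandwidth_gb_s(FREQUENCY_MHZ, 256)      # one number for the whole card
fast_segment = peak_bandwidth_gb_s(FREQUENCY_MHZ, 224)  # assumed share serving most of the VRAM
slow_segment = peak_bandwidth_gb_s(FREQUENCY_MHZ, 32)   # assumed share serving the last 512MB

print(full_bus, fast_segment, slow_segment)
```

The single headline figure is only reached if the whole bus serves every access; once memory is split into segments, the formula alone says nothing about which segment a given access lands in.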

So, my questions regarding your arguments are simple. Prove to me that performance derived from a mathematical formula is 100% accurate (you can start by proving the last 512MB of VRAM on GTX 970 operates at the calculated speed) and then prove that benchmarks run the same instructions for different CPUs.

I'm simply arguing the only way to judge a CPU's performance is to test your desired applications in your desired environment. I've made my case why that's superior to math and marketing. Now make your case why your way is better.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


This is where I give kudos to Nvidia with their Denver cores, for having a way to dynamically optimize code.
https://www.youtube.com/watch?v=oEuXA0_9feM

Starting with the assumption that the code will be optimized for someone else.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


[Image: original.jpg]


In this case it is 40% faster. Your 50% was close.

I guess the 390X and Titan X will be neck and neck at 1080p, but the Titan X will be faster at 4K. Maybe 30% faster?
 


From what I've read, the 390X will have monster bandwidth, so it will more than easily make up for the smaller amount of VRAM. 4K image pages should be in the 3.X GB range of usage, but that's under GDDR5 measurements. Since it should have higher bandwidth, my bet is that the transfer rate will be enough to not need so much caching or preloading. And giving it a second thought, I might have it backwards... Uhm...

In any case, from the leaked TFLOPs alone, the 390X should be a tad faster than the Titan X, thanks to its monster bandwidth.

Cheers!
 
^^ Assuming a memory-bound application, I'd expect the same. I expect to see the two of them swap victories, and some will be significant simply due to the cards being better at different things. Or basically, a repeat of the old ATI/NVIDIA wars (ATI favoring faster memory bandwidth, NVIDIA favoring pure shader performance).
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780



I reverse image searched that benchmark and it comes solely from an AnandTech poster. And you are treating it as 100% legitimate fact. Please, don't do this.

Also, his IPC graph comes from a user at dragonflyBSD: http://leaf.dragonflybsd.org/~beket/geant4/dtrace.html

It includes such gems as modifying code and compiler to increase or decrease cache misses or hits and to increase IPC that way. Basically, that graph is about compiler optimizations and extracting more IPC from a CPU with the compiler, as opposed to measuring the hardware.

So, I don't really think this is about a chip's IPC; it's more about extracting better IPC via software, which sort of fits into what the critics of IPC, CPI, etc. have been saying in this thread. It's a useless metric that's mostly dependent on the software instead of the hardware.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


This recent news seems to confirm my point:

The decision to go for an 8GB Fiji rather than the planned 4GB version was in part attributed by Nvidia’s Titan X 12GB card announcement. This is just the first part of the story. One of the main reason is that the card is expected to perform so well in 4K gaming, that the 4GB frame buffer could impose a serious limitation.

http://www.fudzilla.com/news/graphics/37258-fiji-radeon-390x-comes-with-8gb
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Your link shows how "we can measure the cycles per instruction (CPI)" in section "2.1. Cycles per Instruction (CPI)", and uses as a reference for that section a blog article that just agrees with everything I have been saying:

The cycles per instruction metric (sometimes measured as IPC – instructions per cycle) is a useful ratio and (depending on CPU type) fairly easy to measure. If the measured CPI ratio is low, more instructions can be dispatched in a given time, which usually means higher performance.

This is the maximum to be expected from the AMD64 architecture, which attempts to run three instructions per clock cycle.
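Since the thread switches between CPI and IPC, here is a minimal conversion sketch. The peak of three instructions per clock comes from the quoted text; the measured CPI value is a made-up sample, and the function names are hypothetical.

```python
def cpi_from_ipc(ipc):
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    return 1.0 / ipc


def ipc_from_cpi(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


PEAK_IPC = 3.0        # from the quoted text: three instructions per clock cycle
MEASURED_CPI = 1.25   # hypothetical instantaneous sample from a profiler

min_cpi = cpi_from_ipc(PEAK_IPC)
measured_ipc = ipc_from_cpi(MEASURED_CPI)

print(round(min_cpi, 3), measured_ipc)
```

A lower CPI means a higher IPC, so a floor on CPI in a plot corresponds to the architectural ceiling on IPC.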
 