juanrga :
blackkstar :
Also, his IPC graph comes from a user at DragonFly BSD:
http://leaf.dragonflybsd.org/~beket/geant4/dtrace.html
It includes such gems as modifying the code and the compiler to increase or decrease cache misses or hits, and to increase IPC that way. Basically, that graph is about compiler optimizations and extracting more IPC from a CPU with the compiler, as opposed to measuring the hardware.
So I don't really think this is about a chip's IPC; it's more about extracting better IPC via software, which sort of fits what the critics of IPC, CPI, etc. have been saying in this thread. It's a useless metric that's mostly dependent on the software instead of the hardware.
Your link shows how "we can measure the cycles per instruction (CPI)" in section "2.1. Cycles per Instruction (CPI)", and as the reference for that section it uses a blog article that agrees with everything I have been saying:
The cycles per instruction metric (sometimes measured as IPC – instructions per cycle) is a useful ratio and (depending on CPU type) fairly easy to measure. If the measured CPI ratio is low, more instructions can be dispatched in a given time, which usually means higher performance.
This is the maximum to be expected from the AMD64 architecture, which attempts to run three instructions per clock cycle.
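For anyone following along, here is a minimal sketch of how the two ratios in the quoted passage relate. The counter values are made up for illustration, not taken from the linked page; the only number borrowed from the quote is the peak of three instructions per clock cycle.

```python
# Minimal sketch: CPI and IPC are reciprocals of each other.
# The counter values are hypothetical, not measurements from the linked page.

def cpi(cycles, instructions):
    """Cycles per instruction: lower is better."""
    return cycles / instructions

def ipc(cycles, instructions):
    """Instructions per cycle: higher is better."""
    return instructions / cycles

# Hypothetical hardware-counter readings for one run of a program.
cycles = 1_200_000
instructions = 2_400_000

measured_cpi = cpi(cycles, instructions)
measured_ipc = ipc(cycles, instructions)

# Peak taken from the quoted text: three instructions per clock cycle.
PEAK_IPC = 3.0
fraction_of_peak = measured_ipc / PEAK_IPC

print(f"CPI = {measured_cpi:.2f}, IPC = {measured_ipc:.2f}, "
      f"{fraction_of_peak:.0%} of the quoted peak")
```

With these made-up counters the run sits at two instructions per cycle, i.e. two thirds of the quoted maximum.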
Yeah, and the rest of the article is talking about making compiler adjustments in an OS that has significantly less market share than Linux. The three instructions per clock tick is the optimal goal that they are trying to achieve via compiler changes. It's theoretical.
The whole thing is basically "our version of AMD64 is theoretically capable of 3 instructions per clock tick, how do we get our compiler to actually achieve that?"
I suggest everyone else actually read the content I posted, where Juan cites his thoughts on his graph, and come to their own conclusions.
IPC, CPI, etc. are all measurements with a theoretical limit set by the architecture; their real-world values depend on the compiler.
To make matters worse, the software they're using to measure is not a benchmark at all; it's scientific simulation software being used as one.
It's what I said previously, yet you refuse to address it. That graph is measuring how close the compiler can get to the theoretically perfect IPC of an architecture. IPC is far more dependent on the compiler than on the hardware.
For example, if a compiler optimizes poorly for the caches of a certain CPU architecture and the generated code constantly misses in cache, the IPC will be bad. That's not the CPU's fault, it's the compiler's.
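A rough sketch of the cache-miss point, using a deliberately simple stall model in which every miss adds a fixed number of cycles. The base CPI, miss counts, and 100-cycle penalty are all assumptions picked for illustration; real out-of-order CPUs overlap some of that latency, so treat this as a toy model rather than a description of any particular chip.

```python
# Toy model: total cycles = instructions * base CPI + misses * miss penalty.
# All numbers are assumptions for illustration, not measurements.

def effective_ipc(instructions, base_cpi, cache_misses, miss_penalty_cycles):
    """IPC after adding a fixed stall cost for every cache miss."""
    cycles = instructions * base_cpi + cache_misses * miss_penalty_cycles
    return instructions / cycles

INSTRUCTIONS = 1_000_000   # same instruction count in both cases
BASE_CPI = 0.5             # assumed CPI when nothing misses in cache
MISS_PENALTY = 100         # assumed cycles lost per miss

# Same hardware parameters, two hypothetical builds of the same program.
cache_friendly = effective_ipc(INSTRUCTIONS, BASE_CPI, 1_000, MISS_PENALTY)
cache_hostile = effective_ipc(INSTRUCTIONS, BASE_CPI, 20_000, MISS_PENALTY)

print(f"cache-friendly build: IPC = {cache_friendly:.2f}")
print(f"cache-hostile build:  IPC = {cache_hostile:.2f}")
```

In this model the hardware parameters never change between the two calls; only the miss count does, and the resulting IPC drops from about 1.67 to 0.40.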
I'm trying to help you understand the data you presented and you seem to not want to have anything to do with it. I hope that the others appreciate it.
IPC and all related measurements are simply a benchmark of the compiler and how well the compiler uses an architecture.
We have a real-world example of this: compare Clang/LLVM and GCC benchmarks (Phoronix runs these quite often):
http://www.phoronix.com/scan.php?page=article&item=gcc49_compiler_llvm35&num=2
Same software, same hardware, different performance. Thus, by your argument, different IPC. The only variable changing here is the compiler. Look at GraphicsMagick: GCC 4.8.2 is almost twice as fast as Clang on the same OS and the same hardware.
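To put numbers on that reasoning, here is a small sketch of the usual relation: runtime = instructions / (IPC × clock). Nothing below comes from the Phoronix article: the instruction count, the two IPC values, and the 3.5 GHz clock are hypothetical, and the instruction count is assumed to be the same for both builds purely to keep the example short. A real comparison would have to measure it per compiler.

```python
# Sketch of runtime = instructions / (IPC * clock frequency).
# Every value here is hypothetical; none is taken from the Phoronix results.

def runtime_seconds(instructions, ipc, clock_hz):
    """Time to retire `instructions` at an average `ipc` on a `clock_hz` CPU."""
    return instructions / (ipc * clock_hz)

CLOCK_HZ = 3.5e9                 # assumed fixed clock, same CPU for both builds
INSTRUCTIONS = 10_000_000_000    # assumed equal for both builds (simplification)

build_a = runtime_seconds(INSTRUCTIONS, ipc=2.0, clock_hz=CLOCK_HZ)
build_b = runtime_seconds(INSTRUCTIONS, ipc=1.0, clock_hz=CLOCK_HZ)

slowdown = build_b / build_a

print(f"build A: {build_a:.2f} s, build B: {build_b:.2f} s, "
      f"B takes {slowdown:.1f}x as long")
```

Under those assumptions, halving the achieved IPC on identical hardware doubles the runtime, which is the size of gap being described for GraphicsMagick.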
The software, hardware, and OS don't matter. All that matters is the compiler's quality and how well it can optimize for a certain architecture. Clang is clearly not coming close to the theoretical IPC limit of the 4770K in GraphicsMagick, and that benchmark has nothing to do with the 4770K's instructions per clock, CPI, etc.
This is exactly what your IPC benchmarks are showing, yet you insist it has nothing to do with the compiler at all and is all about the hardware.
IPC is strictly a compiler issue. If you want to talk IPC of a specific architecture, go create a perfect compiler that will extract the absolute best performance out of every relevant x86 CPU. I'll be waiting...
Also, you can safely ignore the compilation-time benchmarks. They only measure how long the compiler takes to generate code and are not relevant to this discussion.