How will the 390X perform compared to 295X2? Guesses/leaks?
The only leaks so far are the fake 3DMark result and the most likely fake Chiphell numbers. Based on the rumored specs, I'd optimistically say it will be a little over 50% faster than the 290X. Could be a good battle with the Titan X if so.
50% over 290x would imply:
~10% faster than 295x2 @ 1080p
~20% slower than 295x2 @ 4K
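Those implied figures follow from simple arithmetic; here's a quick sanity check, taking the rumored "50% over the 290X" number as a given (all ratios are speculation, not measurements):

```python
# Back-of-the-envelope check of the ratios implied above.
# Assumption (rumor, not confirmed): the 390X lands at 1.5x the 290X.
r390x = 1.50

# "~10% faster than 295X2 @ 1080p" implies:
r295x2_1080p = r390x / 1.10  # 295X2 ~= 1.36x the 290X at 1080p
# "~20% slower than 295X2 @ 4K" implies:
r295x2_4k = r390x / 0.80     # 295X2 ~= 1.88x the 290X at 4K

print(f"295X2 vs 290X @ 1080p: {r295x2_1080p:.2f}x")
print(f"295X2 vs 290X @ 4K:    {r295x2_4k:.2f}x")
```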
OK, so everything on Wikipedia is true, according to a poster.
IPC is instructions per clock, right? Meaning binary instructions? And it also describes how fast a specific CPU runs, correct?
One single instruction is not a program task. Let me explain.
Say you have 2 programmers with the exact same school education. Programmer A has 30 years of experience; programmer B was just hired. They are given the exact same task: write a program to draw this one picture, let's call it a "frame".
Both programmers do their job and it works flawlessly, but programmer A was more efficient and wrote 50% fewer lines of code. Both programs are run on an Intel 5960X, and programmer A's program completes faster than programmer B's. Does that mean programmer A has a faster CPU than programmer B, so you could go out and claim the 5960X is faster than the 5960X? Or did program A simply have fewer "instructions" to complete the given task? This is a concept known as "program optimization". It reduces the instruction count; it does not increase the IPC.
This is the exact same concept as Intel's compiler: give AMD CPUs more instructions per task.
Let's call this "instructions per task", or IPT. Aren't all programs written in tasks, with the exception of assembly code? Which doesn't really matter, because who writes a full game in assembly?
There are, however, programs specifically written to do away with any bias; these are commonly called "synthetic programs" or "synthetic benchmarks". They are designed so that nothing biases the results: if a CPU is capable of taking shortcuts via SSE 4.2, the program won't fall back to SSE3 when AMD is detected. These are the only programs capable of giving an accurate "IPC" for a given processor. So with this exception, the wiki is correct: it takes a program to calculate IPC, but not every program is written to test IPC.
In summary, you can change the number of instructions a program needs for a specific task, but you can't change how fast a CPU handles "instructions" by writing a program. Sure, you can optimize the program, but isn't that still just changing the number of instructions?
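The "fewer instructions for the same task" idea is easy to demonstrate with Python's `dis` module: two functions that produce the same answer can compile to very different instruction counts (bytecode rather than machine code, and the function names are made up, but the principle is the same):

```python
import dis

def sum_naive(n):
    # Loop version: many instructions executed per call
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_optimized(n):
    # Closed-form version: a handful of instructions, same answer
    return n * (n + 1) // 2

assert sum_naive(100) == sum_optimized(100) == 5050

# Compare static bytecode instruction counts of the two programs
naive_count = len(list(dis.get_instructions(sum_naive)))
opt_count = len(list(dis.get_instructions(sum_optimized)))
print(naive_count, opt_count)  # the optimized version needs far fewer instructions
```

Same CPU, same task, same result, but one "program" simply has less work to do.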
Well, noob, the idea is to use the exact same program (patch version included) to measure performance, for the reasons you're describing. It would be very dumb to measure the performance of two different CPUs using different program versions. Now, I think your argument could go back to the compiler thing and the code paths taken, but why go back there, right? Haha.
Well, yeah, it could be. Like any Turing machine, you move through states depending on your input, so you could argue that the same program runs different "code" each time. That depends on the coding itself and the OS. It's a weird way to put it, but since most benchmarking programs don't have variable inputs, it's hard to imagine them running different "code" even on the same CPU each time, thinking of them as isolated and everything else 90% equal (kernel code has to differ, so...).
@noob2222, as explained to you, IPC depends on software. Not only does IPC depend on the application used, it also varies within the same application. This is an example, except it measures CPI (the inverse of IPC). The blue line represents the minimum CPI of the architecture (the maximum IPC of the architecture), whereas the red graph represents the measured instantaneous CPI (instantaneous IPC).
The IPC that we associate with a given application is in reality an average value, obtained by averaging the instantaneous IPC over a long run. In fact, we can complicate it further: the average IPC obtained from different runs of the same application will also vary. This is why, if we want to be serious, we need to run the same program/benchmark several times and take an average over all the runs. That is why in more statistically serious reviews you see benchmark scores such as 12046 ±5, where the "±5" is a measure of the statistical deviation across runs of the same benchmark. I will not insist more on this kind of stuff, since you have a particular concept of IPC different from everyone else's.
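That "12046 ±5" style of reporting takes only a few lines to reproduce; the scores below are made-up numbers for illustration:

```python
from statistics import mean, stdev

# Hypothetical scores from five runs of the same benchmark on the same chip
runs = [12041, 12049, 12044, 12051, 12045]

avg = mean(runs)
dev = stdev(runs)  # sample standard deviation across the runs
print(f"score: {avg:.0f} +-{dev:.0f}")  # score: 12046 +-4
```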
Not sure what you mean by "the HTT flag is Intel's patented tech that's not part of the cross-license agreement". What does Hyper-Threading, which is a particular implementation of SMT, have to do with the ISA?
Most Windows programs use the Microsoft compiler and most Linux programs use GCC. Thus the number of what you call "GenuineIntel-biased" programs is small: 5%? 1%?
Cheers!
Given the same inputs, the same program should run the same each time.
And again, different instructions execute at different rates on different processors under differing circumstances.
True, yuka, but how many programs can you guarantee aren't "GenuineIntel" biased?
But the point I'm making is that it takes multiple instructions per task, and that's the variable, not the hardware.
Which is why I actually like the term "performance per clock".
Also, it doesn't matter whether the program is biased or not if it's the program you'll actually be using. Why buy a CPU that performs badly in that program just because the program is biased? And why buy a CPU at all that offers inconsistent performance? Heck, I know server guys who turn off turbo mode over that.
That part doesn't make a lot of sense. Anyway, 110-130 watts is pretty nice; that's where the 265 sits, I think, and the 960 isn't too far ahead at around 145 watts.
Which ironically hurts AMD, since Intel's compiler actually gives AMD the largest performance boost, at least the last time I saw a comprehensive performance review of GCC, ICC, and MSVC.
I said several pages ago that IPC != program-specific performance, yet I was told I was simply "wrong". Here we get an example of how IPC can magically change simply by shutting off 2 cores.
So that means that with an FX-4320 vs. a 5960X, in the above example, the 4320 will have a higher "IPC" than Haswell-E... really...
So again, who thinks IPC = program-specific performance / cores / clock?
Or IPC × clock × cores × programming method = program-specific performance, where IPC is a constant and the only variable is the programming method.
That's the effect of CMT. Disable it and you also remove the 20% performance hit you incur when using the second core of a module, so disabling every other core would RAISE IPC by avoiding that penalty.
Hence why it's really impossible to measure accurately, as Palladin noted above. That's why I always note you can only do the math per application, since various forces and processor-usage effects come into play.
Likewise, FP IPC isn't the same as Integer IPC, which isn't the same as SSE2 IPC, and so on. Then you have cache dynamics, memory dynamics, and so on. In short: IPC isn't some flat number.
I tested this theory and it only improved single-core performance by around 8%; I'm guessing you'd have to do it at the hardware level to get the 20% gains.
I believe I have an explanation for that.
If two threads are independent, then scheduling them on cores in different modules eliminates the CMT penalty (~20%). But if the threads are data-dependent, then scheduling them on different modules adds a performance penalty from the cycles wasted copying data from the L2 cache of one module to the L2 cache of the other. Thus disabling cores and scheduling threads onto different modules doesn't always increase performance by ~20%.
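A toy scheduling model makes that trade-off concrete. The ~20% CMT sharing penalty is the figure from the discussion above; the cross-module data-copy cost is an assumed number purely for illustration:

```python
# Toy model: where should thread 2 go relative to thread 1?
# All penalty values are illustrative, not measured.
CMT_PENALTY = 0.20        # throughput lost when both cores of a module are busy
CROSS_MODULE_COPY = 0.10  # assumed cost of shuttling shared data between module L2s

def pair_throughput(same_module, data_dependent):
    per_thread = 1.0
    if same_module:
        per_thread *= (1 - CMT_PENALTY)        # shared front-end penalty
    elif data_dependent:
        per_thread *= (1 - CROSS_MODULE_COPY)  # pay for L2-to-L2 copies instead
    return 2 * per_thread

# Independent threads: separate modules win cleanly
print(pair_throughput(same_module=False, data_dependent=False))  # 2.0
print(pair_throughput(same_module=True,  data_dependent=False))  # 1.6
# Data-dependent threads: the gap shrinks, and with a large enough
# copy cost the same-module placement could even come out ahead.
print(pair_throughput(same_module=False, data_dependent=True))   # 1.8
```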
You also have to consider Turbo, which occurs more aggressively when using only one module, versus two.
So yeah, it gets complicated, which all goes back to IPC not being a flat number.
You have to account for turbo in all the models. A review with a stock Intel chip leaves turbo enabled, and you simply don't know what clockspeed the chip is running at for the duration of the benchmark.
If the 4770K is 3.5 GHz base and 3.9 GHz turbo, and it's running on an open test bench with great ventilation and a large aftermarket cooler, who knows if it ever drops below 3.9 GHz? That's an ~11.4% variance in the frequency the chip could be running at.
If you want to calculate IPC, you need to find test data that has turbo disabled completely and the chip locked to a specific frequency.
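To put a number on that uncertainty, here is the 4770K's own frequency band and how it moves an IPC estimate (the measured instruction rate below is hypothetical):

```python
base, turbo = 3.5e9, 3.9e9  # 4770K base and max turbo clocks, in Hz
print(f"frequency uncertainty: {turbo / base - 1:.1%}")  # ~11.4%

# The same measured throughput yields two different "IPC" figures
# depending on which clock you assume the chip actually ran at.
instructions_per_second = 7.0e9  # hypothetical measured rate
print(instructions_per_second / base)   # IPC if it sat at base clock: 2.0
print(instructions_per_second / turbo)  # IPC if it sat at turbo: ~1.79
```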
You can clearly see that Piledriver's x87 instructions per clock in SuperPi are absolutely abysmal. In wPrime, though, Piledriver is only 30% behind in IPC, assuming 4 Intel cores = 8 AMD cores. And as has been noted, core scaling is vastly different between the two, so it's difficult to tell. I don't think it's possible to extrapolate single-core performance.
Then again, all of this relies on the assumption that both CPUs are being fed the same instructions. There are too many variables that most reviews don't account for. And even if your math is right, if you're using the wrong data, you will get bad results.
If someone asks you what 2 + 2 is and you tell them 7 + 8 is 15, you did the math right; you just used the wrong data. That's what happens way too often with all these IPC calculations. I simply don't think it's a feasible measurement of performance.
You basically need the source code compiled in a fair way by a fair compiler that feeds both CPUs the same exact instructions, with every library that program uses compiled in a fair way as well.
I think some of you are chasing something that's simply not worth the effort, because there are way too many ways to end up with inconsistencies.
Are the CPUs turboing? Are they running the same instructions? Is one being gimped by a compiler? Do they have OS performance patches or changes?
Here's the thing. If someone did compile Gentoo from source with the most generic, comparable CFLAGS across all test chips, no one would care. They'd ask why you're benchmarking an OS they won't use. And if you actually run Gentoo, you'll optimize as much as possible.
Which is why I don't think IPC is worth anything. The only benchmarks that should matter are stock with turbo enabled, and then benchmarks comparing chips at average overclocks. It's just a dumb marketing metric with so many flaws that I doubt any serious CPU engineer would take it seriously.
The basic performance equation of computer science is
Performance = IPC * Frequency.
Every CPU engineer knows this equation. It is routinely used to catalog different kind of architectures: scalar vs superscalar, speed demon vs brainiac...
Not only do engineers work with the above equation, they also work with models that describe the IPC of a given architecture (real or virtual) as a function of several parameters. By changing those parameters, engineers can improve a given architecture. In fact, most of the research in CPU architecture during the last few decades went toward developing new ways of improving IPC, and the research continues today; a recent example is VISC.
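In code, the equation and the speed-demon vs. brainiac distinction look like this (both chips are hypothetical, with made-up numbers):

```python
def performance(ipc, freq_ghz):
    # Performance = IPC * Frequency
    # (here: instructions retired per nanosecond)
    return ipc * freq_ghz

# A "speed demon" (modest IPC, high clock) vs. a "brainiac"
# (high IPC, modest clock) can land on identical throughput.
speed_demon = performance(ipc=1.5, freq_ghz=4.6)  # 6.9
brainiac = performance(ipc=2.3, freq_ghz=3.0)     # 6.9
print(speed_demon, brainiac)  # same performance by different routes
```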
As several of us stated before, IPC is not some kind of universal constant. But all the variability that affects IPC values also affects performance: what performance was measured? Instantaneous performance over an atomic block? Average performance for a given application? Average performance over a range of applications? Performance with turbo enabled or disabled? On what OS? With what compiler flags? Was the computer case open or closed?
Using your own arguments, we would conclude that we cannot discuss the performance of computers at all because there are too many variables involved; we would conclude that performance itself is "just a dumb marketing metric that has so many flaws". But both conclusions are incorrect, because your arguments were flawed.
I am arguing that the only way to judge which CPU you should buy is to run the programs you will be running in the environment you will be using.
You are confused. The MIPS number is generally derived from a formula that calculates a theoretical maximum, just like when you calculate a graphics card's memory bandwidth. And you're trying to take this formula and apply it in ways that just don't make sense.
You are trying to extrapolate raw performance in a general sense from a small sample size with a ton of variables. That is exactly my problem with your way of doing things. And as we've already seen in this thread, you can cherry-pick to support whatever hypothesis you set forth.
Go back in time and cherry-pick some examples to show that the Pentium 4 is better than anything AMD can offer. You can do it. I've tried it before.
IPC is a meaningless buzzword. What do you mean by it, anyway? If I have two CPUs, one with FMA and one without, and I run a benchmark that's nothing but adding and multiplying, does the IPC even matter? You never address any of these issues.
Take my theoretical situation: CPU A has twice the IPC of CPU B, but CPU A has no FMA and CPU B has FMA. So CPU A must run two instructions to do the equivalent of an FMA, while CPU B can run it as a single instruction.
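A sketch of that theoretical situation, with made-up numbers, shows why the raw IPC comparison is meaningless here:

```python
# Hypothetical workload: N fused multiply-add operations (a*b + c).
# CPU A (no FMA) issues a MUL plus an ADD per operation;
# CPU B (FMA) issues a single FMA per operation.
N = 1_000_000
instructions_a = 2 * N
instructions_b = 1 * N

ipc_a, ipc_b = 2.0, 1.0  # suppose A retires twice as many instructions per clock
cycles_a = instructions_a / ipc_a
cycles_b = instructions_b / ipc_b
print(cycles_a == cycles_b)  # True: "twice the IPC" buys CPU A nothing here
```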
Do you see why? We have extensive proof of compilers not playing fair with software. Intel lost a court case over fixing benchmarks in their favor. Nvidia and AMD both left SYSmark because Intel was manipulating things.
You have absolutely zero right to assume that a benchmark or program treats two CPUs equally when we have 10+ years of benchmarks and programs doing exactly the opposite. And your entire argument revolves around the assumption that all CPUs are running the same instructions.
You are muddling marketing speak and science. Of course a start-up is going to throw the term IPC around to pitch its own product. Being able to discern marketing from science is a valuable skill, and it improves people's argument quality drastically.
As far as deriving performance from formulas goes, that's exactly what Nvidia did for the 4 GB of VRAM on the GTX 970. Ask them how that turned out in the real world! The old frequency × (busWidth / 8) × 2 / 1000 formula worked awfully poorly for that last 512 MB.
So my questions regarding your arguments are simple. Prove to me that performance derived from a mathematical formula is 100% accurate (you can start by proving that the last 512 MB of VRAM on the GTX 970 operates at the calculated speed), and then prove that benchmarks run the same instructions on different CPUs.
I'm simply arguing that the only way to judge a CPU's performance is to test your desired applications in your desired environment. I've made my case for why that's superior to math and marketing. Now make your case for why your way is better.
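For reference, plugging the GTX 970's advertised numbers into that quoted formula reproduces the headline 224 GB/s figure, the one that didn't hold for the last 512 MB:

```python
def mem_bandwidth_gbps(freq_mhz, bus_width_bits):
    # frequency x (busWidth / 8) * 2 / 1000, as quoted in the post above;
    # the *2 is the data-rate factor in that formula's clock convention
    return freq_mhz * (bus_width_bits / 8) * 2 / 1000

# GTX 970: 3500 MHz memory clock, 256-bit bus -> 224 GB/s on paper
print(mem_bandwidth_gbps(3500, 256))  # 224.0
```

The formula is a theoretical peak; it says nothing about the partitioned memory arrangement that slowed the card's last 512 MB segment in practice.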
This is a great quote: "a hello world in GL is around 5 lines of code; with Vulkan it's around 600". You guys need to see the Q&A part of the video (1:02:00 mark).
In this case it is 40% faster. Your 50% was close.
I guess the 390X and Titan X will be neck and neck at 1080p, but the Titan X will be faster at 4K. Maybe 30% faster?
From what I've read, the 390X will have monster bandwidth, so it should more than make up for the smaller amount of VRAM. 4K image pages should be in the 3.x GB usage range, but that's measured on GDDR5 cards. Since the new memory should have higher bandwidth, my bet is the transfer rate will be enough that the card won't need as much caching or preloading. And giving it a second thought, I might have it backwards... Uhm...
In any case, from the leaked TFLOPs alone, the 390X should be a tad faster than the Titan X, thanks to its monster bandwidth.
^^ Assuming a memory-bound application, I'd expect the same. I expect to see the two of them swap victories, some of them significant, simply because the cards are better at different things. Basically, a repeat of the old ATI/NVIDIA wars (ATI favoring faster memory bandwidth, NVIDIA favoring pure shader performance).
I reverse-image-searched that benchmark and it comes solely from an AnandTech forum poster, yet you are treating it as 100% legitimate fact. Please don't do this.
It includes such gems as modifying the code and the compiler to increase or decrease cache misses or hits, and to raise IPC that way. Basically, that graph is about compiler optimizations, about extracting more IPC from a CPU with the compiler, as opposed to measuring the hardware.
So I don't really think this is about the chip's IPC; it's more about extracting better IPC via software, which sort of fits what the critics of IPC, CPI, etc. have been saying in this thread. It's a useless metric that's mostly dependent on the software instead of the hardware.
This recent news seems to confirm my point:
The decision to go for an 8 GB Fiji rather than the planned 4 GB version was in part attributed to Nvidia's Titan X 12 GB card announcement. This is just the first part of the story. One of the main reasons is that the card is expected to perform so well in 4K gaming that the 4 GB frame buffer could impose a serious limitation.
Your link shows how "we can measure the cycles per instruction (CPI)" in section "2.1. Cycles per Instruction (CPI)", and the reference it uses for that section is a blog article that agrees with everything I have been saying:
The cycles per instruction metric (sometimes measured as IPC – instructions per cycle) is a useful ratio and (depending on CPU type) fairly easy to measure. If the measured CPI ratio is low, more instructions can be dispatched in a given time, which usually means higher performance.
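And measuring it really is straightforward once you have hardware counters: Linux's `perf stat`, for instance, reports both cycles and retired instructions, and the metric is a single division. The counter values below are made up for illustration:

```python
# Hypothetical counter readings from one benchmark run
cycles = 8_500_000_000
instructions = 12_750_000_000

cpi = cycles / instructions   # cycles per instruction
ipc = instructions / cycles   # IPC is just the reciprocal of CPI
print(f"CPI = {cpi:.2f}, IPC = {ipc:.2f}")  # CPI = 0.67, IPC = 1.50
```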