AMD CPU speculation... and expert conjecture


jdwii

Splendid

Stop killing the messenger; he is doing nothing more than repeating what some leading people in the industry have been saying.
 


People like us buy new dGPUs. The majority of people keep their aged PCs about a decade too long, then buy a new one.

dGPUs are going to be squeezed by APUs, since the low-end market, where most of those sales happen, is going to get eaten up.
 
In regards to ICC: unless you have an app that is EXTRAORDINARILY sensitive to performance, no one uses it. MSVC is king on Windows, GCC is king elsewhere (though LLVM is going to displace it; it's a much better development environment). I've used ICC exactly once, and that was on a project that actually cared about 5% performance gains.
 


Yeah, APUs are taking over the low-end market; soon there won't be any sense in buying something like the 730 or 740. I expect dGPUs to maintain popularity among gamers, which is a growing market segment. To be perfectly honest, high-end gaming is going to continue growing, because those of us who grew up in the late 80's and 90's now have the economic freedom to purchase these systems for enhanced gameplay. Younger generations will grow up around them and follow suit. It will become like performance/luxury cars: even though cheap economy cars exist that can easily do the work of a beast of burden, there is always a market segment that wants something faster or better looking and is willing to pay that premium.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
blackkstar mentioned a recent discussion on the SemiAccurate forums. This is the relevant part about K12/Zen:
2016 AMD Server
-Up to 32Core/64Thread K12
-Up to 32Core/64Thread Zen

K12 Target Performance: 2x~ Cortex-A57

Project SkyBridge on server
-Pin compatible, common interface in SoC
-Same core size, same core shape
-Logical and Physical design level compatibility

2017 AMD HPC APU
-Next Generation FirePro and Opteron
-200~300W TDP

via PC Cluster Consortium in Japan
It is not clear if "K12 Target Performance: 2x~ Cortex-A57" means clock for clock or at target clocks (28PL vs 14FF). I have computed possible SPECint_2006 scores for each case:

Assuming K12 has 2x the IPC of the A57:
Seattle (8 A57 cores @ 2GHz): 80
E5-2470 v2 (10 IB cores @ 2.4GHz): 320
ThunderX (48 cores @ 2.5GHz): ~350
K12 (32 cores @ 3GHz): ~960
Vulcan (16 cores @ 3GHz): ~640

Assuming K12 is 2x faster than the A57 at target clocks:
K12 (32 cores @ 3GHz): ~640

Zen would have about 10% less performance than K12.
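
The per-core-per-GHz scaling behind those estimates works out as follows. A minimal sketch in C, assuming purely linear scaling with core count and clock; the 80-point Seattle baseline and the 2x multipliers are the speculative figures above, not measurements:

```
#include <stdio.h>

/* aggregate score = per-core-per-GHz rate * perf multiplier * cores * GHz */
static double spec_est(double base_rate, double mult, int cores, double ghz)
{
    return base_rate * mult * cores * ghz;
}

int main(void)
{
    /* Seattle: 8 A57 cores @ 2GHz ~ 80 aggregate, i.e. ~5 per core per GHz */
    const double a57_rate = 80.0 / (8 * 2.0);

    /* Case 1: "2x A57" means 2x IPC, so the multiplier survives clock scaling */
    printf("K12, 2x IPC:      ~%.0f\n", spec_est(a57_rate, 2.0, 32, 3.0));            /* ~960 */

    /* Case 2: "2x A57" already includes the 14FF clock bump, so per clock the
     * multiplier collapses to 2 * (2GHz / 3GHz) ~ 1.33 at the 3GHz target */
    printf("K12, 2x @ clocks: ~%.0f\n", spec_est(a57_rate, 2.0 * 2.0 / 3.0, 32, 3.0)); /* ~640 */
    return 0;
}
```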
 
Zen news: http://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen-CPU-Znver1

I think this bit is interesting:
This patch reveals the AMD Zen design no longer supports the TBM, FMA4, XOP, or LWP ISAs. Meanwhile, the new ISA additions are SMAP, RDSEED, SHA, XSAVEC, XSAVES, CLFLUSHOPT, and ADCX. It's nice to see that with Zen AMD will support the RDSEED instruction, which Intel has offered since Broadwell for seeding another pseudorandom number generator. SMAP is short for Supervisor Mode Access Prevention and is another Intel instruction set extension already supported by Linux.

AMD Zen also adds a new CLZERO instruction. This one is brand new: the "clzero instruction zero's out the 64 byte cache line specified in rax. Bits 5:0 of rAX are ignored."
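
For the curious, here's roughly what using it might look like from C. A minimal sketch, assuming GCC-style inline asm and an assembler new enough to know the mnemonic; the wrapper name is mine, and it will only run on silicon that actually reports the CLZERO feature bit:

```
#include <stdint.h>

/* Hypothetical wrapper. CLZERO takes its target address implicitly in
 * rAX and zeroes the entire 64-byte cache line containing it; bits 5:0
 * of rAX are ignored, so no alignment fixup is needed. */
static inline void clzero_line(void *line)
{
    __asm__ volatile("clzero" : : "a"(line) : "memory");
}
```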

Cheers!
 

con635

Honorable
Oct 3, 2013
644
0
11,010
If the above 390X benches are correct, it may have the Titan X beat. E.g. in BF4 at 4K the Titan X is only 27% faster than the 290X (TechSpot review); we'll see with more reviews.
Edit: power and thermals are the same as the 290X, and the reviewer was impressed. :/
 
http://www.techspot.com/review/977-nvidia-geforce-gtx-titan-x/

Busy compiling some numbers; will edit shortly.

EDIT

Based on 2560x1600 results:

Compared to GTX 980: 35% faster on average
Compared to 290x: 44% faster on average
Compared to 295x2: 11% slower on average

Which makes this card DOA, since the 295X2 is selling for just under $700 right now. Granted, it avoids the problems inherent to SLI/CF, but that isn't worth a $300 price premium.

EDIT2

http://www.extremetech.com/extreme/201346-nvidia-geforce-gtx-titan-x-reviewed-crushing-the-single-gpu-market
 
To be 100% fair to the Titan X, it's just a "bragging rights" kind of card. Its performance is top notch, but at a steep price. I need to see the compute numbers in detail; from nVidia's presentation, its numbers are quite good, especially perf/watt.

In any case, you're a thousand times better off going for the 295X2 from AMD (one card, albeit the longest out there) or SLI'ing 980s. The only scenario where someone *might* want a Titan X is if there is not enough room anywhere in your house/apartment for the two other choices, haha.

All in all, I expected more in terms of gaming.

Cheers!
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Hmm, then do you also think that the R9 390 and R9 390X WCE are DOA? Both will be priced above the 295X2.
 


nVidia always overprices anything with the 'Titan' moniker on it. I expect a considerably more price-competitive '980 Ti' card to drop once the 390X comes out. That said, if the reported specs and numbers pan out, the 390X looks like it will be faster than the full GM200, so aside from clock bumps I can't see nVidia leapfrogging them this time like they did with the 780 Ti just after the 290X. Obviously nothing can be known for sure until we see what the 390X actually looks like in real benchies though...
 


I was really expecting more. It has 12GB of GDDR5. TWELVE. And a whopping 8B transistors to brute-force anything. At the very least I expected it to put the 295X2 to shame. Hell, the 980 is 50% of the price for 25% less performance (25-35% less?). You pack two of them and blow the Titan X away, especially since at 4K they're not far apart. And that was the whole point of it (for games).

But, like I said, this alone justifies the price tag. It's not a games-first card like the previous Titan was, and that is a remarkable jump. Look at the compute numbers: AMD needs to pull a rabbit out of the hat with the 390X to be on par there.

Cheers!
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Those 12GB are targeting pro users who will use this card for more than playing games. Games will not max out the 12GB buffer, so expecting gaming performance to scale with VRAM buffer size doesn't make sense.

The 295X2 has 12.4B transistors. How many transistors do you expect for the 390X?
 

8350rocks

Distinguished


In several of those benchmarks, the 295X2 delivers virtually the same performance as the 290X, which leads me to question whether the benchmark supported dual GPUs.

Additionally, one benchmark that typically favors AMD is completely missing the 295X2 for comparison, and we know the 390X is stronger than the 295X2 at this point... so I'm curious to see what breaks down there...
 

8350rocks

Distinguished


Minimum? 8 bil.
 


Thanks for ignoring the second paragraph, where I say the exact same thing you do in your first line. And for 4K, it appears the 4GB in the 980 is not the limiting factor at all. It could be that the GPU is not good enough (yet) for 4K gaming, so having 12GB was a great selling point: "hey, look, 12GB, it will smoke 4K!" But nope.

As for the 390X, going by the FLOPs leak, I'd say 8.x billion. It should be within striking range of the Titan X.



I don't know. In my mind the 390X is within striking range of the Titan X. You'll see me very surprised if the 390X takes the crown away from the 295X2 and beats the Titan X handily.

Cheers!
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
I see where the problem is now. There is confusion between hardware IPC and software IPC. I don't consider software a measure of IPC; as I stated before, you can change how software interacts with hardware by changing the instruction count. Software measures performance, not IPC, unless you know exactly what the program executes.

Let's take the formula for IPC as stated by Wikipedia. IPC is the inverse of CPI, where CPI = Σ(CCI × IIC) / IC. Here IIC is the number of instructions for a given instruction type, CCI is the clock cycles for that instruction type, and IC is the total instruction count. The summation sums over all instruction types for a given benchmarking process.
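
To make that formula concrete, here's a toy worked example in C. The instruction mix, counts, and cycle costs are invented purely for illustration; real numbers would require knowing the exact program, which is the whole point being argued:

```
#include <stdio.h>

struct insn_type {
    const char *name;
    long count;      /* IIC: instructions of this type */
    double cycles;   /* CCI: average clock cycles each  */
};

int main(void)
{
    /* Hypothetical 1000-instruction mix */
    struct insn_type mix[] = {
        { "simple ALU", 600, 1.0 },
        { "load/store", 300, 3.0 },
        { "branch",     100, 2.0 },
    };
    long ic = 0;
    double total_cycles = 0.0;
    for (int i = 0; i < 3; i++) {
        ic           += mix[i].count;
        total_cycles += mix[i].cycles * mix[i].count;
    }
    /* CPI = (600*1 + 300*3 + 100*2) / 1000 = 1.7, so IPC ~ 0.59 */
    double cpi = total_cycles / ic;
    printf("CPI = %.2f, IPC = %.2f\n", cpi, 1.0 / cpi);
    return 0;
}
```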

Here is where it's flawed to assume software performance → hardware IPC: can you tell me what the IC is for every program used to prove Intel >>> AMD? No, you cannot. You also cannot just assume that Intel IC = AMD IC.

Heck, even SSE2 vs SSE4.2 will have a different CPI, because the IC changes and the result depends on how fast the SSE2 vs SSE4.2 instructions execute.

Program performance ≠ hardware IPC. IPC cannot be calculated for software unless you know the exact instruction count; hence these programs are called synthetic benchmarks, because they know how many instructions are used for the task they are measuring. No one knows how many instructions there are in a specific game without examining the code itself to make sure that AMD IC = Intel IC.

Intel wants everyone to believe that software IPC = hardware IPC, because that's how they make sales. They didn't want anyone to know that AMD was given more instructions for any given task, artificially slowing down the AMD processor by inflating the instruction count.

Remember how StarCraft II supposedly wasn't compiled with ICC?

[Attached image: graph.png]

This is an AMD FX-8350 CPU with a faked Intel CPUID flag.

The useful work that can be done with any computer depends on many factors besides the processor speed. These factors include the processor architecture, the internal layout of the machine, the speed of the disk storage system, the speed of other attached devices, the efficiency of the operating system, and most importantly the high level design of the application software in use.

For users and purchasers of a computer system, instructions per clock is not a particularly useful indication of the performance of their system. For an accurate measure of performance relevant to them, application benchmarks are much more useful. Awareness of its existence is useful, in that it provides an easy-to-grasp example of why clock speed is not the only factor relevant to computer performance.
 

jdwii

Splendid


http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_Titan_X/27.html
It uses less power than a 290X while performing around a 295X2; not bad.
http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_Titan_X/29.html
The price is just whatever they think they can get for it.


Edit: I expect a 6 or 8GB GDDR5 980 Ti to be priced cheaper but still have similar performance to this Titan, like the 780 Ti situation.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Zen supports FMA4.



Right, about the same number of transistors as the Titan X. As for performance, just add 40-50% to the 290X scores and you will get 390X performance:

http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_Titan_X/29.html

The Titan X and 390X will be neck and neck (within 5%).
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
The Titan X is rated at 6.6 TFLOPS; the 8GB 390X is rated at 8.6.

I think for the most part the 390X will slap nVidia around till Volta/Pascal, except in games that heavily favor NV hardware. Biased software is still too much to overcome.

 


No, hardware IPC is the THEORETICAL MAXIMUM a chip can reach for a given series of instructions. You'll NEVER hit this number, because you'll never have perfectly optimized software.

Simple example: the Cell CPU could reach a theoretical maximum of 240 GFLOPS, but its typical throughput was limited to about 160 GFLOPS.

Point being, no one here is confusing the two. What matters is the IPC you can extract via software.

Let's take the formula for IPC as stated by Wikipedia. IPC is the inverse of CPI, where CPI = Σ(CCI × IIC) / IC. Here IIC is the number of instructions for a given instruction type, CCI is the clock cycles for that instruction type, and IC is the total instruction count. The summation sums over all instruction types for a given benchmarking process.

Correct.

Here is where it's flawed to assume software performance → hardware IPC: can you tell me what the IC is for every program used to prove Intel >>> AMD? No, you cannot. You also cannot just assume that Intel IC = AMD IC.

That's why you measure on a per-application basis and look for trends. Different instruction sets and different usage characteristics are going to give different results. Something that's heavily cache-bound, for instance, is going to SUCK on AMD because of their cache latency. But pure integer performance? Much closer grouping. If you take a lot of benchmarks that do different things, you can eyeball the results and draw conclusions like "Intel is about 40% faster per clock in all tasks that are not integer-math heavy".

Heck, even SSE2 vs SSE4.2 will have a different CPI, because the IC changes and the result depends on how fast the SSE2 vs SSE4.2 instructions execute.

Exactly. And Intel and AMD implemented the HW resources needed to execute these instructions differently, so the code produced by compilers has to be somewhat aware of which architectures it's going to run on. What works for one architecture may kill performance on another.

Hence why compiler updates gradually improve performance: over time, the code generator gets optimized for the quirks of various CPU architectures.
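
A concrete illustration of the instruction-count point: the same task can cost one instruction or a dozen depending on the ISA level the compiler targets. Population count is the classic case (function names here are mine):

```
#include <stdint.h>
#include <nmmintrin.h>   /* SSE4.2: hardware POPCNT intrinsic */

/* SSE4.2 path: a single POPCNT instruction. Compile with -msse4.2. */
uint32_t popcount_sse42(uint32_t x)
{
    return _mm_popcnt_u32(x);
}

/* Pre-SSE4.2 fallback: the same result costs a dozen-odd ALU
 * instructions of bit-twiddling, so IC (and thus CPI for the task)
 * differs even though the work done is identical. */
uint32_t popcount_fallback(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}
```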

Program performance ≠ hardware IPC. IPC cannot be calculated for software unless you know the exact instruction count; hence these programs are called synthetic benchmarks, because they know how many instructions are used for the task they are measuring. No one knows how many instructions there are in a specific game without examining the code itself to make sure that AMD IC = Intel IC.

That's why I ALWAYS stress that the IPC numbers I come up with are only good for comparisons and are not real IPC numbers. Never mind that the calculation assumes 100% CPU load, no interruptions by the scheduler, and a host of other things that will never be true. But for high-level comparisons? The formula is "good enough" to get a reasonable comparative difference between various architectures.

Intel wants everyone to believe that software IPC = hardware IPC, because that's how they make sales. They didn't want anyone to know that AMD was given more instructions for any given task, artificially slowing down the AMD processor by inflating the instruction count.

AMD is free to implement the x86 instruction set in hardware any way they want. And they really have no excuse on AMD64, since they created that ISA in the first place. Likewise, both AMD and Intel have their own extensions that do different things in different ways, which often don't work on the competition's chips. There's no artificial slowdown, just sub-optimal processing of various instructions.

Remember how StarCraft II supposedly wasn't compiled with ICC?

It's not. According to all three tools I use to determine the compiler, the StarCraft II main executable was compiled with MSVC 2008. So unless you're claiming ICC is replacing its compiler ID with that of MSVC, I can say with certainty that StarCraft II was not compiled with ICC.
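
For what it's worth, one heuristic such tools lean on is the linker version stamped into the PE optional header (MSVC 2008 ships linker 9.x). A minimal sketch in C with error handling mostly omitted; real identification tools also check things like the undocumented "Rich" header and the imported runtime DLLs:

```
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.exe\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    uint32_t e_lfanew;                    /* offset of PE header, stored at 0x3C */
    fseek(f, 0x3c, SEEK_SET);
    fread(&e_lfanew, 4, 1, f);

    /* Skip the 4-byte "PE\0\0" signature and the 20-byte COFF header;
     * the optional header begins with Magic (2 bytes: 0x10B = PE32,
     * 0x20B = PE32+), then the major/minor linker version bytes. */
    uint8_t magic[2], link_major, link_minor;
    fseek(f, (long)e_lfanew + 4 + 20, SEEK_SET);
    fread(magic, 1, 2, f);
    fread(&link_major, 1, 1, f);
    fread(&link_minor, 1, 1, f);
    if (magic[0] != 0x0b)
        fprintf(stderr, "warning: unexpected optional header magic\n");
    printf("linker version %u.%u\n", link_major, link_minor); /* 9.x => MSVC 2008 */
    fclose(f);
    return 0;
}
```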


Expected. For an executable that only uses two heavy threads, you'd expect an Intel advantage until unit counts basically thrash both CPUs to holy hell.

This is an AMD FX-8350 CPU with a faked Intel CPUID flag.

Which, again, is what I'd expect given the IPC differences between the chips. SCII uses two REALLY heavy threads, so Intel's IPC edge is going to be far more important to performance than AMD's core-count advantage. Further, spoofing an Intel chip is probably costing AMD performance, since it's very likely taking a sub-optimal code path for the BD architecture. Hence why I note the total absence of a baseline run looking at AMD performance prior to spoofing the CPUID field.
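
For reference, the "CPU-ID flag" being spoofed here is the 12-byte vendor string returned by CPUID leaf 0, which vendor-keyed dispatchers commonly use to pick a code path. A minimal sketch using GCC/clang's <cpuid.h>:

```
#include <stdio.h>
#include <string.h>
#include <cpuid.h>

int main(void)
{
    unsigned eax, ebx, ecx, edx;
    char vendor[13] = {0};

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return 1;
    memcpy(vendor + 0, &ebx, 4);   /* "Genu" / "Auth" */
    memcpy(vendor + 4, &edx, 4);   /* "ineI" / "enti" */
    memcpy(vendor + 8, &ecx, 4);   /* "ntel" / "cAMD" */

    /* A vendor-keyed dispatcher would branch here; spoofing the string
     * sends an FX-8350 down whatever path is keyed to "GenuineIntel",
     * optimal for that architecture or not. */
    printf("%s code path\n",
           strcmp(vendor, "GenuineIntel") == 0 ? "Intel" : "generic/AMD");
    return 0;
}
```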

That being said, given MSVC 2008, it's possible there's no BD-specific performance path in the compiled code to begin with, in which case it's likely falling back to the old Phenom II path, which is probably eating at least some performance. I have no way to know how up to date the compiler was, so I can't say whether the AMD path is optimized for BD-based chips or not.

So yeah, any other points you need me to disprove?

EDIT

Just going to throw this here:

http://graphics.stanford.edu/~mdfisher/GPUView.html

At least as far as the DX9 codepath is concerned, only two threads do any meaningful work within SCII: the main program executable, and one instance of D3D9.dll, which is almost certainly the main render thread. The game is basically bound by single-core performance.
 