AMD CPU speculation... and expert conjecture

Page 220
Status
Not open for further replies.


Aww, I preferred Kagamine Rin; I'm not a fan of pigtails! Anyway, that could be a good benchmarking tool.
 

Well, if you pay me $1000 for the Titan, I can fire up the old QX9770 rig and see where it lands. The Q\QX9650 should be in roughly the same spot as a 955BE\965BE, if I recall. I have a friend running two GTX 660s in SLI with a 790i and a QX9650@4GHz (he upgraded everything but the 775 platform :lol:) and it is extremely playable (40-60 FPS) in BF3 @2560x1440, so it can still hold its own with quite powerful setups.
 


True. But it kinda says something that a C2Q, in a CPU-intensive title, would still be hanging somewhere around a PD-based FX chip (FX-4320). Which goes back again to my primary argument: IPC + clock is more important for driving performance than the number of cores.
 

8350rocks

Distinguished


Remarkable that the P2X4s still are as well, isn't it? It goes to show just how far behind the hardware the software actually is...since it must not be taking advantage of the newest instructions, which are unsupported on the older chips.

Amazing revelation!!! Software is far behind hardware and affects performance!!! What a thought?! Wait a minute, I said that about 40 pages ago...
 


CoH2 is yet another example of a "well threaded" game, and yet the PII X4 hangs around because of its superior IPC. So in a game running 9 threads at a time, a quad-core processor with superior IPC performs the same as an octo-core processor with lesser IPC. In a game running 9 threads. Which, according to all the armchair software people on this forum, is supposed to lead to significant performance improvements on octo-core chips, because it's "well threaded".

As I've explained: we've been trying to make software parallel since the '70s. And every, single, time someone has attempted to write software that scales, it's been found that the performance never scales the way that's expected. Massively parallel datasets scale; that's why we do video encoding with OpenCL now. That stuff goes to the GPU. The rest of it WILL NOT SCALE, for reasons I've been explaining for about 2-3 years now. And now we have the benchmarks that show that, even in "well threaded" cases, single-core performance is more important than the number of cores. We see it in Crysis 3, which runs on 9 threads, and CoH2, which runs at least 9: single-core performance tends to be more important than the number of cores.
 

8350rocks

Distinguished
Your cited example runs everything through the first thread before execution...that's not a very well threaded example. You would have to have threads that were executable without having to check with the primary thread. The Essence engine is poorly optimized...I know...I deal with game engines daily. The best possible scenarios to build a game on right now are going to be:

CryENGINE 3
Unreal 4
Frostbite 3
Ogre 3D

All of those, consequently, can be tuned up to be so demanding that modern hardware could not run the game...if you wanted to make it that ridiculous. Honestly...considering that CryENGINE 4 is in the works...I am not even sure that the next generation of console ports won't be demanding on a mind-bending scale on PC. Unreal 4 also has yet to really be rolled out in a AAA title too. These engines are all waiting on the console hardware, and the first generation games on those engines will be a drastic departure from this generation.

The current project I am working on started out with an eye on CryENGINE 3, and frankly, because it's a MMO, we couldn't do it, because we would be too demanding of our target audience. We settled on the engine about 6 months ago, but frankly...it won't be something you'll enjoy playing on a dual core, even on a more dated engine.

I think people on the outside looking in don't realize that the next generation engines, that will be as demanding as the original Crysis when it first hit, are waiting in the wings. If you think CryENGINE 3 is demanding, wait for CryENGINE 4, that's been watching the 8 core CPUs come to desktop and now arrive in the consoles...

EDIT: Also, considering the Essence Engine is RTS, it doesn't at all surprise me that it's still driven by single core performance...though that means little because RTS games can play fine at 25 FPS.

For you Linux gaming hopefuls out there, CryENGINE 3 is being ported to Linux...just thought I would put that out there for you. A top notch AAA title could come to Linux soon once CryENGINE 3 is ported.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Maybe double performance per clock is 'impossible', but your previous claim that double performance was due to twice clock rate is 'trivial'. Maybe real improvement will be in the middle between both extremes.

I don't consider 30% IPC too high, because Bulldozer arch. has room for improvement due to being highly constrained. Moreover, your 15-20% comes from Toms belief/hope, not from official AMD info.

AMD claimed 30% in 2012 and if Feldman now means double performance per clock then the increase is going to be >30%. Some weeks ago Feldman claimed "tremendous" performance for Steamroller core and AMD is substituting 8C PD CPUs by 4C SR CPUs. It is difficult to believe that SR was only a mere 15% faster.



This was noted before. It was also noted that someone at intel devs. forums was building 7-zip from source using ICC.



This is an example of a poorly threaded game. The difference between the i7-3770k and the i5-3570k is explained in terms of clocks. Also, the six-core 3960X and 3970X got poorer performance than the four-core i5, because the latter is clocked higher. Similar claims apply to the FX series.
 

os2wiz

Distinguished
Sep 17, 2012
115
0
18,680


Who says the FX Steamroller will be coming from Global Foundries? AMD has lessened GF's role and increased their involvement with a Taiwanese-based foundry.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780

http://fudzilla.com/home/item/31621-amd-demonstrates-kaveri-apu-at-computex-2013
Keyword: "according to AMD"


It's easy, look at me :)
In fact, I analyzed CPU reviews from the last 10 years and must say that even +20% IPC is quite high.
And remember that I'm writing about the average case; no doubt there will be a few benchmarks where SR performs much better than +20% (and also a few where it performs only slightly better than PD).
 


CoH2 is a WELL threaded game. Runs 9 main threads according to the devs, and on my PC, I confirmed that last night via GPUView. The game is very well threaded.

That simply highlights the point I've been trying to make for over two years now: More threading != More performance.

Look, at the end of the day, the CPU is either getting all its work done, or it's not. If it is, then adding more cores will not add ANY performance, just lower overall core usage. You see this effect in Crysis 3: IB and PD generate the same FPS (likely a GPU bottleneck), even though PD's usage per core is almost half what it is for IB. Even though IB's core usage is high (close to a CPU bottleneck, but not quite), it's still getting all its work done, so adding more cores wouldn't improve performance, just lower the average usage per core.

When not CPU bottlenecked, clock + IPC is the limiting factor in terms of performance for that application. The more instructions you can get through, the faster the application runs. Hence why the higher IPC/clock chips are at the top of the charts. Not shocking in the least.
 

8350rocks

Distinguished


Even well threaded RTS games will still heavily favor one core because the "main thread" does 70+% of the heavy lifting. That's the way RTS games operate.

RTS has come a long way from where it was, but it still hasn't gotten to the point that CryEngine or Unreal are at in terms of threading...(less so with Unreal 3, more so with Unreal 4, CryEngine 3).

Either way, you went looking for the example out there that tends to be the exception not the rule on better threaded game engines. Notice when I spoke about better threaded games I never mentioned Essence Engine, or any RTS engine for that matter?

Also, most RTS titles can run at 20 FPS and still feel smooth comparatively, because they just don't require 60 FPS. Look at Civ5 and some of the others out there, they're beautiful games, and they look smooth with FPS numbers in the 20's.
 


That's the way ALL games operate. The main program thread is always going to be doing at least 50% of the total workload; it's the driver for the rest of the program.
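
To put rough numbers on that, here is a minimal sketch that assumes Amdahl's law applies (my assumption for illustration; nobody above has named it) and treats the main thread's share as work that cannot be spread across cores. The function name `amdahl_speedup` is hypothetical, and the 50% and 70% figures are simply the shares mentioned in this exchange.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when `serial_fraction` of the work stays on one thread.

    Assumes the remaining work splits perfectly across `cores` (idealized).
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Compare a 50% main-thread share with a 70% main-thread share.
    for cores in (1, 2, 4, 8):
        s50 = amdahl_speedup(0.5, cores)
        s70 = amdahl_speedup(0.7, cores)
        print(f"{cores} cores: 50% serial -> {s50:.2f}x, 70% serial -> {s70:.2f}x")
```

Under this model, a 50% main-thread share can never reach a 2x speedup no matter how many cores are added, which is the gist of the point about the main thread.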

RTS has come a long way from where it was, but it still hasn't gotten to the point that CryEngine or Unreal are at in terms of threading...(less so with Unreal 3, more so with Unreal 4, CryEngine 3).

Based on thread data, CoH2 is more multi-core friendly than any title on Unreal 3, and equal to the only title currently released on CryEngine 3.

Either way, you went looking for the example out there that tends to be the exception not the rule on better threaded game engines. Notice when I spoke about better threaded games I never mentioned Essence Engine, or any RTS engine for that matter?

Let's see, the argument for the past two years has been "More threading = more performance". Then I show two titles (Crysis 3 and CoH2) that scale well to multiple cores, but where that scaling does not lead to more performance. Then you complain about cherry picking, without putting forward your own examples.

And you again ignore the main point I've been trying to make: If all the CPU cores are getting their work done, how does adding more cores increase performance? The answer, as now two separate titles are showing, is it DOESN'T. As long as the CPU is not bottlenecked, performance is driven by single-core performance, as adding more cores does NOTHING to increase performance.

And if you have a title that refutes this, please bring it forward now.
 




*Digs up 4GHz E8400 rig*
 
Let me give a simple example here. For purposes of this discussion, let's pretend perfect scaling can actually happen:

1: You have a single-core CPU. Your program has two threads, each of which takes 100% of the CPU's resources. In this case, if I added a second core, I could reasonably expect a doubling of performance, due to twice as much work getting done over the same time period.

2: You have a single-core CPU. Your program has two threads, each of which takes 40% of the CPU's resources. In this case, if I added a second core, I could reasonably expect performance to remain the same, due to the same amount of work getting done over the same time period.*

*Technically, program latency would improve somewhat, but absolute performance metrics would be the same. Minor point, but one worth noting.
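
The two cases above can be sketched as a toy model. This is only an illustration under the same "perfect scaling" pretence; the function name `relative_throughput` and the way load is expressed (fraction of one core per thread) are assumptions made for the sketch, not measurements.

```python
def relative_throughput(thread_loads, cores):
    """Fraction of the requested work that gets done per unit time.

    thread_loads: per-thread demand as a fraction of one core (1.0 = 100%).
    cores: number of identical cores.
    Idealized: no scheduling overhead, no synchronization cost.
    """
    demand = sum(thread_loads)
    # One thread cannot run on several cores at once, so usable
    # capacity is capped by the number of threads.
    capacity = float(min(cores, len(thread_loads)))
    return min(1.0, capacity / demand)


if __name__ == "__main__":
    for label, loads in (("case 1", [1.0, 1.0]), ("case 2", [0.4, 0.4])):
        one = relative_throughput(loads, 1)
        two = relative_throughput(loads, 2)
        print(f"{label}: 1 core -> {one:.2f}, 2 cores -> {two:.2f}, gain {two / one:.2f}x")
```

In case 1 the single core is saturated, so the second core doubles the work completed; in case 2 the single core already finishes everything, so the second core only lowers per-core usage.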

Now, if we were still in the Pentium 4 era, where games were limited by the CPU, then yes, you could reasonably say that performance would be dominated by the number of cores available to you rather than single-core performance. But today's games are RARELY CPU limited; the CPU is sitting around for periods at a time waiting for the GPU to finish the current frame. As a result, the GPU is the system's limiting factor, and performance on the CPU is driven only by how fast each individual thread gets completed, which in turn is driven by single-core performance.

You see this effect in Crysis 3: the Core i3 sits around with both cores at 100%. It simply can't handle the workload, and as a result, even lower-end quad-core chips (PII X4) show close to a doubling in performance (see item 1 above). Beyond that critical point, however, every CPU on the list is getting its processing done faster than the program can give it work, so from that point forward, performance becomes driven by single-core performance (see item 2 above). Hence why an FX-4300 outperforms an FX-8150: IPC + clock, even as minor improvements, are far more important than those four extra processing cores.
 


Aww.. but we don't want those poor 6800Ks going to orphanages ;)
 


I SPECIFICALLY said that performance becomes dominated by core performance AFTER the point where you escape the CPU bottleneck. The higher-IPC i3 line, for instance, holds up far better than C2Ds, despite still only having two cores. But in some titles, those two cores still leave the i3 bottlenecked, as the total amount of work that needs to be done is more than two high-IPC cores can handle. But in other programs? It outperforms even the FX-8150.

We really need to start getting % usage statistics alongside FPS benchmarks. Without that, it's harder to determine if a given CPU is bottlenecked, or just slow. Though you can *usually* infer which is which; in CoH2, it looks like the FX-4100 is the first CPU on the list that is not CPU bottlenecked. From that point forward, performance is driven by core performance, rather than the number of processor cores.
 

8350rocks

Distinguished


You want examples of how more cores equates to more work?

Ok, go find any Crysis 3 benchmark...(your own "example")...and look at top end i3 numbers. If your logic is correct, the i3 will perform the same as the 3570k.

[Image: CPU_03.png]


Oh, look! The i3, which is 100 MHz faster than the i5-3470, suffers a loss of 24 FPS in the title you are citing as an example that more cores gives the same performance as fewer cores. I think you're being a bit obtuse, don't you?

Additionally, by your same logic, the P2X2 560 @ 3.3 GHz should be running right alongside the P2X6 1100T @ 3.3 GHz, yet it's clearly 26 FPS lower.

So, now that we have a baseline...and have established that the engine's multithreading capability vastly determines the effect of more cores on greater performance...go ahead and tell me how it is that software cannot be more multithreaded in better ways and take advantage of newer hardware's capability when it's programmed more effectively.

As I said earlier, Essence engine is a good attempt at multithreading, but RTS titles don't particularly lend themselves well to multithreading. The issue with those types of games is that the main program thread does significantly more work than it does in other titles from different genres where the main program thread is not so heavily leaned on to do the vast majority of the work!

EDIT: Those comparisons are even within the same generation of architecture from their respective companies...so if you're correct, then they should be doing roughly similar frame rates...though the more cored solutions are running nearly twice the FPS.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I bolded above that I am referring to Bulldozer. Your Fudzilla link does not explain which baseline they are using, but I assume they meant Piledriver.

Piledriver gives about an 8% IPC gain over Bulldozer

http://semiaccurate.com/2012/11/05/amds-bulldozer-core-compared-with-piledriver/

Therefore a 30% IPC gain over Bulldozer implies about a 20% IPC gain over Piledriver (1.20 x 1.08 ≈ 1.30).
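
The arithmetic can be checked with a short sketch. The function name `implied_gain` is hypothetical; the 30% and 8% inputs are the figures quoted in this post, not new data.

```python
def implied_gain(total_gain, earlier_gain):
    """Gain of the last step, given the total gain and the gain of the earlier step.

    Gains are multiplicative: (1 + total) = (1 + earlier) * (1 + last).
    """
    return (1.0 + total_gain) / (1.0 + earlier_gain) - 1.0


if __name__ == "__main__":
    # Steamroller vs Bulldozer (claimed) and Piledriver vs Bulldozer (measured).
    sr_over_pd = implied_gain(0.30, 0.08)
    print(f"Implied Steamroller gain over Piledriver: {sr_over_pd:.1%}")
```

The gains multiply rather than add, which is why the result lands at roughly 20% instead of 22%.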



Nope. It is not well-threaded, because the benchmarks above show an i5 (4 threads) giving more FPS than a six-core i7 (12 threads). In a well-threaded game, the 12-thread i7 would be faster than any other chip there, including the (8-thread) FX-8350.

I am only guessing, but the engine may overload one or more cores and leave the rest only partially loaded (probably under 60% load), generating a CPU bottleneck. This could explain why a $250 four-core chip outperforms a $1000 six-core chip.

That game is representative neither of well-threaded games nor of future games.
 

griptwister

Distinguished
Oct 7, 2012
1,437
0
19,460
I see what gamerk316 is getting at now. When he said "AFTER the point where you escape the CPU bottleneck", he was talking about the minimum/maximum number of cores used. So what I took from it is that 4 cores would be the minimum/maximum number used to escape a bottleneck, so having those cores be more powerful makes for better FPS. Which in the current situation is true. But...

When you have the load distributed evenly and properly, a V8 will pull away from or equal a V4. (referring to cars again) xD
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


the game is designed around latency and cores.

[Image: CPU_05.png]

[Image: CPU_01.png]


2.5 ghz 4770 is faster than a 3.2 ghz 3930k so the theory of clock speed fails.

Moreover, the game seems quite CPU dependent, as we saw a 42% performance increase when going from 2.50GHz to 4.50GHz, though that is an 80% jump in clock speed.

so clock speed is mostly ineffective, which means ipc is the same.
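
a rough way to put a number on that, using only the figures quoted above (2.50GHz to 4.50GHz, 42% more performance). the function name `scaling_efficiency` is made up for this sketch, and a single data point like this is only a ballpark.

```python
def scaling_efficiency(clock_lo_ghz, clock_hi_ghz, perf_gain):
    """Performance gained per unit of clock gained (1.0 = perfect clock scaling)."""
    clock_gain = (clock_hi_ghz - clock_lo_ghz) / clock_lo_ghz
    return perf_gain / clock_gain


if __name__ == "__main__":
    eff = scaling_efficiency(2.5, 4.5, 0.42)
    print(f"Performance gained per unit of clock gained: {eff:.3f}")
```

a value near 1.0 would mean the game scales almost linearly with clock; the quoted figures come out at roughly half of that.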

this isn't a case of "ipc is the only thing that matters" Intel have lower memory and pci-e latency.

According to Relic, Company of Heroes 2’s large memory footprint made the latency of transferring data between GPUs untenable. This in turn meant that an alternate frame rendering solution (like Crossfire or SLI) may not produce a higher frame rate, as the GPUs would be bottlenecked by the latency of the PCI-E bus.

from the s/a article on coh2.

http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive/4

sb-e has a higher latency than the 1155 cpus. explains its place below IB.

this game warrants some memory speed testing to see how it reacts.

well threaded or just latency dependent?

[Image: AIDAL.png]



the interesting one is still the I3 ... even with its low latency, it can't keep up with any quad core cpu aside from the athlon x4

Dual core gaming is going to die.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


same story on the xbo. http://www.tomshardware.com/news/Xbox-One-Clock-Increase-Radeon-Family-Sharing-Marc-Whitten,23511.html

looks like the ps4 is the same thing, dev kits will get a higher end system. I wonder how long it will take to see aftermarket ps4 / xbo coolers and oc procedures.
 

GOM3RPLY3R

Honorable
Mar 16, 2013
658
0
11,010


Even so, the i7-3930k has roughly the same per-core performance as a 3770k and 3570k, does it not? Thus, even if the clock is lower, since it has 2 more cores (4 more threads including HT), the overall performance will be greater. Plus, most people overclock them to 4.5 GHz, which for a 3930k is way more than enough. A 3930k can re-encode a 30-minute 1080p .avi file in ~5 minutes at stock clock.

Also, with this boost? I can only assume you mean boost clock. Thus, that is literally no different than changing the clock speed on the CPU manually. It's actually less, since it'll only "boost" rather than hold at that clock.
 
