AMD CPU speculation... and expert conjecture



1: Simple test: open Task Manager. Less guessing, more looking, please. More advanced test: GPUView

2: Now THIS is why you learn math:

The clock speed difference between the 3960X and 8350 is ~17.5% in favor of the 8350 [let's assume no Turbo here, which COULD be a significant factor].
The performance difference between the 8350 and 3960X in Crysis 3 is ~10.8%.

Now, let's throw that IPC difference into the mix (FX looks to be about 80% as efficient as Intel clock for clock in most tasks). That brings the clock difference, with IPC factored in, down to 14%, only about 3% off the performance difference the in-game benchmarks are indicating. Throw in Turbo, cache size, and the rest, and you can probably find that last 3% of performance.
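
For anyone who wants to follow the arithmetic, here it is written out; the base clocks, the 0.8 relative-IPC figure, and the ~10.8% delta are the assumptions above, not measurements:

```cpp
// Sketch of the clock-vs-IPC arithmetic above. All inputs are the
// assumptions from this post (base clocks, no Turbo, FX at ~80% of
// Intel's IPC), not measured values.
#include <cstdio>

int main() {
    const double fx8350_clock    = 4.0;  // GHz, base
    const double i7_3960x_clock  = 3.3;  // GHz, base
    const double fx_relative_ipc = 0.8;  // FX assumed ~80% of Intel IPC, clock for clock
    const double observed_gap    = 10.8; // % performance delta in the benchmark

    // Raw clock advantage of the FX-8350: ~17.5%
    double clock_gap = 100.0 * (fx8350_clock - i7_3960x_clock) / fx8350_clock;

    // Scale that advantage by the assumed IPC deficit: ~14%
    double effective_gap = clock_gap * fx_relative_ipc;

    printf("clock gap %.1f%% -> %.1f%% with IPC factored in (benchmark shows %.1f%%)\n",
           clock_gap, effective_gap, observed_gap);
    return 0;
}
```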


Based on what I'm looking at, specifically the jump between the 3570 and 3770 (+6 average FPS, +3 minimum), I'm guessing the game is coded to use HTT well, as you see significant performance increases with it enabled. Wouldn't know for sure without digging in though... If I had any interest whatsoever in the game, I would probably dig in and find out what's happening.
 

Cazalan




There aren't any on the road map for this year, but anything is possible for 2014.

The trend is definitely moving to higher core counts.

LSI just announced a 16-core ARM A15.
http://www.design-reuse.com/news/31455/lsi-axxia-5500-communication-processors-arm.html

Samsung has an 8-core ARM.
http://www.embedded.com/electronics-news/4404964/Samsung-reveals-big-little-8-core-ARM-for-mobiles

There are some major trade-offs here. None of these are clocking anywhere near the 4GHz of today's high-end CPUs.
 
On Crysis 3:

[Image: Crysis 3 CPU benchmark chart (0nIkCAb.jpg)]


Note the positioning of the FX-8150. In that case, its clockspeed and extra cores still leave it behind the 2500K, indicating IPC is driving performance. But then you get the FX-8350, with an extra 400MHz and about a 10% IPC boost, which pulls it ahead of just about everything.

Further proof of this can be seen in the differences between "like" processor groups: look at the FX-4100 (BD @ 3.6GHz) versus the FX-4300 (PD @ 3.8GHz). An 11 FPS jump based entirely on 200MHz and IPC improvements is significant (23%). A similar trend occurs in the FX-6000 and FX-8000 lineups as well, where minor clockspeed and IPC boosts give double-digit increases in FPS.

Therefore, I conclude Clock + IPC is the main driver of performance.

EDIT

I'd also guess you'd see three cores being tasked to any significant degree (>50% usage) based on what I'm seeing, probably with a fourth core being used occasionally (<25% load), which would indicate three cores is where performance starts to maximize. A fourth core may help on SOME architectures (especially on slower CPUs), but I'd guess three is where most of the work gets done, which is about what you see in other games. [Anyone willing to confirm?]
 

Ranth



So are you saying this game is only scaling to 3-4 cores? If so, what about the 6100 vs. the 4300?
 
So the bar keeps rising but the stance is the same. Impossible to use more than 2 cores. IMPOSSIBLE!!!! (Except for that game, where it's impossible to use more than 4 cores. IMPOSSIBLE!!!!)
 

Cazalan



That's the conclusion I was coming to. The latest CryEngine 3 is looking for a minimum of 3 real cores. It explains the massive drop-off with the i3. It makes sense, as the engine was designed from the beginning to work with the PS3/Xbox 360/PC. It also shows that game companies need a lot of help to leverage the architectures. This has been in the making since at least 2009.
 

Cazalan




No one said impossible. It's incredibly challenging, as is evident from this game engine being 5+ years in the making.

I thought AMD had a great idea releasing the X3 chips. They just never took advantage of it beyond using it as a die-salvaging technique.
 

gamerk said it was impossible and that it will never happen.
 

kettu



A faster architecture performs better even with fewer cores.



So the Piledriver architecture is fast enough to surpass the 2500K with its eight cores.

 

Blandge



I've taken your reasoning into consideration and decided it's plausible, but still unlikely. Sure, frequency plays a role in FPS, but how do you explain the difference in performance between the 2500K/2600K and the 3930K?
 

noob2222


Quad-channel memory or the total amount of L3 cache; that also explains the i7 920's spot.

There are so many factors in FPS testing: CPU speed, frequency, IPC, core count, memory (speed, latency, total bandwidth), L1/L2/L3 cache sizes and speeds, PCIe lanes and latency (the on-chip NB wins this).

That's why picking only a select few benchmarks doesn't give the full spectrum of capabilities.
 


I seem to recall saying something to this effect, over and over again, during many discussions: that people need to look at the bigger picture and at total performance capability instead of how fast you can toss up frames from a single-player looped time demo.

Anyhow, this is what I've been telling people was going to happen. Previously we were in an era of everything being coded primarily for two cores. Over the past couple of years we've begun moving into the era of four cores being optimal. Five or six years from now we'll be moving into the six/eight-core era (not sure yet on the scaling over time). Coding for multiple targets requires the problem to be redefined in such a way that it can be done simultaneously. This redefinition is very hard to do, but once you do it and document it, you can move forward and implement that method in its various forms in other code.
 

kettu



If you check out the difference between the i5-760 and the i7-930, it looks more like Hyper-Threading is giving the i7 a minor boost rather than triple-channel vs. dual-channel memory. A ~6% increase from 50% more memory channels doesn't indicate that dual-channel memory is bottlenecking the i5.

The core count is the simplest explanation, and the numbers support that.
core i3 -> core i5 -> core i7 (6-core)
fx-4100 -> fx-6100 -> fx-8150
fx-4300 -> fx-6300 -> fx-8350
phenom II X2 -> X4 -> X6
core 2 duo -> core 2 quad

Performance scaling seems to diminish once you go beyond six cores, but it's not completely absent.

And then there's this; note how beautifully the load is balanced on the six-core Intel CPU. The AMD 6/8-cores have more variation, but I suspect that is due to the shared nature of the architecture. What was the marketing blurb for Steamroller? "Feeding the cores"? :)

http://gamegpu.ru/images/stories/Test_GPU/Action/Crysis%203/test/proz%20amd.jpg
http://gamegpu.ru/images/stories/Test_GPU/Action/Crysis%203/test/proz%20intel.jpg
 


Again: we don't code to any level of cores. We create threads, they do work, and the OS loads threads onto cores however it wants. The fact that the i3 struggles indicates a workload greater than two i3 cores can handle.
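
To illustrate, it really is as simple as the sketch below; the worker function and thread count are made up, and where each thread lands is entirely up to the scheduler:

```cpp
// Minimal sketch: spawn threads for work that can run independently.
// Nothing here targets a specific core count; the OS decides where
// each thread actually runs. Worker function and count are illustrative.
#include <cstdio>
#include <thread>
#include <vector>

void do_work(int job_id) {
    // placeholder for AI, audio mixing, asset streaming, physics, ...
    std::printf("job %d running on whatever core the OS picked\n", job_id);
}

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 6; ++i)      // 6 jobs, regardless of how many cores exist
        workers.emplace_back(do_work, i);
    for (auto& t : workers)
        t.join();
    return 0;
}
```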
 


Interesting, to say the least. I assume this is using the DX11 renderer? Might be a really good multithreaded rendering engine, though I haven't seen any that worked well yet... I'd be interested to see whether it's in fact the rendering threads or the game engine threads doing most of that work...

Was tossing around getting the game; might just get it to do a GPUView analysis and see what's going on under the hood...

gamerk said it was impossible and that it will never happen.

I said difficult to the point where you won't see devs undertake the work. I would like to see how many threads are actually doing work though [just to rule out REALLY shoddy coding]. Based on the above, you would assume at least 5-6 threads doing meaningful work, which is very atypical compared to the 2-3 you usually see.
 

truegenius

I will say:
clock all of them at 1GHz on all cores/threads (to create a CPU bottleneck and to force maximum CPU usage),
and then check the CPU time in Task Manager to find out the total CPU usage.

For example,
if an 8-core CPU shows 5.1 minutes of CPU time out of 8 minutes (1 minute x 8 cores) after a 1-minute bench,
then it shows that the scaling is 5.1/8 = 5-6 cores.

Isn't that a good idea?
And I think the 8150 and 8350 will show identical readings in this test (provided the CPU is the bottleneck).
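
Written out, the estimate is just this (the 5.1 and 8 figures are the hypothetical numbers above):

```cpp
// Rough sketch of the Task Manager estimate described above.
// The CPU-time and wall-time figures are hypothetical.
#include <cstdio>

int main() {
    const double cpu_minutes  = 5.1; // total CPU time reported after the run
    const double wall_minutes = 1.0; // how long the benchmark actually ran
    const int    cores        = 8;

    double busy_cores  = cpu_minutes / wall_minutes;  // ~5.1 cores busy on average
    double utilisation = busy_cores / cores;          // ~64% of the whole CPU

    printf("on average ~%.1f cores busy (%.0f%% of %d cores)\n",
           busy_cores, utilisation * 100.0, cores);
    return 0;
}
```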
 

kettu



I think DX11 is a fair assumption. Hmm, rendering threads, good point. Maybe, I don't know. If it's some kind of trickery to promote multi-core CPUs by running code that would be better suited to GPUs... that would be lame. Or perhaps their engine is taxing the GPUs so heavily that offloading some of that rendering work to the CPU increases the overall performance. Or could it be some form of ray-casting code that runs better on a modern multi-core CPU than on a GPU? Whatever they did, at least they're putting the cores to work.



In Finnish we have an expression, "kehitys kehittyy"; literally translated it means "progress progresses" :) As a gamer and a follower of the tech industry, I'm glad to see this happening.
 


Wouldn't work, because when you have only ONE application doing a lot of work, and that application is the foreground application to boot, it's going to be running on at least one core more or less 100% of the time. Even if you ran for 24 hours, you'd see TM reporting close to 23H59M worth of CPU time.

The only way to see for sure what is going on at the thread level is a low-level utility like GPUView, which allows you to see both the CPU queue and the GPU queue. That will allow you to determine how many threads are doing meaningful work (above 5% CPU load or so), and how many are actually running in parallel (and for how long that remains true). [I plan to install the SDK tonight/tomorrow, and will probably take a cursory look at a few games I have installed. I'll make a separate thread when I do that.]
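
For a cruder first pass than GPUView, you can also walk the game's threads with the Toolhelp API and see how much CPU time each one has racked up. A rough Windows-only sketch (pass it the game's PID); it won't show you parallelism over time the way GPUView does, just which threads are actually busy:

```cpp
// Enumerate a process's threads and print how much CPU time each has used.
// Threads sitting near zero aren't doing meaningful work. Windows-only sketch.
#include <windows.h>
#include <tlhelp32.h>
#include <cstdio>
#include <cstdlib>

void dump_thread_times(DWORD pid) {
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) return;

    THREADENTRY32 te{};
    te.dwSize = sizeof(te);
    for (BOOL ok = Thread32First(snap, &te); ok; ok = Thread32Next(snap, &te)) {
        if (te.th32OwnerProcessID != pid) continue;
        HANDLE h = OpenThread(THREAD_QUERY_INFORMATION, FALSE, te.th32ThreadID);
        if (!h) continue;
        FILETIME creation, exitTime, kernel, user;
        if (GetThreadTimes(h, &creation, &exitTime, &kernel, &user)) {
            ULONGLONG k = (ULONGLONG(kernel.dwHighDateTime) << 32) | kernel.dwLowDateTime;
            ULONGLONG u = (ULONGLONG(user.dwHighDateTime) << 32) | user.dwLowDateTime;
            printf("thread %lu: %.2f s CPU\n", te.th32ThreadID, (k + u) / 1e7); // 100ns units
        }
        CloseHandle(h);
    }
    CloseHandle(snap);
}

int main(int argc, char** argv) {
    if (argc > 1) dump_thread_times(static_cast<DWORD>(atoi(argv[1])));
    return 0;
}
```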



The "big" feature of DX11 is multithreaded rendering, which allows multiple threads to render to a single GPU. In theory, this allows you to get away from having one monster render thread, which would have to execute sequentially, which as you can imagine, kills the ability of the CPU to scale to any reasonable degree. Hence why more or less any pre-DX11 game (or any game that still has a DX9 path) doesn't scale well.

The issue is, the few times I had a chance to play with multithreaded rendering, there wasn't much you could make parallel due to how rendering engines were designed, as the rendering code went through stages where each stage took the previous stage's output as its input. It's possible Crytek managed to build a rendering engine that gets around this, which would allow significantly better scaling, but as of the last time I worked on a game engine, this wasn't the case. [If this IS what's happening, I'd be REALLY interested to see how they structured the program to pull that off.]

Then of course, there's possibility three: really poor coding in a number of threads (e.g. using a polling method rather than messages, which kills the CPU). I doubt that, but I really can't discount it either. [And I note, this would be detectable in GPUView.]
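
To show what I mean by polling vs. messages, a minimal sketch (the job queue here is just a made-up stand-in):

```cpp
// Polling vs. messaging: a thread that spins checking a flag burns a core
// doing nothing, while one that waits on a condition variable sleeps until
// another thread signals that work has arrived.
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex              m;
std::condition_variable cv;
std::queue<int>         jobs;

// Bad: polling. Eats ~100% of a core even when the queue is empty.
void worker_polling() {
    for (;;) {
        std::lock_guard<std::mutex> lock(m);
        if (!jobs.empty()) { /* process jobs.front() */ jobs.pop(); }
        // loop around immediately and check again...
    }
}

// Better: block until another thread pushes a job and calls cv.notify_one().
void worker_messaging() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !jobs.empty(); });
        /* process jobs.front() */ jobs.pop();
    }
}
```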
 

truegenius

Even if you ran for 24 hours, you'd see TM reporting close to 23H59M worth of CPU time.
Yes, it will have a margin of error, but at least it will give some idea of the scalability of a game that a novice like me can understand :D

The only way to see for sure what is going on at the thread level is a low level utility like GPUView, which allows you to see both the CPU Queue and GPU Queue

:( Currently GPUView is out of my scope, i.e. my CPU (my brain) will hang :p

I didn't find any installer for GPUView (I mean one where you click next, next, next, finish :whistle: )

Indeed, I found something which says to do 010100101111010101010101010011000011 and hajhfiwbdisbfdsdqdbfjs to install it :( (yes, that is not the same as next, next, next and finish)

That is why I cannot use GPUView :( (at least for now)

Off-topic, but still :(
I need to clock the NB from 2GHz to 3GHz to get ~15GB/s (~8GB/s memory copy at 2GHz) out of my dual-channel RAM.
Crappy NB :'(
I should have opted for dual-channel 1333 CL7 RAM instead of 1600 CL9 :'(
And Vengeance is not good; Sniper is better than Vengeance (at least for my system).
 

Cazalan



An intelligent game engine can certainly profile the CPU and create a number of threads optimal for the core/thread count. For example, it makes no sense for a DivX render to create 16 high-workload threads on a 4-core CPU; it will just be cache thrashing.

What the core/thread charts for this game show is that HT gets hardly any use, whereas AMD's CMT cores are heavily leveraged. This means the high-workload threads are getting a lot of cache hits for AMD, and the Bulldozer architecture is allowed to shine.
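
Something along these lines, as a trivial sketch; real engines obviously profile far more than the core count:

```cpp
// Trivial sketch of sizing the high-workload worker pool to the machine
// instead of hard-coding a thread count.
#include <algorithm>
#include <thread>

unsigned pick_worker_count() {
    unsigned hw = std::thread::hardware_concurrency(); // logical cores (0 if unknown)
    if (hw == 0) hw = 4;                               // conservative fallback
    // leave a core for the main/render thread, but keep at least one worker
    return std::max(1u, hw - 1);
}
```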
 


NO ONE CODES LIKE THIS. Period. No one is going to create entirely different game engines for different core counts (because that is EXACTLY what you are advocating). It's stupid, idiotic, and doesn't gain you anything. Taking that argument to its extreme, I suppose we should all be making a single-threaded path for all titles, for those still running single-core CPUs!

Any work that is reasonably parallel is put into a separate thread. Period. We've been doing so since the '80s on single-core PCs. Threading has NEVER been an issue, because it's trivial. Heck, even cache thrashing is less of a problem now that most CPUs have a very large L3 cache that all cores have access to.
 

Cazalan




I think you're taking the thread counting too literally here. I'm referring to only the highly parallel threads, not every thread the program uses. Being core/thread-aware doesn't mean you have to code entirely different engines. 3DSMax doesn't just choose an arbitrary number of threads to spawn; it profiles the CPU resources and runs accordingly. Same with any other SMP-aware application.

Taken to the extreme, look at a Tilera 72-core processor. They can lock an application to an exact block of cores.
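
On commodity hardware the rough equivalent is setting a thread's affinity mask; a Windows-only sketch (the mask below, covering cores 0-3, is just an example):

```cpp
// Pin the calling thread to a specific set of logical processors.
// The mask is an example; a real application would derive it from the topology.
#include <windows.h>

void pin_current_thread_to_first_four_cores() {
    DWORD_PTR mask = 0x0F; // bits 0-3 -> logical processors 0-3
    SetThreadAffinityMask(GetCurrentThread(), mask);
}
```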
 