Core Scaling on Far Cry 2.

r_manic

Administrator
But how come I always read stuff saying software (even games) doesn't take full advantage of multi-core CPUs, and so on? I think we should hold off on considering them mainstream until mainstream OSes are actually built to use 'em.
 

d_kuhn

Distinguished
Mar 26, 2002
704
0
18,990
Coding for multiple processors is a HUGE paradigm shift for game coders. I've been doing it for over a decade, but these guys have been concentrating on optimizing linear code, and now everyone is expecting them to leverage 2, 4 or more processors. It'll take time for them to build an understanding of how best to accomplish that.
 

hundredislandsboy

Distinguished
Many mainstream PC enthusiasts such as myself also have other graphics-related hobbies besides Far Cry 2 and Crysis. I love the Spectate Mode in Crysis just to see the beautifully rendered textures from all angles.
I also do a lot of photo and video editing, going back to the days of the Celeron and Pentium IIs. Believe me, these mainstream apps are coded to use as many cores as possible, and they cut the render times for a 10 MB photo file or 30 minutes of DVD video to a quarter of what they were just a few years ago, or even less!
 
I find it interesting that a lot of people will claim games are GPU limited. This shows the limitations we are seeing in multithreading on CPUs (if it's used at all) and the wall they've hit in IPC/clock speeds. I'm thinking things may some day change radically, since in order to truly take advantage of multithreading, games won't be the best example, and a lot of apps won't comply easily either.
 

sub mesa

Distinguished

Yes up until now all games were like:

while (no error)
{
    do cpu work
    do gpu work
}   // repeat as fast as we can

The result of this single-threaded "as fast as we can" behaviour is always 100% CPU load on one core, and no usage on the additional cores. The operating system may switch the load to another CPU core, and do so frequently, but two cores are never doing work at the same time, so there is no performance improvement from a dual-core or quad-core over a single-core system.

What more modern server processes often do is take a multi-threaded approach. In this case there is usually one 'master' process overseeing the work of multiple 'slave' worker threads, one per processor core, which do the actual hard work.
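
Something like this, as a rough sketch (purely illustrative; do_cpu_work and the item counter are things I made up, not code from any real engine):

#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> next_item{0};          // shared counter the workers pull jobs from
const int total_items = 1000;

void worker()                           // one 'slave' per core, doing the hard work
{
    for (;;)
    {
        int item = next_item.fetch_add(1);
        if (item >= total_items)
            break;
        // do_cpu_work(item);           // placeholder: physics, AI, animation for one object
    }
}

int main()
{
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;          // hardware_concurrency may report 0 if unknown
    std::vector<std::thread> slaves;
    for (unsigned i = 0; i < cores; ++i)    // the 'master' spawns one worker per core
        slaves.emplace_back(worker);
    for (auto& t : slaves)                  // then waits for all of them to finish
        t.join();
}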

The problem here is mainly clueless programming, IMO. Threading has been around for a long time and there are good reasons to use it: a cleaner design, better scalability, better error recovery, and a UI that stays responsive while I/O work is going on. Ever seen your mouse stutter while your hard drive was being accessed inside a game? That's the entire process waiting for the hard drive to respond. It's fully locked/stalled/non-responsive until the hard drive actually returns the requested data. That is bad programming practice, and it's due to laziness and mediocrity.

Since CPUs no longer scale in frequency but instead grow through parallel processing rather than faster serial operation, all software that requires high CPU utilization will need to focus on threading. The best would be a fully threaded design:

- one master process
- one thread responsible for interface/drawing/keyboard/mouse interactions (would allow the user to press ESC while a level is loading and quit the game quickly, if desired; see the sketch after this list)
- one thread responsible for I/O work (depends on implementation, might be useful)
- worker threads (one for each processor core, including virtual cores for Intel CPUs with Hyper-Threading, aka SMT) doing the actual calculations
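
A bare-bones illustration of the 'stay responsive while loading' part (load_level and the ESC check are placeholders I made up; a real game would obviously do much more):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> loading_done{false};

void loader_thread()                       // the I/O thread: reads the level from disk
{
    // load_level("level1.dat");           // placeholder for the slow disk work
    std::this_thread::sleep_for(std::chrono::seconds(2));   // stand-in for disk I/O
    loading_done = true;
}

int main()
{
    std::thread io(loader_thread);
    while (!loading_done)                  // interface thread keeps polling for input
    {
        // if (esc_pressed()) request_quit();   // placeholder: user can bail out mid-load
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
    io.join();
}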

The problem with this approach is that not all processing can be done in parallel. For example, sometimes a process first has to calculate something before it can know what to do next. Or it needs data from the hard drive before it can continue, and it can't cache this because it's too large for RAM. But there are lots of opportunities to add parallel operation if programmers choose to invest time in threading. In the end, they probably have to. You don't want to run Quake 8 on just one core of your 256-core CPU. :D
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680


This is true, and there is a really good reason for this when dealing with a real-time simulation (or game). Unfortunately, a lot of calculations are dependent on others and you can't even start processing item 2 until computing item 1 is finished. In your example above, how can the GPU start rendering the scene until the CPU has calculated what belongs in the scene?
Programmers have started to spin off separate threads for certain things: audio, I/O, etc., but most of the load in games sits in the main game loop, which nobody has figured out a good way to multithread.
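
To make that concrete, here's roughly the shape a lot of engines have today (my own sketch, not any real engine's code): audio splits off easily because it's mostly independent, but the steps inside the main loop depend on each other, so it stays serial.

#include <atomic>
#include <thread>

std::atomic<bool> running{true};

void audio_thread()                       // the easy part to split off
{
    while (running)
    {
        // mix_and_submit_audio();        // placeholder
        std::this_thread::yield();
    }
}

int main()
{
    std::thread audio(audio_thread);
    for (int frame = 0; frame < 3; ++frame)   // the main game loop, still on one core
    {
        // read_input();                  // 1. must happen first
        // update_world();                // 2. depends on the input just read
        // render_scene();                // 3. can't draw a world the CPU hasn't computed yet
    }
    running = false;
    audio.join();
}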


 

sub mesa

Distinguished
Well, I'm not claiming to have all the answers, but I could imagine adding parallel operation by drawing frames ahead. Right now it's: CPU works on frame 1, GPU works on frame 1, CPU works on frame 2, etc. But if you change the architecture of the game, I can imagine calculating frames 1, 2, 3 and 4 on the four CPU cores (assume a quad-core), each in its own thread. Hardware limitations and API limitations may come into play in communicating with the GPU. But they *have* to sort out this mess now that the everyday computer is using SMP.
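
Roughly what I have in mind, as a toy sketch (simulate_frame and render_frame are placeholders, not real engine calls): simulate the next frame on another core while the current frame is being rendered.

#include <future>

struct WorldState { int frame = 0; };

WorldState simulate_frame(WorldState prev)      // CPU-side work for one frame
{
    prev.frame += 1;                            // stand-in for physics/AI/game logic
    return prev;
}

void render_frame(const WorldState&) {}         // stand-in for handing the frame to the GPU

int main()
{
    WorldState current;
    for (int i = 0; i < 100; ++i)
    {
        // start simulating the *next* frame on another core...
        auto next = std::async(std::launch::async, simulate_frame, current);
        render_frame(current);                  // ...while this core renders the current one
        current = next.get();                   // pick up the result for the next iteration
    }
}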

And we've seen that CPU makers can't raise clock frequencies easily; the best they can do is improve IPC and go for parallel operation and process miniaturization. So single-threaded games will have to die out in order to take advantage of all the processing power modern CPUs have.
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
The problem is that you can't start working on frame 2 while you are still working on frame 1. You can't start working on frame 2 because it is in the future! How do I know if the bullet hit the guy in frame 2? What if he moved? The CPU hasn't calculated the physics of the world for frame 2, and maybe an explosion is going to go off at the end of frame 1! If you are going to work on frames 1-4 in parallel, you are essentially saying that the world state is completely predictable across those frames. And the only way for it to be completely predictable is to say there can be no human input during those frames. So if you are rendering at 30 fps, you are now only sampling input at 1/4 of that. How much difference does that make? If you are driving a NASCAR car at 200 mph, 1/4 sampling is about the equivalent of drunk driving!
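To put a rough number on it: 200 mph is about 90 m/s, so at 30 fps the car covers roughly 3 m per frame; sample input only every 4th frame and you're steering blind for roughly 12 m at a stretch.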
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
By the way, I'm totally with you that we need to figure this out. I'm just pointing out that lazy programming is not the problem. The problem is that nobody has come up with a cost-effective solution.
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
Oh, sub mesa, there might be a way to attempt to calculate future frames. You could start calculating frame 1 and make some sort of guess at the world state for calculating frame 2. When frame 1 is done calculating, if the world state matches what you started frame 2 with, then congratulations, you predicted the future. If the world states don't match, you have to start calculating frame 2 from scratch. You could get more sophisticated by breaking the game up into multiple pieces, and then you only have to re-calculate the pieces you missed. For instance, the particle system might be fairly predictable. If this sounds somewhat familiar, I modeled this idea on the branch prediction that some CPUs do (which is to say they begin to evaluate a branch even before they know whether that branch will be taken). The biggest problem I see with this approach is that it can lead to wildly varying frame rates.
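
As a toy sketch of the idea (compute_frame and the guess function are made up, and a real world state is obviously far bigger than one number): compute frame 2 speculatively from a guessed state while frame 1 is computed for real, and throw the speculative work away if the guess turns out wrong.

#include <future>

struct World { int state = 0; };

World compute_frame(World in)              // stand-in for one full frame of simulation
{
    in.state += 1;
    return in;
}

World guess_frame1_result(const World& w)  // the prediction: assume nothing unexpected happens
{
    World g = w;
    g.state += 1;
    return g;
}

int main()
{
    World frame0;

    World guessed = guess_frame1_result(frame0);
    // speculatively compute frame 2 from the guessed state, on another core
    auto speculative_frame2 = std::async(std::launch::async, compute_frame, guessed);

    World frame1 = compute_frame(frame0);         // meanwhile, compute frame 1 for real

    World frame2;
    if (frame1.state == guessed.state)            // did we predict the future?
        frame2 = speculative_frame2.get();        // yes: the speculative work is valid
    else
        frame2 = compute_frame(frame1);           // no: throw it away and start from scratch
    (void)frame2;                                 // sketch ends here
}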
 

hundredislandsboy

Distinguished
Why not have core 2 render the frame where you go straight ahead in the scene (in case your FPS hero goes in that direction), then have core 3 render the case where he turns left, and core 4 the case where he goes right? And how many threads do you want for each core?
I'm obviously not a programmer, but come on, we've come this far in the hardware and the software guys can't keep up? I guess that's why they're "soft"ware guys.
 

sub mesa

Distinguished
Prediction is common in games like World of Warcraft, so guessing info for just 4 frames ahead won't lead to any bad mis-prediction issues. It could become an issue as the number of threads increases, though, but at least 4 should be okay. In the event of a mis-prediction the events could be delayed (pushed into future frames) or quickly corrected in the following frames. The latter can be seen clearly in World of Warcraft with users on slow connections (high latencies): they appear to walk and skip left and right much faster than they actually could, just because your local client's prediction disagrees with what that user is actually doing; and your client won't know that until a second or so later, maybe even two.
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
I think you are confusing a couple of issues here. Prediction is common in games like World of Warcraft because they do what is called co-simulation. The key here is that the server is authoritative (and gets information sooner than the client) and then needs to correct the state of the client. But based on your own example, how would you like it if your game corrected you a second later, or maybe even 2? Say you are playing an FPS game and you duck behind a wall, and then you die 2 seconds later because your machine determines that you actually got shot 2 seconds ago while you were standing out in the open. Doesn't sound so fun, does it?
Anyway, I can go on for pages and pages about this issue because I have a lot of engineering experience with it, but my original point that nobody has found a good way to deal with it still stands.
And if you are wondering about my experience, here you go:
co-authored a 4X space MMO
online programming for 2 AAA PS2 titles
created multiple real-time government simulations
currently work at a Virtual World company
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680


That might be a good solution if it were as simple as "he turns left" and "he turns right", but unfortunately it is more like "he turns .05 degrees to the right", "he turns .06 degrees to the right", etc., etc. If you want to enumerate the combinations of controller inputs from devices as simple as a mouse and a keyboard, it is insane! I mean, imagine a simple game where all you can do is type capital letters. If you want to predict each outcome from a single input, you need 26 cores to deal with each letter of the alphabet. Now let's allow the shift key. Whoops, now we have 52 cores! See how that gets complicated real fast?
 
Like I said, things'll have to change radically. It's not only games that'll have these problems, but they're our main concern here. I'm hoping that, as we see parallelism coming into CPUs, or a CPU/GPU blending, some of these problems can be solved using such hardware.
 
Better from now on, true. But there's already a wall. Currently the software limitations will show this as the hardware trudges on. A possible hardware help/solution would be the fusion design, where we'd see more speed by using GPUs along with CPUs for faster rendering/problem solving. Though this has limits as well, it's just that it's so new there's still a lot of potential in it.
Maybe by the time the "fusion" solution has played itself out, we'll see a whole new approach, whether it be in design, materials or software, or all three.
 

sub mesa

Distinguished

The server knows in real time what is happening; the latency between a user pressing a button and it happening on the World of Warcraft servers is at the user's cost: until the server processes the action and sends back an acknowledgement, the action hasn't happened yet. This can be clearly experienced in World of Warcraft on a slow connection when you press one of the instant-damage spells, or basically any spell, though movement might be less restrictive and allow for some client-side corrections on the server.

But based on your own example, how would you like it if your game corrected you a second later or maybe even 2?
A 2-second latency is a no-go, of course. But 4 frames out of 100 frames calculated per second means only 4/100 = 0.04 seconds, or 40 ms of latency, which is considered very good; as long as it remains constant, it should provide much the same experience as real-time gaming.

Because you now have 4 times the CPU power, it can also be assumed that framerates using such parallelisation would boom to well over 100 fps. Of course the question is whether the GPU can keep up, but you'd have a lot of CPU power to work with on a quad-core, and soon on 6- and 8-core CPUs. Should you only get 25 fps, then 4 frames ahead is too much, but 25 fps on average is piss poor anyway.

Another benefit of this approach could be less fluctuation in framerates. Since you're rendering ahead, temporary slowdowns caused by some frames being harder than others are less likely to occur, assuming the GPU can keep up of course. And it could even re-calculate a frame should the prediction turn out wrong for important things, like when you actually died before reaching frame 4 :D. I agree this idea is radically different from what games do now, and it might turn out to be problematic depending on what issues arise. Perhaps people are not thinking out-of-the-box enough; I sure don't hear a lot about people trying... that's my point.