Core Scaling on Far Cry 2.

r_manic

Administrator
But how come I always read stuff saying software (even games) doesn't take full advantage of multi-core CPUs, and so on? I think we should hold off on considering them mainstream until mainstream OSes are actually built to use 'em.
 

d_kuhn

Distinguished
Mar 26, 2002
704
0
18,990
Coding for multiple processors is a HUGE paradigm shift for game coders. I've been doing it for over a decade, but these guys have been concentrating on optimizing linear code, and now everyone is expecting them to leverage 2, 4 or more processors. It'll take time for them to build an understanding of how best to accomplish that.
 

hundredislandsboy

Distinguished
Many mainstream PC enthusiasts such as myself also have other graphics-related hobbies besides Far Cry 2 and Crysis. I love the Spectate Mode in Crysis just to see the beautifully rendered textures from all angles.
I also do a lot of photo and video editing, going back to the days of the Celeron and Pentium IIs. Believe me, these mainstream apps are coded to use as many cores as possible, and they cut the render times for a 10 MB photo file or 30 minutes of DVD video to a quarter of what they were just a few years ago, or even less!
 
I find it interesting that a lot of people will claim games are GPU limited. This shows the limitations we are seeing in multithreading on CPUs (if it's used at all) and the wall they've hit in IPC/clock speeds. I'm thinking things may some day change radically, since in order to truly take advantage of multithreading, games won't be the best example, and a lot of apps won't comply easily either.
 

sub mesa

Distinguished

Yes up until now all games were like:

while (no error)
{
    do cpu work
    do gpu work
}   // repeat as fast as we can

The result of this single-threaded "as fast as we can" behaviour is always 100% CPU load on one core, and no usage on the additional cores. The operating system may switch the load to another CPU core, and do so frequently, but two cores are never doing work at the same time, so there is no performance improvement from a dual-core or quad-core over a single-core system.

What more modern server processes often do is take a multi-threaded approach. In this case there is usually one 'master' process overseeing the work of multiple 'slave' worker threads, one per processor core, which do the actual hard work.
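
Something like this, as a rough sketch (purely illustrative; do_cpu_work and the item counter are things I made up, not code from any real engine):

#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> next_item{0};          // shared counter the workers pull jobs from
const int total_items = 1000;

void worker()                           // one 'slave' per core, doing the hard work
{
    for (;;)
    {
        int item = next_item.fetch_add(1);
        if (item >= total_items)
            break;
        // do_cpu_work(item);           // placeholder: physics, AI, animation for one object
    }
}

int main()
{
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;          // hardware_concurrency may report 0 if unknown
    std::vector<std::thread> slaves;
    for (unsigned i = 0; i < cores; ++i)    // the 'master' spawns one worker per core
        slaves.emplace_back(worker);
    for (auto& t : slaves)                  // then waits for all of them to finish
        t.join();
}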

The problem here is mainly clueless programming, IMO. Threading has been around for a long time and there are good reasons to use it: a cleaner design, better scalability, better error recovery, and a UI that stays responsive while I/O work is going on. Ever seen your mouse stutter while your hard drive was being accessed inside a game? That's the entire process waiting for the hard drive to respond. It's fully locked/stalled/non-responsive until the hard drive actually returns the requested data. That is bad programming practice, and it's due to laziness and mediocrity.

Since CPUs no longer scale in frequency but instead grow through parallel processing rather than faster serial operation, all software that requires high CPU utilization will need to focus on threading. The best would be a fully threaded design:

- one master process
- one thread responsible for interface/drawing/keyboard/mouse interactions (would allow the user to press ESC while a level is loading and quit the game quickly, if desired; see the sketch after this list)
- one thread responsible for I/O work (depends on implementation, might be useful)
- worker threads (one for each processor core, including virtual cores for Intel CPUs with Hyper-Threading, aka SMT) doing the actual calculations
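
A bare-bones illustration of the 'stay responsive while loading' part (load_level and the ESC check are placeholders I made up; a real game would obviously do much more):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> loading_done{false};

void loader_thread()                       // the I/O thread: reads the level from disk
{
    // load_level("level1.dat");           // placeholder for the slow disk work
    std::this_thread::sleep_for(std::chrono::seconds(2));   // stand-in for disk I/O
    loading_done = true;
}

int main()
{
    std::thread io(loader_thread);
    while (!loading_done)                  // interface thread keeps polling for input
    {
        // if (esc_pressed()) request_quit();   // placeholder: user can bail out mid-load
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
    io.join();
}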

The problem with this approach is that not all processing can be done in parallel. For example, sometimes a process first has to calculate something before it can know what to do next. Or it needs data from the hard drive before it can continue, and it can't cache this because it's too large for RAM. But there are lots of opportunities to add parallel operation if programmers choose to invest time in threading. In the end, they probably have to. You don't want to run Quake 8 on just one core of your 256-core CPU. :D
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680


This is true, and there is a really good reason for this when dealing with a real-time simulation (or game). Unfortunately, a lot of calculations are dependent on others and you can't even start processing item 2 until computing item 1 is finished. In your example above, how can the GPU start rendering the scene until the CPU has calculated what belongs in the scene?
Programmers have started to spin off separate threads for certain things: audio, I/O, etc., but most of the load in games sits in the main game loop, which nobody has figured out a good way to multithread.
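
To make that concrete, here's roughly the shape a lot of engines have today (my own sketch, not any real engine's code): audio splits off easily because it's mostly independent, but the steps inside the main loop depend on each other, so it stays serial.

#include <atomic>
#include <thread>

std::atomic<bool> running{true};

void audio_thread()                       // the easy part to split off
{
    while (running)
    {
        // mix_and_submit_audio();        // placeholder
        std::this_thread::yield();
    }
}

int main()
{
    std::thread audio(audio_thread);
    for (int frame = 0; frame < 3; ++frame)   // the main game loop, still on one core
    {
        // read_input();                  // 1. must happen first
        // update_world();                // 2. depends on the input just read
        // render_scene();                // 3. can't draw a world the CPU hasn't computed yet
    }
    running = false;
    audio.join();
}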


 

sub mesa

Distinguished
Well, I'm not claiming to have all the answers, but I could imagine adding parallel operation by drawing frames ahead. Right now it's: CPU works on frame 1, GPU works on frame 1, CPU works on frame 2, etc. But if you change the architecture of the game, I can imagine calculating frames 1, 2, 3 and 4 on the four CPU cores (assume a quad-core), each in its own thread. Hardware limitations and API limitations may come into play in communicating with the GPU. But they *have* to sort out this mess now that the everyday computer is using SMP.
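
Roughly what I have in mind, as a toy sketch (simulate_frame and render_frame are placeholders, not real engine calls): simulate the next frame on another core while the current frame is being rendered.

#include <future>

struct WorldState { int frame = 0; };

WorldState simulate_frame(WorldState prev)      // CPU-side work for one frame
{
    prev.frame += 1;                            // stand-in for physics/AI/game logic
    return prev;
}

void render_frame(const WorldState&) {}         // stand-in for handing the frame to the GPU

int main()
{
    WorldState current;
    for (int i = 0; i < 100; ++i)
    {
        // start simulating the *next* frame on another core...
        auto next = std::async(std::launch::async, simulate_frame, current);
        render_frame(current);                  // ...while this core renders the current one
        current = next.get();                   // pick up the result for the next iteration
    }
}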

And we've seen that CPU makers can't raise clock frequencies easily; the best they can do is improve IPC and go for parallel operation and process miniaturization. So single-threaded games will have to die out in order to take advantage of all the processing power modern CPUs have.
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
The problem is that you can't start working on frame 2 while you are still working on frame 1. You can't start working on frame 2 because it is in the future! How do I know if the bullet hit the guy in frame 2? What if he moved? The CPU hasn't calculated the physics of the world for frame 2, and maybe an explosion is going to go off at the end of frame 1! If you are going to work on frames 1-4 in parallel, you are essentially saying that the world state is completely predictable across those frames. And the only way for it to be completely predictable is to say there can be no human input during those frames. So if you are rendering at 30 fps, you are now only sampling input at 1/4 of that. How much difference does that make? If you are driving a NASCAR car at 200 mph, 1/4 sampling is about the equivalent of drunk driving!
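To put a rough number on it: 200 mph is about 90 m/s, so at 30 fps the car covers roughly 3 m per frame; sample input only every 4th frame and you're steering blind for roughly 12 m at a stretch.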
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
By the way, I'm totally with you that we need to figure this out. I'm just pointing out that lazy programming is not the problem. The problem is that nobody has come up with a cost-effective solution.
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
Oh, sub mesa, there might be a way to attempt to calculate future frames. You could start calculating frame 1 and make some sort of guess at the world state for calculating frame 2. When frame 1 is done calculating, if the world state matches what you started frame 2 with, then congratulations, you predicted the future. If the world states don't match, you have to start calculating frame 2 from scratch. You could get more sophisticated by breaking the game up into multiple pieces, and then you only have to re-calculate the pieces you missed. For instance, the particle system might be fairly predictable. If this sounds somewhat familiar, I modeled this idea on the branch prediction that some CPUs do (which is to say they begin to evaluate a branch even before they know whether that branch will be taken). The biggest problem I see with this approach is that it can lead to wildly varying frame rates.
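
As a toy sketch of the idea (compute_frame and the guess function are made up, and a real world state is obviously far bigger than one number): compute frame 2 speculatively from a guessed state while frame 1 is computed for real, and throw the speculative work away if the guess turns out wrong.

#include <future>

struct World { int state = 0; };

World compute_frame(World in)              // stand-in for one full frame of simulation
{
    in.state += 1;
    return in;
}

World guess_frame1_result(const World& w)  // the prediction: assume nothing unexpected happens
{
    World g = w;
    g.state += 1;
    return g;
}

int main()
{
    World frame0;

    World guessed = guess_frame1_result(frame0);
    // speculatively compute frame 2 from the guessed state, on another core
    auto speculative_frame2 = std::async(std::launch::async, compute_frame, guessed);

    World frame1 = compute_frame(frame0);         // meanwhile, compute frame 1 for real

    World frame2;
    if (frame1.state == guessed.state)            // did we predict the future?
        frame2 = speculative_frame2.get();        // yes: the speculative work is valid
    else
        frame2 = compute_frame(frame1);           // no: throw it away and start from scratch
    (void)frame2;                                 // sketch ends here
}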
 

hundredislandsboy

Distinguished
Why not have core 2 render the frame where you go straight ahead in the scene (in case your FPS hero goes in that direction), then have core 3 render the case where he turns left, and core 4 the case where he goes right? And how many threads do you want for each core?
I'm obviously not a programmer, but come on, we've come this far in the hardware and the software guys can't keep up? I guess that's why they're "soft"ware guys.
 

sub mesa

Distinguished
Prediction is common in games like World of Warcraft, so guessing info for just 4 frames ahead won't lead to any bad mis-prediction issues. It could become an issue as the number of threads increases, though, but at least 4 should be okay. In the event of a mis-prediction the events could be delayed (pushed into future frames) or quickly corrected in the following frames. The latter can be seen clearly in World of Warcraft with users on slow connections (high latencies): they appear to walk and skip left and right much faster than they actually could, just because your local client's prediction disagrees with what that user is actually doing; and your client won't know that until a second or so later, maybe even two.
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680
I think you are confusing a couple of issues here. Prediction is common in games like World of Warcraft because they do what is called co-simulation. The key here is that the server is authoritative (and gets information sooner than the client) and then needs to correct the state of the client. But based on your own example, how would you like it if your game corrected you a second later, or maybe even 2? Say you are playing an FPS game and you duck behind a wall, and then you die 2 seconds later because your machine determines that you actually got shot 2 seconds ago while you were standing out in the open. Doesn't sound so fun, does it?
Anyway, I can go on for pages and pages about this issue because I have a lot of engineering experience with it, but my original point that nobody has found a good way to deal with it still stands.
And if you are wondering about my experience, here you go:
co-authored a 4X space MMO
online programming for 2 AAA PS2 titles
created multiple real-time government simulations
currently work at a Virtual World company
 

steckman

Distinguished
Apr 21, 2006
234
0
18,680


That might be a good solution if it were as simple as "he turns left" and "he turns right", but unfortunately it is more like "he turns .05 degrees to the right", "he turns .06 degrees to the right", etc., etc. If you want to enumerate the combinations of controller inputs from devices as simple as a mouse and a keyboard, it is insane! I mean, imagine a simple game where all you can do is type capital letters. If you want to predict each outcome from a single input, you need 26 cores to deal with each letter of the alphabet. Now let's allow the shift key. Whoops, now we have 52 cores! See how that gets complicated real fast?
 
Like I said, things'll have to change radically. It's not only games that'll have these problems, but they're our main concern here. I'm hoping that, as we see parallelism coming into CPUs, or a CPU/GPU blending, some of these problems can be solved using such hardware.
 
Better from now on, true. But there's already a wall. Currently the software limitations will show this as the hardware trudges on. A possible hardware help/solution would be the fusion design, where we'd see more speed by using GPUs along with CPUs for faster rendering/problem solving. Though this has limits as well, it's just that it's so new there's still a lot of potential in it.
Maybe by the time the "fusion" solution has played itself out, we'll see a whole new approach, whether it be in design, materials or software, or all three.
 

sub mesa

Distinguished

The server knows in real time what is happening; the latency between a user pressing a button and it happening on the World of Warcraft servers is at the user's cost: until the server processes the action and sends back an acknowledgement, the action hasn't happened yet. This can be clearly experienced in World of Warcraft on a slow connection when you press one of the instant-damage spells, or basically any spell, though movement might be less restrictive and allow for some client-side corrections on the server.

But based on your own example, how would you like it if your game corrected you a second later or maybe even 2?
A 2-second latency is a no-go, of course. But 4 frames out of 100 frames calculated per second means only 4/100 = 0.04 seconds, or 40 ms of latency, which is considered very good; as long as it remains constant, it should provide much the same experience as real-time gaming.

Because you now have 4 times the CPU power, it can also be assumed that framerates using such parallelisation would boom to well over 100 fps. Of course the question is whether the GPU can keep up, but you'd have a lot of CPU power to work with on a quad-core, and soon on 6- and 8-core CPUs. Should you only get 25 fps, then 4 frames ahead is too much, but 25 fps on average is piss poor anyway.

Another benefit of this approach could be less fluctuation in framerates. Since you're rendering ahead, temporary slowdowns caused by some frames being harder than others are less likely to occur, assuming the GPU can keep up of course. And it could even re-calculate a frame should the prediction turn out wrong for important things, like when you actually died before reaching frame 4 :D. I agree this idea is radically different from what games do now, and it might turn out to be problematic depending on what issues arise. Perhaps people are not thinking out-of-the-box enough; I sure don't hear a lot about people trying... that's my point.