Alternative Title: Why single core performance still matters
A question popped into my head recently: does Windows have a tool or utility to monitor the CPU time of an application's threads? The main reason for asking is to find out whether there's a way to see a trend in how games use their threads. Some of us here claim that even though consumers have access to 8+ cores at a relatively affordable price, games still rely on a small number of threads, so all those cores don't really matter and what matters is single-core performance. In other words, games haven't been "properly" multithreaded. Of course, rather than just make that claim, why not put my money where my mouth is?
One concern is that this sort of deep dive tends to be limited to development environments, but Windows does keep tabs on threads, so maybe there was hope. And there was! Process Explorer can show CPU time per thread if you double-click the application and select the "Threads" tab. The metric that matters is "Cycles Delta", though it's not enabled by default. Why is it important? Because it tells you how many CPU cycles the thread used since the last refresh. If this value floats around a specific range, that indicates how busy the thread tends to be over time. There's also another way through Performance Monitor, but setting that up is a pain.
So I ran a few games to see what their per-thread usage looks like. For most of them I just let the game sit there, so this isn't indicative of, say, when things get busy, but I think it's useful as a baseline at least. The games I ran:
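To make the "Cycles Delta" idea concrete, here is a minimal Python sketch of the arithmetic behind it: take two snapshots of each thread's cumulative counter and subtract. The function names, thread IDs, and counter values are all made up for illustration, and this is not how Process Explorer itself is implemented. On a real system the snapshots could come from something like psutil's `Process.threads()`, which reports cumulative CPU time per thread (in seconds rather than cycles).

```python
def cycles_delta(previous, current):
    """Return the per-thread counter increase between two snapshots.

    Both arguments map thread ID -> cumulative counter (cycles or CPU time).
    Threads that exited between snapshots are dropped; threads that are new
    in `current` are treated as starting from zero.
    """
    return {tid: total - previous.get(tid, 0) for tid, total in current.items()}


def busiest(deltas, top=3):
    """Return the `top` thread IDs with the largest delta, busiest first."""
    return sorted(deltas, key=deltas.get, reverse=True)[:top]


# Hypothetical snapshots taken one refresh interval apart.
before = {1001: 5_000_000, 1002: 200_000, 1003: 40_000}
after = {1001: 9_000_000, 1002: 250_000, 1003: 40_000, 1004: 1_200_000}

deltas = cycles_delta(before, after)
print(deltas)
print(busiest(deltas, top=2))
```

A thread whose delta stays at or near zero across many refreshes (like 1003 here) is one of the "spawned but idle" threads, no matter how many of them the process has.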
- Black Mesa Source: I just wanted a Source Engine game and this was installed
- Cities: Skylines: A popular simulation game, so it'd be interesting to see its per-thread usage
- Call of Duty: Modern Warfare 2 (2022). Spent a few minutes in one of the Single Player levels
- Cyberpunk 2077: Just to throw an open world modern game in there
- F1 2019: Another simulation game
- Final Fantasy XIV: An MMORPG
- Outer Worlds: I wanted an Unreal Engine 4 game, and this is one I happened to have installed
- Quake 2 RTX: It's built off the original Quake 2 game, so this is to see what a really old game looks like
- Resident Evil 4 Chainsaw Demo: Also just wanted to throw in some semblance of a modern game
- Stray: Also another Unreal Engine 4 game
Starting with Black Mesa, this is going to be a common theme: despite a ton of threads being spawned, a good number of them sit there doing nothing. In any case, you can see that three threads dominate the CPU time. The most I can gather about "tier0.dll" is that it's tied to Source games; my guess, based on how Source games work, is that it's the server running the game logic. The other two are the main game executable and the NVIDIA driver.
Others have claimed that Cities: Skylines relies heavily on a single thread to do everything, and this pretty much backs that up. The map I used was https://steamcommunity.com/workshop/filedetails/?id=785933283 , which is a highly populated map. As a point of comparison, here's the usage from a map that barely has anyone in it.
For Call of Duty: Modern Warfare 2, about five threads dominate the CPU time, with one of them showing higher usage than the rest. Unfortunately the "Start Address" doesn't seem to point to anything useful, so it's hard to tell what's what.
For Cyberpunk 2077 I wanted to try something different: run the game at RT Ultra settings at 1440p, then drop it down to Low quality settings at 720p and see the difference.
Here's the usage at 720p with Low quality settings:
And here it is at 1440p with RT Ultra quality:
The one interesting thing to note is that 11 threads dropped in CPU time at the higher settings. Given that Cyberpunk 2077 is a DX12 game, it's possible these are threads for rendering graphics.
Either way, this illustrates why lowering the resolution puts more strain on the CPU: the GPU finishes each frame sooner, so the CPU has to prepare frames more often.
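The resolution effect can be shown with a toy model. This is only a sketch under a simplifying assumption of mine, not something measured here: each frame takes as long as the slower of the CPU and GPU work, and the per-frame costs below are invented numbers.

```python
def frame_stats(cpu_ms, gpu_ms):
    """Toy model: a frame takes as long as the slower of CPU and GPU work.

    Returns (frames per second, fraction of wall time the CPU is busy).
    """
    frame_ms = max(cpu_ms, gpu_ms)
    fps = 1000.0 / frame_ms
    cpu_busy = cpu_ms / frame_ms
    return fps, cpu_busy


# Hypothetical per-frame costs; only the GPU cost changes with resolution.
for label, gpu_ms in [("1440p RT Ultra", 25.0), ("720p Low", 5.0)]:
    fps, cpu_busy = frame_stats(cpu_ms=8.0, gpu_ms=gpu_ms)
    print(f"{label}: {fps:.0f} FPS, CPU busy {cpu_busy:.0%}")
```

With these made-up numbers the CPU does the same 8 ms of work per frame in both cases, but at 720p it is asked to do it 125 times a second instead of 40, so it goes from about a third busy to being the bottleneck.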
For F1 2019, I took this while running the benchmark, which simulates a race.
The interesting thing is that there are 8 threads running fairly evenly. This makes me think they're for the AI racers (even though there are 20 or so racers in total). I also ran this in DX11, which would explain the busy driver thread.
For Final Fantasy XIV, I took readings from several scenarios this time:
- 720p High Desktop quality, non-busy area
- 1440p High Desktop quality, non-busy area
- 1440p High Desktop quality with a 120 FPS cap, non-busy area
- 1440p High Desktop quality, Limsa Lominsa Aetheryte plaza (one of the busiest areas in the game)
Strangely enough, the game's main thread increases in activity at the lower resolution while the driver activity decreases. Either way, these two dominate the CPU usage. The third one is likely the network handler or something related to it, since the main thread didn't jump up.
Outer Worlds is where things get interesting. Unreal Engine 4 games (Stray, the other one I tried, also does this) seem to spawn a ton of threads that do a lot of work. However, it's also clear that one thread still dominates the entire game.
For Quake 2 RTX, it's funny that the driver thread completely overtakes the main game thread. I figured this was going to happen, but I thought it'd be interesting to throw it in anyway.
In the Resident Evil 4 demo, two threads still dominate, but quite a handful of others get a non-trivial amount of work.
I ran Stray with the -dx12 option on. Similar to Outer Worlds, a ton of threads got spawned and they seem to be doing something, but in the end two threads dominate the game.
Link to the album: https://imgur.com/a/e5SszIt
Conclusion
So basically, all of these games typically have one or two threads that are busy all the time, far more so than the other threads they spawn. And note that just because a game spawned other threads doesn't mean they're actually ready to run. You can't look at one of the Unreal Engine 4 games and go "look! it spawned 30+ threads, clearly this should run great on a Threadripper!", because those threads aren't all running at the same time. And depending on the order in which these threads run, it's conceivable that you'd get similar performance on a CPU with fewer cores/threads, because the lighter threads could simply run back to back in the same amount of time the heaviest thread takes.
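A rough way to put numbers on that argument: if the threads in a frame are independent, the frame can't finish faster than its longest thread, nor faster than the total work divided by the number of cores. The sketch below assumes exactly that (independent threads, no scheduling overhead), and the per-thread times are invented rather than taken from any of the games above.

```python
def frame_time_lower_bound(thread_ms, cores):
    """Lower bound on frame time for independent threads on `cores` cores.

    The frame can finish no sooner than its longest thread, and no sooner
    than the total work spread evenly over every core.
    """
    return max(max(thread_ms), sum(thread_ms) / cores)


# Hypothetical frame: one dominant thread plus 30 small helper threads.
work = [10.0] + [0.5] * 30  # milliseconds of CPU time per thread

for cores in (2, 4, 8, 16, 32):
    print(cores, "cores ->", frame_time_lower_bound(work, cores), "ms")
```

With these numbers the bound is 12.5 ms on 2 cores and 10 ms on 4 or more. Once the helper threads fit alongside the dominant thread, extra cores no longer shorten the frame; only making that one thread run faster would.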
In any case, this is why single-threaded performance in games is still important. Games are still designed in a way that there's not a whole lot of work to do at once, and things tend to be shoved into a single thread.