News: Take control of your Intel CPU's P-Cores and E-Cores with CoreDirector software

Maybe APO will wake up both game developers and Microsoft, thanks to showing them just how much performance is being left on the table. It's pretty easy to be complacent when you're ignorant. Harder, when you know the true cost of the status quo.
Everybody knows how much more performance a 13900 or a 7950 has over the PS4 or PS5; they don't need APO to tell them that.
Devs don't care enough because they have guaranteed sales of legitimate copies at full price on the consoles, while sales on PC are low due to piracy and legitimate copies are often heavily discounted.
I disagree. I think if you make it easier and less error-prone than affinities and thread priorities, then we might see greater uptake - especially if there's strong data showing the benefits of doing so.
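Just to illustrate what the status quo looks like: the "manual" route today is a couple of Win32 calls per thread, and the hard part is knowing what mask to use on every CPU out there. A minimal sketch, assuming (purely for illustration) that the first eight logical processors are P-cores:

```cpp
// Minimal sketch of the affinity/priority route available to games today.
// The 0xFF mask is an assumption for illustration (logical processors 0-7 = P-cores);
// real code would have to query the CPU topology and adjust per SKU.
#include <windows.h>

void pin_render_thread(HANDLE thread)
{
    const DWORD_PTR assumedPCoreMask = 0xFF;  // hypothetical P-core mask
    SetThreadAffinityMask(thread, assumedPCoreMask);
    SetThreadPriority(thread, THREAD_PRIORITY_ABOVE_NORMAL);
}
```

It's not much code, but keeping that mask correct across every core layout that ships is exactly where it gets error-prone.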
So you don't disagree...
You just think that an API is going to be magic and make these things easy. Look at multi-GPU again: that was an API, or at least part of one, that would increase performance by whatever the iGPU or a second GPU could provide, and nobody did anything with it because it would just be too much work.
I meant the individual chunks of computation needed to generate a frame. In order to have a frame you can display, all of the steps needed to compute it must complete.
But that is a never-ending loop in itself that often pre-renders multiple frames. It also uses all available threads, or a predetermined number of threads, at full speed, even if the game then discards most of the work done and only displays a fraction of those generated frames.
It's not like it's one thread doing one chunk of work and then terminating to make room for a different thread.

Another reason why APO decreases power while also increasing performance; at least possibly, since we don't know exactly what it does.
 
Yes, that's why game devs completely ignore all that stuff on Windows and only do it for one console, because there all consoles are the same, and we get stuck with the console config on drastically different hardware.
Care to post a link that shows this?

Devs don't care enough because they have guaranteed sales of legitimate copies at full price on the consoles, while sales on PC are low due to piracy and legitimate copies are often heavily discounted.
At least until the player has finished the game, trades it in, and it's put on the used console game market; then the dev gets squat for it. As for the PC market, if they wouldn't release crap games, then maybe gamers would be more inclined to buy them. Cyberpunk showed that, as well as a few of the other games that came out recently.

You just think that an API is going to be magic and make these things easy. Look at multi-GPU again: that was an API, or at least part of one, that would increase performance by whatever the iGPU or a second GPU could provide, and nobody did anything with it because it would just be too much work.
It doesn't appear you are suggesting anything new, just sticking with what is out there already, which just passes the buck, putting the blame elsewhere. As for multi-GPU, the same thing can be said about SLI and CrossFire, but that died due to Nvidia screwing it up. What's the point of programming for something when it's out of reach for everyone that can't afford an xx90 or xx80 class video card? Instead of blaming devs for whatever reason you can think of, blame Nvidia for that; it's 100% their fault for pricing any multi-GPU setup out of reach for 90% of those that may have used it... IMO, the best version of multi-GPU was 3dfx's version of SLI. Add a 2nd Voodoo 2 and it just worked, no need for profiles or any of the other BS Nvidia did with their version of SLI.
 
Everybody knows how much more performance a 13900 or a 7950 has over the PS4 or PS5; they don't need APO to tell them that.
The key thing APO shows us is how much better it works than running a Gen 14 CPU stock, or with all the other tricks and hacks people usually try.

Devs don't care enough because they have guaranteed sales of legitimate copies at full price on the consoles, while sales on PC are low due to piracy and legitimate copies are often heavily discounted.
If there were really so little interest in PC gaming, then why even release it on the PC? It's not free - there are development, marketing and support costs.

Anyway, you're right that any solution requiring action from developers is contingent on them caring. If they don't, then we're stuck with band-aid fixes like APO, which doesn't seem to scale at all.

So you don't disagree...
You just think that an API is going to be magic and make these things easy. Look at multi-GPU again: that was an API, or at least part of one, that would increase performance by whatever the iGPU or a second GPU could provide, and nobody did anything with it because it would just be too much work.
Not all APIs are equal. Multi-GPU presents numerous tricky issues, and there's only so much that an API can do to solve them for you. I can't comment specifically on multi-GPU, since I'm not familiar with the APIs you're talking about for doing it, but I can say that it's a very different problem and I think the analogy doesn't apply.

What I said I disagreed with was your assertion that devs won't adopt another threading API if they're not using all the features of existing ones. I still maintain my position, in spite of your attempt to tell me I don't (not appreciated).

But that is a never-ending loop in itself that often pre-renders multiple frames. It also uses all available threads, or a predetermined number of threads, at full speed, even if the game then discards most of the work done and only displays a fraction of those generated frames.
I understand the problem, thank you. If you look at it at the granularity of the frame, then it's as I said. They employ multiple cores mostly by dividing up the work needed to generate it, and distributing that among the cores.

As for pipelining to overlap generation of multiple frames, I did actually consider that. I think there's a way to manage it, but essentially you want to ensure the work needed for the earlier frame takes precedence over later ones.
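To make the "earlier frame takes precedence" idea concrete, here's a toy sketch (not how any real engine, or APO, actually does it): tag each chunk of work with its frame index and have the worker pool always drain the oldest in-flight frame first.

```cpp
// Toy illustration only: a priority queue of work items keyed by frame index,
// so chunks belonging to frame N are always run before chunks for frame N+1.
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

struct WorkItem {
    uint64_t frame;              // which frame this chunk of work belongs to
    std::function<void()> run;   // the chunk itself
};

struct EarlierFrameFirst {
    bool operator()(const WorkItem& a, const WorkItem& b) const {
        return a.frame > b.frame;  // smaller frame index = higher priority
    }
};

using FrameQueue =
    std::priority_queue<WorkItem, std::vector<WorkItem>, EarlierFrameFirst>;

void drain(FrameQueue& queue) {
    while (!queue.empty()) {
        queue.top().run();  // work for the oldest in-flight frame runs first
        queue.pop();
    }
}
```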

It's not like it's one thread doing one chunk of work and then terminating to make room for a different thread.
I get that, but a thread also isn't the perfect abstraction for what we're talking about. Part of the problem is the mismatch between disparate work chunks and threads. The OS understands threads, but if the app is using them to perform multiple unrelated tasks, then it's difficult for the OS to optimally schedule them.
 
It doesn't appear you are suggesting anything new, just sticking with what is out there already, which just passes the buck, putting the blame elsewhere.
I'm sure Terry will correct me if I'm wrong, but I think our friend would like us to throw all our weight behind APO. That's difficult to do, when APO addresses only 3 CPU models and 2 titles. I think it needs to scale better than that, to be a viable option.

As for multi-GPU, ... what's the point of programming for something when it's out of reach for everyone that can't afford an xx90 or xx80 class video card?
Yeah, with GPUs getting so expensive and power-hungry, the proposition of multi-GPU gaming seems less enticing.

Blame Nvidia for that; it's 100% their fault for pricing any multi-GPU setup out of reach for 90% of those that may have used it...
Oh, it's even worse than that. They actually started impairing their over-the-top NVLink, in recent generations. The version in the RTX 3090 is broken in certain ways. So, the writing was on the wall before they removed it from RTX 4000. I think it's clear they saw it as a threat to their workstation & datacenter GPU product lines.
 
Well, until/unless Intel expands APO, the two titles I mentioned are the only ones with it, so they're the only ones worth testing. It's entirely possible the limited title support in APO is because they're only putting in things that benefit, and they haven't found a whole lot yet.

In HUB's R6 testing, they weren't able to match APO's performance by disabling HT and/or e-cores, despite getting higher performance than stock. That's why I'd like to see a direct comparison between stock, third-party software and APO, to see what sort of advantage Intel is or isn't bringing to the table.
I just got around to doing a CPU-bound comparison between having 3 of the 4 e-cores per bank disabled vs. having them all enabled in Exodus. In the canned benchmark my PC averaged 209 fps with 3/4 of the e-cores disabled and 186 fps with all e-cores enabled. (Run at 720p medium with DLSS, VRS, tessellation, Hairworks and PhysX off; RT normal; reflections hybrid.)

The 1-e-core-per-bank BIOS profile is the same as the all-e-cores profile, except with those 12 e-cores disabled. Same RAM settings, same clocks. Using the latest BIOS that enables APO for 14th gen. 13900KF, 64GB 6200C30 with tightened timings, 3080.

It appears copying what APO is doing works quite well, better than I could do with a traditional overclock of this CPU in this game, for sure. Whether this core config, an all-core config, disabling HT, or disabling e-cores works best probably depends on the game. Some games like ReBAR and some run better without it.

Too many options. There's no way I'm going to test all of these for every game. But if somebody had a list, I might switch BIOS presets. If there were a program like APO that worked for 13th gen and would automatically switch my core config if I gave it permission, I would likely use it.

I don't think Windows is going to tune my core usage per app. Weren't we supposed to have DirectStorage and all sorts of other performance-enhancing goodies? And while I appreciate Process Lasso, with my tendency to stick to the same game for a while, it is about the same as just switching BIOS profiles. My light CPU use outside of gaming isn't appreciably slowed by disabling the lesser threads, so it doesn't bother me not to have the perfect core assignments for each program in the foreground. Whether I choose cores and threads through BIOS settings or assign them per program in PL, I still have to figure out which programs are worth changing settings for, and to what.

That's a lot of work.

But you know what isn't? I just found out I can press the Xbox button on my controller and it brings up the last few games I've played, and I can start whichever one I choose. So far the Game Pass, Steam and GOG games I've tried all work with this. Score 1 for being lazy.
 
I just got around to doing a CPU-bound comparison between having 3 of the 4 e-cores per bank disabled vs. having them all enabled in Exodus. In the canned benchmark my PC averaged 209 fps with 3/4 of the e-cores disabled and 186 fps with all e-cores enabled.
What result did you get with all E-cores disabled? Also, did you check to ensure the results were repeatable?
 
Care to post a link that shows this?
One that shows it I could, one that proves it I couldn't.
Ain't nobody gonna confess to doing that.
At least until the player has finished the game, trades it in, and it's put on the used console game market; then the dev gets squat for it. As for the PC market, if they wouldn't release crap games, then maybe gamers would be more inclined to buy them. Cyberpunk showed that, as well as a few of the other games that came out recently.
Good thing then that no console is going digital only...
It doesn't appear you are suggesting anything new, just sticking with what is out there already, which just passes the buck, putting the blame elsewhere. As for multi-GPU, the same thing can be said about SLI and CrossFire, but that died due to Nvidia screwing it up. What's the point of programming for something when it's out of reach for everyone that can't afford an xx90 or xx80 class video card? Instead of blaming devs for whatever reason you can think of, blame Nvidia for that; it's 100% their fault for pricing any multi-GPU setup out of reach for 90% of those that may have used it... IMO, the best version of multi-GPU was 3dfx's version of SLI. Add a 2nd Voodoo 2 and it just worked, no need for profiles or any of the other BS Nvidia did with their version of SLI.
Huh?! Multi-GPU would be using iGPUs or any GPU you have in your system.
The key thing APO shows us is how much better it works than running a Gen 14 CPU stock, or with all the other tricks and hacks people usually try.
Yeah, but the devs don't care about that as long as enough of their games are being sold; they don't want to hugely increase their dev costs for a minimal amount of extra sales.
Intel, on the other hand, would care about making its expensive CPUs more alluring to buyers by making them look better.
If there were really so little interest in PC gaming, then why even release it on the PC? It's not free - there are development, marketing and support costs.
There are only dev costs if they do any additional development. Marketing is the same: if they market it for consoles and just slap an 'also available on PC' logo on it, it doesn't cost them anything. And support is something they need for the consoles anyway, so that's a shared expense as well.
So basically, why wouldn't they?! Minimal expenses for possibly decent sales.
Not all APIs are equal. Multi-GPU presents numerous tricky issues, and there's only so much that an API can do to solve them for you. I can't comment specifically on multi-GPU, since I'm not familiar with the APIs you're talking about for doing it, but I can say that it's a very different problem and I think the analogy doesn't apply.
So you don't know if it is very different because you don't know about multi-GPU... but you are sure that it is very different...
https://www.anandtech.com/show/9307/the-kaveri-refresh-godavari-review-testing-amds-a10-7870k
[Screenshot from the linked article showing the frame split between the APU and the discrete GPU.]
In the screenshot above, the red and blue colored items represent the different items that are rendered and the color shows which graphics in the system supplied the processing power. In this case the APU took care of the red units, while the discrete GPU did the scenery and a good portion of the effects. In the demo I was at, enabling the APU in this circumstance gave a 10% performance increase in a heavy 30 FPS scene to 33 FPS.
What I said I disagreed with was your assertion that devs won't adopt another threading API if they're not using all the features of existing ones. I still maintain my position, in spite of your attempt to tell me I don't (not appreciated).
Devs never used any API that is too much work; they barely use DX12/Vulkan, and that has been proven to be very effective.
I understand the problem, thank you. If you look at it at the granularity of the frame, then it's as I said. They employ multiple cores mostly by dividing up the work needed to generate it, and distributing that among the cores.
But if you tried to do anything at such a granularity, it would make things worse, not better, especially if it has to talk to an API in between...
Also no, they don't; they do the same work on multiple cores and choose the most recent frame at the moment they need to show something on screen.
I get that, but a thread also isn't the perfect abstraction for what we're talking about. Part of the problem is the mismatch between disparate work chunks and threads. The OS understands threads, but if the app is using them to perform multiple unrelated tasks, then it's difficult for the OS to optimally schedule them.
Which is why we got Thread Director, which monitors the threads, picks up on workload changes and suggests changes to the OS; we went over that already.
I'm sure Terry will correct me if I'm wrong, but I think our friend would like us to throw all our weight behind APO. That's difficult to do, when APO addresses only 3 CPU models and 2 titles. I think it needs to scale better than that, to be a viable option.
I don't want devs to use APO; I would want devs to make threading profiles (and general improvements) for at least the most common layouts of CPUs: 4c, 6c, 6c+4, 8c, 8c+4. That would be great and would cover everybody. Make it a menu choice so that people can pick the best one for any occasion.
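Detecting which layout you're on is the cheap part, for what it's worth; here's a rough sketch using the Win32 topology query (on hybrid parts the P-cores report a higher efficiency class, while homogeneous CPUs report class 0 for everything):

```cpp
// Rough sketch: count P-cores vs. E-cores so a game could pick one of a handful
// of pre-made threading profiles (6c, 6c+4, 8c, 8c+4, ...).
#include <windows.h>
#include <vector>
#include <cstdio>

void count_core_types()
{
    DWORD bytes = 0;
    GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &bytes);
    std::vector<char> buffer(bytes);
    if (!GetLogicalProcessorInformationEx(
            RelationProcessorCore,
            reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(buffer.data()),
            &bytes))
        return;

    int pCores = 0, eCores = 0;
    for (DWORD offset = 0; offset < bytes; )
    {
        auto* info = reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(
            buffer.data() + offset);
        // EfficiencyClass 0 is the lowest class; on Intel hybrid parts that's the E-cores.
        if (info->Processor.EfficiencyClass > 0)
            ++pCores;
        else
            ++eCores;
        offset += info->Size;
    }
    printf("P-cores: %d, E-cores: %d\n", pCores, eCores);
}
```

Deciding which threads go where is the part that actually takes dev effort, of course.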
Yeah, with GPUs getting so expensive and power-hungry, the proposition of multi-GPU gaming seems less enticing.
If you have an iGPU anyway and it can add even 10% performance "for free", it would still be plenty exciting for the end user.
Just as people have e-cores anyway and would like some extra performance out of them if they can get it.
 
There are only dev costs if they do any additional development. Marketing is the same: if they market it for consoles and just slap an 'also available on PC' logo on it, it doesn't cost them anything. And support is something they need for the consoles anyway, so that's a shared expense as well.
So basically, why wouldn't they?! Minimal expenses for possibly decent sales.
This all sounds highly speculative and not consistent with what I've heard from an industry contact. Can you provide any source to back up these claims?

In particular, Playstation games would definitely have to be ported to PC, since its OS and graphics APIs are neither Windows nor Direct3D.

So you don't know if it is very different because you don't know about multi-GPU... but you are sure that it is very different...
That's not what I said. I know a decent amount about GPU programming and the challenges involved in multi-GPU rendering. What I'm not familiar with are the specific APIs you're referencing, for doing multi-GPU rendering.

That article gives no meaningful detail about which aspects of multi-GPU rendering the API does or doesn't simplify for you. All it says is that the API tells you the speed and capabilities of the GPU. Furthermore, it says "implementing it into their engine for the game took some time", indicating the effort was non-trivial.
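To illustrate: the enumeration part that DXGI gives you for free really is just this; everything about splitting a frame across those adapters and keeping them in sync is left to the engine. A bare-bones sketch:

```cpp
// Bare-bones sketch: list every GPU (integrated or discrete) visible to DXGI.
// This is the "tells you the capabilities" part; distributing and synchronizing
// rendering work across the adapters is the hard, engine-side problem.
#include <windows.h>
#include <dxgi.h>
#include <cstdio>

void list_adapters()
{
    IDXGIFactory1* factory = nullptr;
    if (FAILED(CreateDXGIFactory1(__uuidof(IDXGIFactory1), (void**)&factory)))
        return;

    IDXGIAdapter1* adapter = nullptr;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc = {};
        adapter->GetDesc1(&desc);
        wprintf(L"Adapter %u: %s, %llu MB dedicated VRAM\n", i, desc.Description,
                static_cast<unsigned long long>(desc.DedicatedVideoMemory) / (1024 * 1024));
        adapter->Release();
    }
    factory->Release();
}
```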

Moreover, the example they cite is one where the rendering is split between an iGPU and dGPU. That's awfully convenient, because game engines traditionally stage all of the textures and assets in host memory, before sending them to the GPU. So, there's minimal extra copying, in this arrangement. Unfortunately, the iGPU is so much slower that it added only 10%. It would be more telling to know how well it did or didn't scale to multi-dGPU setups.

Anyway, we're off-topic. As I've explained, I don't find it relevant to the discussion. If you want to continue debating multi-GPU rendering, then I suggest you start a thread in the appropriate sub-forum.

Which is why we got Thread Director, which monitors the threads, picks up on workload changes and suggests changes to the OS; we went over that already.
Unfortunately, Thread Director doesn't preempt threads. So, it's only relevant when the OS scheduler runs, and that's likely far less frequent than the rate at which worker threads are processing disparate work items. Plus, you don't want to be continually shifting all of the threads between different cores. That results in lots of L1 & L2 cache misses.

Much like APO, Thread Director is trying to solve the problem from the wrong end. It would be ideal if programs could explicitly classify their threads through APIs and compile-time analysis. Better yet, if they could classify and submit individual work items, for the OS to execute in a thread already running on the appropriate core.
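The closest thing I'm aware of today is Windows' per-thread power-throttling hint (the EcoQoS mechanism), which at least lets a program mark a thread as background-class work that's fine to run on E-cores. It's still a per-thread hint rather than per-work-item classification, but as a sketch:

```cpp
// Sketch using Windows' per-thread power throttling hint (EcoQoS, Win10 1709+).
// Opting a thread IN marks it as efficiency-class work (a candidate for E-cores);
// opting OUT flags it as latency-sensitive. This is a coarse, per-thread hint,
// not the per-work-item classification described above.
#include <windows.h>

static void set_thread_qos(HANDLE thread, bool background)
{
    THREAD_POWER_THROTTLING_STATE state{};
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = background ? THREAD_POWER_THROTTLING_EXECUTION_SPEED : 0;

    SetThreadInformation(thread, ThreadPowerThrottling, &state, sizeof(state));
}
```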
 
What result did you get with all E-cores disabled? Also, did you check to ensure the results were repeatable?
I made some more BIOS profiles, changing only which cores are active, and reran the canned bench. I also disabled Process Lasso at Windows startup, because the free version I'm using makes me wait and it was one more thing to do. The two core configurations from before both dropped a bit in fps but remained proportional.
I got 176 fps with everything enabled, 205 fps with 1 e-core per bank enabled, 192 fps with all e-cores enabled and HT off, 212 fps with e-cores off and HT on, and 228 fps with HT off and e-cores off.

I didn't change GPU settings, power settings, clock speeds or anything, and I'm not aware of any big resource-hogging process in the background, but I do have several lesser ones, as my OS has been installed for a while and is used for everyday tasks. I did check Task Manager for ~3% CPU use before starting the bench.

These results are just for the canned bench, and supposedly there is more variance in actual gameplay.
Still, it's disappointing, because the no-e-cores, no-HT config wins again, and it seems nothing new is going on with APO other than proper thread use.

No HT, no e-cores doesn't win every game, though. It shouldn't win any game if Windows could assign threads properly to the foreground application. It isn't like the fastest threads have gone anywhere or have gotten slower when the other threads are enabled; they are just being replaced by slower threads by Windows.

Oh, and here's a picture dump of those benches: Sorry about the links, I don't know how to get pictures to load from imgur to this site.
 
This all sounds highly speculative and not consistent with what I've heard from an industry contact. Can you provide any source to back up these claims?
Have you ever heard of Unreal Engine?
It's just a matter of clicking the relevant tickbox for the platform you want.
What platforms can Unreal Engine build?

Unreal Engine 5 enables you to deploy projects to Windows PC, PlayStation 5, PlayStation 4, Xbox Series X, Xbox Series S, Xbox One, Nintendo Switch, macOS, iOS, Android, ARKit, ARCore, OpenXR, SteamVR, Oculus, Linux, and SteamDeck.
It would be ideal if programs could explicitly classify their threads through APIs and compile-time analysis. Better yet, if they could classify and submit individual work items, for the OS to execute in a thread already running on the appropriate core.
Yes, that would be ideal; ideal as in it will never happen, because no dev will ever do that much work if the game runs 'well enough' without it.
 
There needs to be an AMD equivalent of this, so that the 7950X3D works as intended without the dumb Windows scheduler putting workloads on the wrong cores.
Process Lasso should work essentially the same, but you have to pay if you want all of the features (just like if you ran Intel) and if you don't want to sit through the wait screen on reboot.
But you do have to make a profile for each program you want kept on your chosen cores. That takes some time.
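For what it's worth, the affinity half of such a profile boils down to one Win32 call; a rough sketch (the mask is a placeholder for whichever CCD or core set you want the game kept on):

```cpp
// Rough sketch: restrict a running process to a chosen set of logical processors,
// similar in spirit to a Process Lasso affinity rule. The mask passed in is a
// placeholder; a real tool would build it from the actual core layout.
#include <windows.h>

bool limit_process_to_cores(DWORD pid, DWORD_PTR coreMask /* e.g. 0xFFFF */)
{
    HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION,
                              FALSE, pid);
    if (!proc)
        return false;

    BOOL ok = SetProcessAffinityMask(proc, coreMask);
    CloseHandle(proc);
    return ok != 0;
}
```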
 
I got 176 fps with everything enabled, 205 fps with 1 e-core per bank enabled, 192 fps with all e-cores enabled and HT off, 212 fps with e-cores off and HT on, and 228 fps with HT off and e-cores off.
Thank you, good sir!

So, you got:

Hyperthreading | E-cores | FPS
on             | full    | 176
off            | full    | 192
on             | 1/4     | 205
on             | off     | 212
off            | off     | 228

Correct?

I didn't change GPU settings, power settings, clock speeds or anything, and I'm not aware of any big resource-hogging process in the background, but I do have several lesser ones, as my OS has been installed for a while and is used for everyday tasks. I did check Task Manager for ~3% CPU use before starting the bench.
Nice attention to detail. Now, did you try re-running any of the configurations, to get an idea how reproducible the results are?

Oh, and here's a picture dump of those benches: Sorry about the links, I don't know how to get pictures to load from imgur to this site.
I can help with that.

[Benchmark screenshots: jKRHnjm.png, mvvolmc.png, 1vJP6fy.png, PZyoTeQ.png, AHK97V9.png]

In the imgur web interface, when you mouse over an image, you'll see a "..." menu. In there, you'll see an option "Get share links" and one of those is "BBCode (forums)" - that's the one you want.
 
So, you got:

Hyperthreading | E-cores | FPS
on             | full    | 176
off            | full    | 192
on             | 1/4     | 205
on             | off     | 212
off            | off     | 228

Correct?

Nice attention to detail. Now, did you try re-running any of the configurations, to get an idea how reproducible the results are?

In the imgur web interface, when you mouse over an image, you'll see a "..." menu. In there, you'll see an option "Get share links" and one of those is "BBCode (forums)" - that's the one you want.
You are correct in the configurations. Task Manager also shows the cores/threads, and the benchmark shows the date and time it was run.
Also:

[Three more benchmark screenshots: lIofEb8.png, Kqy90ky.png, 9ejVzgD.png]

Here are three more runs. The 8c8t result dropped 4 fps; I reran it and it gained back 2, so there is some variance going on, but the rankings are definitely holding.
I haven't seen as much variance in most other games, but since testing is time-consuming and I game at 4K, I haven't spent a ton of time on it. I do remember that pushing games onto the P-cores with Process Lasso, with the e-cores still active, gave me the most efficient results. And since Windows is apparently going to be working more closely with Intel on core scheduling starting with Lunar Lake, I think the first several hybrid gens may just be neglected in this regard.

And thanks for showing me the better way to insert images.
 
Here are three more runs. The 8c8t result dropped 4 fps; I reran it and it gained back 2, so there is some variance going on, but the rankings are definitely holding.
I haven't seen as much variance in most other games, but since testing is time-consuming and I game at 4K, I haven't spent a ton of time on it. I do remember that pushing games onto the P-cores with Process Lasso, with the e-cores still active, gave me the most efficient results. And since Windows is apparently going to be working more closely with Intel on core scheduling starting with Lunar Lake, I think the first several hybrid gens may just be neglected in this regard.

And thanks for showing me the better way to insert images.
Thank you for advancing our understanding of hybrid CPUs vs. game performance!
: )
 