News Cyberpunk 2077 adds core prioritization for hybrid CPUs, which would be great if it didn't cause other problems

Status
Not open for further replies.

usertests

Distinguished
Mar 8, 2013
I wonder if it does anything at all for AMD's version of hybrid. For example, 8500G/7545U/8540U with 2x Zen 4 cores, 4x Zen 4c cores. And more importantly, 4+8 Strix Point later.
 
  • Like
Reactions: Order 66
I wonder if it does anything at all for AMD's version of hybrid. For example, 8500G/7545U/8540U with 2x Zen 4 cores, 4x Zen 4c cores. And more importantly, 4+8 Strix Point later.

Nope, it won't make any difference, nor will it work on AMD's hybrid CPUs either, because Zen 4 and Zen 4c cores don't behave the same way as Intel's P and E cores, which are based on different architectures.

You're probably already aware by now that "Zen 4c" cores instead feature identical IPC to "Zen 4," as well as an identical ISA: the same execution and issue width, the same number of registers, the same internal latencies, and the same multi-threading support.

The "Zen 4c" core is just a compacted version of the "Zen 4" core: no hardware components are subtracted, they are simply arranged at higher density. These cores are generally clocked lower than "Zen 4" cores, as they operate at lower core voltages.

BTW, it appears that there's much more asymmetry between Intel's core types, with differences in issue width, latencies, and even the supported instruction set. More importantly, Intel's Efficient cores do not support multi-threading.

So managing thread allocation and load balancing seems to be more complicated on Intel's hybrid design.
 
Last edited:
  • Like
Reactions: Roland Of Gilead

abufrejoval

Reputable
Jun 19, 2020
My understanding of the C cores in the Zen APUs is that they really aim to fill niches that are wattage-constrained anyway. So if you only have 15 watts available, the fact that each non-C core might reach top clocks never really matters, because all clocks will have to come down to what the condensed layout of the C cores supports once they have to share a meager single-digit wattage per core.

Yes, half the cache remains a factor, but I don't think games will start to optimize for every permutation of core-type cache size and core count on the market.

In short: any attempt by a game (or app) to treat Zen 4c cores differently from the full ("wide") Zen 4 cores would be counter-productive. That simplicity is what AMD is selling, and you'd undercut it if treating them differently yielded noticeable performance gains.
 
  • Like
Reactions: jlake3
Apr 1, 2020
Intel's hybrid system didn't make sense to begin with, since the P and E cores aren't the same (they differ in supported instructions and so forth) and aren't simply lower-performance, power-efficient versions of the P cores. Even then, it doesn't achieve much beyond what software alone could do by limiting clocks and voltages on battery and uncapping them on AC power. We see this in practice: the 13900HX/14900HX (8P+16E) effectively score the same as, or even fall behind, the 7945HX (16P) in multi-core tests despite having more processing cores.

It is Intel's Bulldozer experiment, which failed and will hopefully soon be abandoned.
 

magbarn

Reputable
Dec 9, 2020
Intel's hybrid system didn't make sense to begin with, since the P and E cores aren't the same (they differ in supported instructions and so forth) and aren't simply lower-performance, power-efficient versions of the P cores. Even then, it doesn't achieve much beyond what software alone could do by limiting clocks and voltages on battery and uncapping them on AC power. We see this in practice: the 13900HX/14900HX (8P+16E) effectively score the same as, or even fall behind, the 7945HX (16P) in multi-core tests despite having more processing cores.

It is Intel's Bulldozer experiment, which failed and will hopefully soon be abandoned.
Intel's going all-in with their mobile chips, even adding a third class of cores in Meteor Lake. Apple does fine with hybrid cores on macOS. Is it Microsoft's fault that Windows is horrible at managing heterogeneous cores?
 
Jan 31, 2024
As someone who helped bring this patch to the game, due to my ongoing crashes in Cyberpunk 2077 with an i9-13900K, allow me to explain:

Many, many modern games, including Returnal, Remnant From the Ashes 2, and a slew of others, do not know how to schedule and prioritize their workload effectively across P and E cores. You will get either random crashes or error notifications about a "lack of video memory," even while running a 4090.

These crashes are very unfortunate, and the fix remains the same across all of these games, including Cyberpunk 2077: a manual downclock of about 200-300 MHz across all P cores, done through Intel Extreme Tuning Utility (Intel XTU).

Now, with these fixes in place, users can play the game with the full processing power of their 12th-, 13th-, and 14th-gen processors: no thermal throttling, no poor performance, no crashes, and no need to run an entire separate program just to downclock their CPUs.

If it introduced stuttering, that is unfortunate, but Cyberpunk deserves credit for being the only game so far to even acknowledge this widespread problem exists, let alone implement an actual solution. I am sure further tuning and tweaking will be needed to prevent the micro-stuttering some users report since patch 2.11, but simply turning off the Hybrid CPU Utilization option, or leaving it on "Auto," should restore behavior and performance to how they were in the prior patch.

If you just do a brief Google search for "i9 13900k out of video memory," you can find many, many such users across MANY modern games with this exact same problem, resolved in this exact same way.

Again, the Cyberpunk team deserves credit for not only acknowledging this issue exists, but actually implementing a fix, even if it's imperfect.
 
Intel's hybrid system didn't make sense to begin with, since the P and E cores aren't the same (they differ in supported instructions and so forth) and aren't simply lower-performance, power-efficient versions of the P cores. Even then, it doesn't achieve much beyond what software alone could do by limiting clocks and voltages on battery and uncapping them on AC power. We see this in practice: the 13900HX/14900HX (8P+16E) effectively score the same as, or even fall behind, the 7945HX (16P) in multi-core tests despite having more processing cores.

It is Intel's Bulldozer experiment, which failed and will hopefully soon be abandoned.
It makes perfect sense if you think like a company for a second: why would you put more than 8 very expensive cores in a CPU if you can instead fill it up with cheapo cores?
When all cores are working they all have to power down and clock lower, so why not have your standard 8 cores and fill the rest with cheap cores that always perform as if the whole CPU were under full load? That's the only time they would do any work anyway (as far as benchmarks are concerned).

That is what AMD has now discovered as well: after a few years of watching Intel do it, they thought "hey, why aren't we doing that too?", and hey presto, the "c" versions, because they can't afford a completely new design.
 
When all cores are working they all have to power down and clock lower, so why not have your standard 8 cores and fill the rest with cheap cores that always perform as if the whole CPU were under full load? That's the only time they would do any work anyway (as far as benchmarks are concerned).
Yeah, this isn't really the reason we saw hybrid architecture when we did (though I do think it was inevitable no matter what). Golden Cove uses a lot of power to reach high clock speeds, so adding more of those cores balloons the power budget if the aim is competitive multithreaded performance. ADL's tradeoff was made entirely to keep the TDP in line while still competing. This has served Intel extremely well given the node disadvantage (more so with RPL vs. Zen 4 than ADL vs. Zen 3) and the scheduler issues they've had to deal with.
That is what AMD has now discovered as well: after a few years of watching Intel do it, they thought "hey, why aren't we doing that too?", and hey presto, the "c" versions, because they can't afford a completely new design.
This is complete nonsense.

Intel designed their hybrid architecture to make up for node inefficiencies first and foremost. The multithreaded optimizations and power scaling on the cores lend themselves rather well to high-core-count enterprise workloads, so they were adapted for the forthcoming Forest CPUs.

AMD designed their dense architecture predominantly to compete with ARM solutions (they already had Intel beaten on core counts, but this stretched the advantage further), and it's a market where boost behavior isn't particularly important. They repurposed the core design for mobile because at limited TDPs there is no real performance loss for the area saved, and I imagine we'll see it there again. I do not expect hybrid desktop chips from them before Zen 6, if even then.
 
Golden Cove uses a lot of power to reach high clock speeds, so adding more of those cores balloons the power budget if the aim is competitive multithreaded performance.
Even after the second "overclock" on the same architecture, Golden Cove still uses less power than even the X3D version of AMD's newest CPUs.
Unless you overclock on top of the factory overclock.

Your point would apply to the power draw of the 12900K, which is just 26 W compared to the 35-37 W of 14th gen and Zen 4 Ryzen.
https://www.techpowerup.com/review/intel-core-i9-14900k/22.html
[Chart: single-threaded CPU power consumption]
 
  • Like
Reactions: Sluggotg
This is complete nonsense.

Intel designed their hybrid architecture to make up for node inefficiencies first and foremost.
The inefficiencies being that it uses less power??
The E-cores are the ones that use more power, damaging the overall efficiency of their CPUs.
Intel does that because they can afford to: having E-cores is still cheaper for them than the few sales they might lose to AMD.

You can see here how much less efficient (in performance per watt) the E-cores are overall.
 

HyperMatrix

Distinguished
May 23, 2015
More important than keeping the game on the P cores is keeping other things off them. My ideal scenario would be to have the OS and everything other than the currently active game/app run exclusively on the E-cores, with the entirety of the P cores left completely untouched and dedicated to running the game. From experience, this would also require hiding the number of cores from games when they poll the system, because once they detect more than 8 cores, preventing access to the E-cores can break things.
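Something along those lines can already be scripted with the psutil library. This is only a minimal sketch: the process name and the assumption that logical processors 0-15 are the P-core threads and 16-31 are the E-cores (as on a 13900K) are mine, and you'd want to verify the numbering on your own system.

import psutil  # pip install psutil

GAME_EXE  = "Cyberpunk2077.exe"      # assumed process name; adjust for your game
P_THREADS = list(range(0, 16))       # assumption: P-core logical processors (8P with HT)
E_CORES   = list(range(16, 32))      # assumption: E-core logical processors

for proc in psutil.process_iter(["name"]):
    try:
        if proc.info["name"] == GAME_EXE:
            proc.cpu_affinity(P_THREADS)   # keep the game on the P-cores
        else:
            proc.cpu_affinity(E_CORES)     # push everything else onto the E-cores
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        pass  # protected system processes will refuse; skip them

Hiding the core count from the game itself can't be done with affinity alone; that would take something like an API-hooking layer intercepting the CPU-topology queries, which is well beyond a simple script.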
 

Amdlova

Distinguished
Going from the 12700T to the 13500T, I don't see any issues with more E-cores...
But lots of games don't know how to behave when you have lots of cores...
When my system was a Xeon E5-2696v4, I needed to use Process Lasso to set which cores a game should use. Some games tried to allocate 44 threads; with Lasso I set a four-core max...
 
If you just do a brief Google search for "i9 13900k out of video memory," you can find many, many such users across MANY modern games with this exact same problem, resolved in this exact same way.
So now I'm curious, because I get this exact problem in certain games. The thing is, it has nothing to do with VRAM or the GPU! The real problem in my case is that the CPU appears to be either overheating or just getting into a bad state and crashing the running process.

For me, this happens during shader compilation when you first launch a game. I can name quite a few with this issue. Hogwarts Legacy, The Last of Us, Alan Wake 2 (not quite as frequent), Metro Exodus, Horizon Zero Dawn, Watch Dogs Legion (sometimes), and I'm sure there are many others that I've never tried and thus don't have personal experience with.

The solution, in every case I've encountered, is to set the game's affinity to just the P-cores during shader compilation. Once that's done, reset the game's affinity to all cores. But I can see, just by looking at the LED POST codes on my motherboard (which show CPU temps after the initial boot process), that the CPU hits 99+ C when affinity isn't set, and anything in the high 90s seems likely to cause a crash.

I'm not entirely sure if it's just with my motherboard, or if it's something with the 13900K. I suspect it's a motherboard setting, like being too aggressive with boosting and voltages or current. And since motherboard BIOS code often gets at least partially shared between a lot of different models from the same vendor, and even across vendors, it could be that lots/most Z790 motherboards are impacted.
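For what it's worth, that manual juggling can be approximated with a small psutil script. This is just a sketch, and the process name, the 0-15 P-core-thread numbering, and the fixed wait time are my assumptions, not anything the games provide:

import time
import psutil  # pip install psutil

GAME_EXE  = "HogwartsLegacy.exe"     # hypothetical example; use your game's process name
P_THREADS = list(range(0, 16))       # assumption: P-core logical processors on an 8P+16E chip

game = next((p for p in psutil.process_iter(["name"]) if p.info["name"] == GAME_EXE), None)
if game:
    all_cpus = list(range(psutil.cpu_count()))
    game.cpu_affinity(P_THREADS)     # P-cores only while shaders compile
    time.sleep(300)                  # crude stand-in for "wait until compilation finishes"
    game.cpu_affinity(all_cpus)      # then hand the game back all cores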
 

nightbird321

Distinguished
Sep 15, 2012
You can edit the game launch shortcut's properties to set priority and process affinity once and be done with it.

C:\Windows\System32\cmd.exe /c start "Your Game" /High /affinity FF "C:\game_launcher.exe"

Replace FF with the hexadecimal core selection for your CPU.
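The mask is just a bitfield over logical processors. A tiny Python sketch for building it; the assumption that the first 16 logical processors are the P-core threads on an 8P+16E chip is mine, so verify the numbering on your own CPU (e.g. in Task Manager):

n_logical = 16                    # assumption: 8 P-cores with Hyper-Threading come first
mask = (1 << n_logical) - 1       # one bit per logical processor you want to allow
print(format(mask, "X"))          # prints FFFF -> use as /affinity FFFF

With that same numbering, FF would cover only the first eight logical processors (the first four P-cores plus their HT siblings).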
 
Jan 31, 2024
So now I'm curious, because I get this exact problem in certain games. The thing is, it has nothing to do with VRAM or the GPU! The real problem in my case is that the CPU appears to be either overheating or just getting into a bad state and crashing the running process.

For me, this happens during shader compilation when you first launch a game. I can name quite a few with this issue. Hogwarts Legacy, The Last of Us, Alan Wake 2 (not quite as frequent), Metro Exodus, Horizon Zero Dawn, Watch Dogs Legion (sometimes), and I'm sure there are many others that I've never tried and thus don't have personal experience with.

The solution, in every case I've encountered, is to set the game's affinity to just the P-cores during shader compilation. Once that's done, reset the game's affinity to all cores. But I can see, just by looking at the LED POST codes on my motherboard (which show CPU temps after the initial boot process), that the CPU hits 99+ C when affinity isn't set, and anything in the high 90s seems likely to cause a crash.

I'm not entirely sure if it's just with my motherboard, or if it's something with the 13900K. I suspect it's a motherboard setting, like being too aggressive with boosting and voltages or current. And since motherboard BIOS code often gets at least partially shared between a lot of different models from the same vendor, and even across vendors, it could be that lots/most Z790 motherboards are impacted.
If you want an actual technical explanation, one can seemingly be found here:

http://www.radgametools.com/oodleintel.htm

It seems related to Oodle Data decompression failures; Oodle is a data compression library used by most, if not all, modern games, including UE5 games and Cyberpunk 2077 itself.

Some motherboards have built-in overclocking features enabled by default so that they benchmark better than other motherboards, etc. These should be disabled.

In my testing with the bug-fixing team at CDPR, who sent me several preview builds leading up to the full release of this hybrid CPU utilization feature in 2.11, I would see absurdly high temperatures and usage even during just the intro loading screens and main menu of Cyberpunk. Enabling the new hybrid CPU utilization feature fixes the crashes, improves performance, and reduces CPU temps.

At any rate, the i9-13900K and i9-14900K are particularly power-hungry and tend to run hot, so a minor undervolt/downclock is probably prudent anyway, but it should not be the user's responsibility to apply one just so games don't crash and the CPU doesn't overheat to ridiculous temperatures, even outside of gameplay.

This does seem to be a problem that particularly affects 13900K and 14900K users, though it has also hit some 13700 and 14700 users. It will only become more pernicious and widespread as more owners of these chips download and play modern games. Again, Cyberpunk is the only game to acknowledge this issue, let alone implement a fix. The team should be given some credit for that, even if enabling the feature introduces microstutters for some users, who can simply disable it to regain normal, stable performance.

I hope this was helpful to you in solving your issues.
 
Jan 31, 2024
For me, this happens during shader compilation when you first launch a game. I can name quite a few with this issue. Hogwarts Legacy, The Last of Us, Alan Wake 2 (not quite as frequent), Metro Exodus, Horizon Zero Dawn, Watch Dogs Legion (sometimes), and I'm sure there are many others that I've never tried and thus don't have personal experience with.
Other notable games that I know suffer from this same issue are The Finals, Path of Exile, Returnal, BF2042, and Remnant From The Ashes 2. Since Oodle is a very widely used, industry-standard data compression tool, the list will likely only continue to grow.
 
Last edited:

TJ Hooker

Titan
Ambassador
My understanding of the C cores in the Zen APUs is that they really aim to fill niches that are wattage-constrained anyway. So if you only have 15 watts available, the fact that each non-C core might reach top clocks never really matters, because all clocks will have to come down to what the condensed layout of the C cores supports once they have to share a meager single-digit wattage per core.

Yes, half the cache remains a factor, but I don't think games will start to optimize for every permutation of core-type cache size and core count on the market.

In short: any attempt by a game (or app) to treat Zen 4c cores differently from the full ("wide") Zen 4 cores would be counter-productive. That simplicity is what AMD is selling, and you'd undercut it if treating them differently yielded noticeable performance gains.
Nope, it won't make any difference, nor will it work on AMD's hybrid CPUs either, because Zen 4 and Zen 4c cores don't behave the same way as Intel's P and E cores, which are based on different architectures.

You're probably already aware by now that "Zen 4c" cores instead feature identical IPC to "Zen 4," as well as an identical ISA: the same execution and issue width, the same number of registers, the same internal latencies, and the same multi-threading support.
Prioritizing regular Zen 4 cores over 4c cores still makes sense for lightly threaded loads (which include many games: even if they use lots of threads, often only a handful of them are heavily loaded), as those threads will benefit from the higher throughput of the higher-clocked cores.
 

edzieba

Distinguished
Jul 13, 2016
Nope, it won't make any difference, nor will it work on AMD's hybrid CPUs either, because Zen 4 and Zen 4c cores don't behave the same way as Intel's P and E cores, which are based on different architectures.
Missing the point entirely: core architecture is moot; the problem games face is per-core performance differences.
And here, Zen 4/Zen 4c is the exact same situation as P-cores/E-cores: Zen 4c cores clock lower than full Zen 4 cores (and have half the L3 cache), so for single-threaded tasks a full Zen 4 core will perform better than a Zen 4c core. If a game only has a handful of threads (as the vast majority do), restricting them to the cores that can clock higher will improve performance over running them on the cores that cannot.
Gaming is a real-time workload, not a batch workload; average throughput or throughput-per-watt are not useful metrics.
 
Some motherboards have built-in overclocking features enabled by default so that they benchmark better than other motherboards, etc. These should be disabled.
If you were testing motherboards, sure. I'm testing GPUs, and I want maximum performance possible from the CPU, so I want the performance optimizations to be applied. The problem is that the MSI MEG Z790 Ace likely pushes things too far, or else the Cooler Master 360mm cooler just isn't quite able to handle the heat, so to speak.

Really, one of two things should happen:

1) All the motherboard manufacturers should do additional testing to verify their tweaks don't cause the CPU to overheat. And honestly, I'm really not doing much: I enable XMP in the BIOS after loading the defaults. If that crashes consistently, it's a mobo firmware issue as far as I'm concerned.

2) If Oodle has code that consistently crashes on Raptor Lake CPUs, whether it's overheating or something else, Oodle should debug this and figure out what needs to be done to prevent the crashing. It doesn't matter if "the code works on everything else!" I was a programmer, and if your code has problems on a popular subset of systems, you should try to fix it. Hell, it might be an Oodle bug in the first place that only shows up with Raptor Lake.

Frankly, both of these should happen as far as I'm concerned. Because if Oodle is causing the crash, that means other applications should trigger it as well — it's only a matter of hitting the right 'buttons' on the CPU, so to speak. And in the interim, Oodle shouldn't be okay with code that consistently causes crashes.
 
  • Like
Reactions: Sluggotg and Nyara
Intel's hybrid system didn't make sense to begin with, since the P and E cores aren't the same (they differ in supported instructions and so forth).
This is actually, mostly, factually incorrect. P-cores and E-cores support the same instruction sets. This is why AVX512 support was disabled with Alder Lake and Raptor Lake. The E-cores don't support AVX512, and so the solution was to just drop AVX512 support entirely.

It's true that (at least in the past) if you disabled all the E-cores you could get access to AVX512 support. But I believe that got locked out with later BIOS revisions. Even so, fully disabling all your E-cores is a pretty drastic way to get AVX512 support, and not something I think most people are willing to do.

Now, the P-cores and E-cores are different architectures. They also clock very differently. So caches, pipelines, clocks, branch prediction, and more are all different. But other than AVX512 (which Intel dropped from consumer P-cores), they support the same instructions. The same is true of Arm big.LITTLE designs AFAIK, because having different instruction sets would make scheduling and coding far more complex for software developers and operating systems.
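To illustrate the practical consequence, here is a rough sketch of how software can check at runtime whether AVX-512 is actually exposed instead of assuming it from the CPU family. It uses the third-party py-cpuinfo package, which is my choice for the example, not anything the games discussed above use:

import cpuinfo  # pip install py-cpuinfo

flags = cpuinfo.get_cpu_info().get("flags", [])
if "avx512f" in flags:
    print("AVX-512 foundation instructions are available")
else:
    print("No AVX-512 exposed (e.g. Alder/Raptor Lake with E-cores enabled)")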
 