News Intel Raptor Lake Specs Allegedly Exposed: Up To 24 Cores, 32 Threads

alceryes · Aug 20, 2022

bit_user said:
Okay, even if disabling them "fixes" the problem by clocking their ring bus stop higher, that's just a design quirk with this iteration. It's neither fundamental to the mere fact of having E-cores, nor even fundamental to having them on a ring bus. Intel just needs to move the async clock domain so the ring bus doesn't get dragged down with them.

Hopefully, it's something they've fixed in Raptor Lake.

Agreed.
I don't think it's happening with this next gen. That may be a few gens out.

alceryes · Aug 20, 2022

TerryLaze said:
They already are pretty much neutral, almost all games are getting a difference of less than 1% either way with a handful being up to 5% and the two games that see a "big" difference of about 10% are running at 200 and 500+FPS respectively.
Overall it's -1%

See, this is where I have a problem with what Intel has done with the P/E-cores on their latest CPUs.
Are we really gonna cheer for them when the possibly wasted silicon they bundled with their P-cores gets good enough to NOT hamper the P-cores' performance? Is that how low the expectation bar really is for some people...? That's not where it is for me.

Does anyone know of any reviews of productivity applications tested with the E-cores disabled? TechPowerUp has got a good review of E-core only performance. But that's not what I'd like to see tested. I want to know whether letting the P-cores run wild, without the E-cores being active, nets faster performance in production workloads than when all cores are enabled and working.

Intel Core i9-12900K E-Cores Only Performance Review

With Alder Lake, Intel is betting big on hybrid CPU core configurations. The Core i9-12900K has eight P(erformance) cores and eight E(fficient) cores. We were curious and tested the processor running the E-Cores only to see how well they perform against architectures like Zen 2, Zen 3, Skylake...

www.techpowerup.com

spongiemaster · Aug 20, 2022

bit_user said:
I'm rather sick of people being told they don't need more cores. Anyone who compiles software can almost certainly use more cores/threads. Phoronix regularly tests LLVM and Linux kernel compilation, both of which scale well even up to 256 threads.

At my job, a full build of our codebase routinely takes me the better part of an hour. I can't wait until I can upgrade to an i9-12900 (we're an Intel shop, so 5950X isn't an option for me).

No one coding is asking the question, why do I need more cores? No one cares what you do at work either, people here are talking about home systems. Not needing something ≠ there is no scenario you will ever come across when it could be beneficial. Usage that maxes more than 24 threads for any sustained period of time are exceedingly rare for home users, whether they are enthusiasts or not. Again, I didn't say there are none, so I don't need a list of fringe cases where they may.

spongiemaster · Aug 20, 2022

TerryLaze said:
Well ryzen has shown that there are enough idiots out there that will pay really good money for big multithread numbers even if they don't need it, so why should intel lose out on that?!
...Competition?...yay?! ...

I didn't say Intel shouldn't do it, I was just answering the poster's question. Most people will be just fine with a 13600k or 13700k. Few people will truly need a 13900k, and those that do already know it.

bit_user · Aug 20, 2022

alceryes said:
Maybe we'll see the return of an Extreme Edition Intel chip with only P-cores (active) and super high clocks.

Well, there's going to be a Sapphire Rapids workstation CPU and it won't have E-cores. Whether they sell a non-Xeon version for HEDT is an open question.

Otherwise, just buy a regular Alder Lake and disable the E-cores yourself.

bit_user · Aug 20, 2022

alceryes said:
Does anyone know of any reviews of productivity applications tested with the E-cores disabled? TechPowerUp has got a good review of E-core only performance. But that's not what I'd like to see tested. I want to know whether letting the P-cores run wild, without the E-cores being active, nets faster performance in production workloads than when all cores are enabled and working.

It's not exactly productivity apps, but SPEC Bench is comprised of real-world programs - not synthetic tests as you might presume. So, you can look at Anandtech's early tests to see what kind of contribution the E-cores make to all-thread performance:

The Intel 12th Gen Core i9-12900K Review: Hybrid Performance Brings Hybrid Complexity

www.anandtech.com

TL;DR: 25.9% better integer performance, 7.9% better floating point performance.

Now, that might sound like you'd do as good or better with 2 more P-cores, instead. However, remember these are each an average over a dozen or so benchmarks, some of which will scale worse than others (and 4 of the 12 floating-point tests look like real stinkers!).

The other thing to keep in mind is the power/thermal ceiling. Adding 2 more P-cores would use more power than 8 E-cores, meaning your all-core clocks would probably drop and thereby erode some of their contribution. Anandtech measured all-core P power at 239 W, while all-core E power was only 48 W. At low core counts, their data suggests P cores each want 26 W or so. However, going from 7 to 8 P cores, the delta is only 15 W, suggesting the extra core is already impacting its peers power budget (though there are other factors at play that certainly factor into that difference). When the 8 E cores are added on, they contribute about another 20 W, which is less than the mean contribution of a single P core.

The Intel 12th Gen Core i9-12900K Review: Hybrid Performance Brings Hybrid Complexity

www.anandtech.com

Going back to the first link, if we compare all P-cores vs. all E-cores: 165.2% better integer performance (22 E-cores' worth), 109.5% better float performance (17 E-core equiv.).

Again, with caveats about presuming linear scaling, that bodes well for increasing E-cores to 16. It suggests the E-cores should be ~2/3rds as powerful as the P-cores on the integer benchmark and nearly equivalent as the P-cores on floating point. Which is pretty impressive, if you consider they'll still occupy only about half the area. However, don't overlook what I said about scaling - the plots of individual test results shows that some of the floating point benchmarks, in particular, might not scale well, at all.

Addendum
I was curious how the E-cores measured up on just the five SPEC2017fp tests which seem to truly scale with core count. This is a somewhat opaque metric, without knowing why the others failed to scale (poor intrinsic scalability, power-limited, memory-bottlenecked, etc.?). However, here's what I found:

Comparing P-cores vs. P+E: the latter was 26.6% faster, which is suspiciously close to what we got with the entire integer test suite.
Comparing all P-cores vs. all E-cores: the former was 149.0% faster (or equivalent to ~20 E-cores)

So, it really does look like the E-cores pull their weight, in all-core scenarios. Furthermore, the all-thread workload metrics suggest you need about 2.5 E-cores to equal an Alder Lake P-core + Hyperthreading. Again, at just 25% area and roughly 20% of the power, that would make E-cores 60% more area-efficient and about 2x as power-efficient.

BTW, the tests I used for this last comparison were:

508.namd_r
511.povray_r
526.blender_r
538.imagick_r
544.nab_r

bit_user · Aug 20, 2022

spongiemaster said:
people here are talking about home systems.

I write code at home, too. Just not as much.

Students in STEM fields will probably find plenty of opportunities to benefit from more cores.

spongiemaster said:
Usage that maxes more than 24 threads for any sustained period of time are exceedingly rare for home users, whether they are enthusiasts or not.

This is a common fallacy of capacity - that you need to use near full capacity for an extended period of time, in order to benefit from it. You don't. During my typical workday, I probably spend under 10% of my time waiting for code to compile or regression tests to complete. However, when I am, that time is very costly. It has a cognitive cost, because it acts like a mental speedbump, interrupting my flow state while I'm developing or debugging.

Another example is the Mothers Day problem. If telephone companies added capacity based on typical utilization, then the system would fall over during peak utilization. People wouldn't be able to call their moms on Mothers Day, and you'd have tons of irate customers and a big PR problem to manage. Sometimes, peak utilization really matters.

spongiemaster said:
Again, I didn't say there are none, so I don't need a list of fringe cases where they may.

I don't know how old you are, but software development is not fringe. Every kid is taking coding classes, and lots go on to take more advanced Computer Science courses.

I'm sure there are plenty of other STEM applications that use lots of cores, but I can't really comment on that. I've dabbled a bit with NumPy and SciPy, and I'm aware some of the operations and modules can use multi-core.

bit_user · Aug 20, 2022

BTW, you always expect teething problems, when introducing something so ground-breaking as a hybrid CPU. That's why I think it was courageous. Some might say desperate, but you can actually find the notion of hybrid CPUs in Intel's public presentations as far back as 2005. So, at least we can say it wasn't reflexive or impulsive.

I'm confident Raptor Lake will iron out some of the wrinkles of E-cores. Meteor Lake will refine them yet further. Over time, the lack of E-cores is going to become an increasing liability for AMD.

Kamen Rider Blade · Aug 20, 2022

bit_user said:
But why?? Do you not trust the OS to schedule them properly? That's the only explanation I can imagine, but I haven't seen evidence to support it.

Given what we've seen so far, NOPE! All the extra effort that many have to use Process Lasso to tie a program to specific cores. You might as well not have Hybridized P/E cores on desktop. Leave the Hybridized setup for Mobile where they need it.

Why do you assume more is better? As they come at a nonzero cost, Intel & AMD are walking a balance between that cost and the benefits they provide in terms of latency-hiding and core utilization.

IBM isn't some no name goofy company. IBM managed to make SMT4 & SMT8 do extra work on the workloads they're targeting.

Having Dynamic SMT between 1-8 based on workloads offers you something that you didn't have before, flexibility based on how many threads and how much data is going to be used by L1 I/D$.

Here's Anandtech's article on it.

That's a feature that Intel & AMD can borrow and improve upon.

The next field of improvement that they can conquer and compete against each other and IBM's POWER architecture.

Plus we've all seen the road to future performance is with larger L1 I/D$ that Apple's M1 Architecture proves that it works, despite being at lower clock frequency.

192 KiB L1 I/D$ per big core is the future of CPU design. Apple lead the way; eventually, down the line, AMD & Intel will get there.

But Apple is behind on SMT, while Intel / AMD are stuck on yesterdays SMT2.

Dynamic SMT1-8 allows the CPU to adjust thread work based on code size and output.

I trust they've done the math and concluded 2-per-core is best. Maybe 4 really would be cost-effective, but they're just gun-shy from all these side-channel attacks that seem to argue for disabling it altogether. In that case, maybe they don't want to burn silicon doubling-down on a feature some big customers might simply opt not to use.

It's good enough for now, but moving foreward, they need to compete with the big boys. And IBM POWER is one of the BIG BOYS.

You won't know if some customers won't use it if you don't put it out there.

The same was with SMT2 originally. Nobody knew how useful it was until they made it and people came in and used it.

BTW, there's a very good solution to the SMT side-channel attack problem, which Google has already implemented in Linux: only allowing threads from the same process to share a core. If you do that, there's no cause for concern. It could also be implemented at the hypervisor level, which might already be the case for all I know.

That's where their security division/group in both AMD/Intel will have to go in and fix. That's on them.

You can't skimp on ISA registers. The only registers you can partition are the shadow registers, and I think we don't know how many they have for vector.

FWIW, Zen3 has 64 FP scheduler slots, which I think means you're not going to have more than 64 FP shadow registers. Since AVX-512 already increased the FP ISA registers to 32, we're not talking about a lot to partition between many threads.

Maybe they can find a way to reuse the Vector Registers for regular FP work and split amongst all the SMT threads to get more work done in parallel.

Huh? Do you think CPUs have an infinite silicon and power budget? Because that's the only universe in which E-cores don't also make sense for desktop.

I want E-cores for Desktop as well. You're not reading what I want correctly.

I want a 13P900KS (12x P-cores)
I want a 13E900KS (40x E-cores)

They're seperate CPU SKU's coming from different dies, both will co-exist and the customer can choose the CPU they need for the workload they intend to do.

You want a compile box, render box, encoder box. Go with E-core version.

You want a Single-Threaded Power house with a decent amount of multi-threadedness and plan on gaming. Get the P-core version.

As I said before, they give you 60% of P-core performance at 25% of the area and about 20% of the power. How is that not relevant for desktops, when we live in a world where all-core clocks are lower than peak single-thread clocks? Not to mention die area -> cost.

Again, you misunderstand what I want and didn't read it correctly.
I want BOTH P-cores & E-cores to exist, seperatedly, as their own SKU's.

Ring bus doesn't really have anything to do with E-cores. It's outlived its usefulness, anyhow. IMO, they shouldn't use it in any CPU with more than 8 bus stops anyway. Mesh scales way better.

If the Ringbus has "Outlived it's usefulness", Intel didn't get the memo. They're still using the Ringbus instead of Mesh on consumer side or direct connections like AMD.

Intel and other engineers have stated that Scaling sucks past 12x stops. Anything ≤ 12 stops is very much viable.

I'm sure the Thread Director is like 0.1% of die area. The kinds of stats it keeps are mostly those already tracked by core performance counters.

And they wouldn't need to worry about this if they were a Pure Core setup on Desktop.

They wouldn't have to suffer a performance penalty because of the Hybrid architecture.

Furthermore, you incorrectly assume the Thread Director is only useful for P- vs. E- scheduling, but I assure you it's quite valuable for Hyper Threading, because vector workloads scale so poorly with > 1 thread per core. It'll be telling if Intel leaves it in Sapphire Rapids.

Yes it would. Considering I hear that Sapphire Rapids will be offered in a ALL P-core version by default and they're thinking of an ALL E-core version for Sapphire Rapids.

That would make Intel very competitive to have ALL E-core versions.

Sapphire Rapids with:
_56x P-core's & AVX3-512 capabilities will be useful for it's niche
224x E-core's & AVX2-256 capabilities will be useful for it's specialties.

Remember the old adage: "Quantity has a Quality of it's own".

I value 224x E-core's Sapphire Rapids with only AVX-2-256 as the upper limit in AVX capabilities.

There's ALOT of workloads that can benefit from a CPU like that.

alceryes · Aug 20, 2022

bit_user said:
It's not exactly productivity apps, but SPEC Bench is comprised of real-world programs - not synthetic tests as you might presume. So, you can look at Anandtech's early tests to see what kind of contribution the E-cores make to all-thread performance:
TL;DR: 25.9% better integer performance, 7.9% better floating point performance.

I don't think this test is indicitive of real-world results. They had to manually set core affinities?! That's almost like cheating.
Once OSs have the built-in ability to automatically direct workloads, dynamically, with these hybrid cores AND we see this kind of performance uplift, then I'll give Intel the kudos. I really don't think it's going to be a liability for AMD for at least the next 5 years cause it will take Intel that long to get these E-cores to actually mean something (performance-wise) 100% of the time. Maybe it will pay off, in the long run. Maybe.

bit_user said:
The other thing to keep in mind is the power/thermal ceiling. Adding 2 more P-cores would use more power than 8 E-cores, meaning your all-core clocks would probably drop and thereby erode some of their contribution. Anandtech measured all-core P power at 239 W, while all-core E power was only 48 W. At low core counts, their data suggests P cores each want 26 W or so. However, going from 7 to 8 P cores, the delta is only 15 W, suggesting the extra core is already impacting its peers power budget (though there are other factors at play that certainly factor into that difference). When the 8 E cores are added on, they contribute about another 20 W, which is less than the mean contribution of a single P core.

This is true and would have to be figured into testing. Let the P-cores run wild just enough to get maximum performance without hindering overall clock speed.

alceryes · Aug 20, 2022

bit_user said:
Well, there's going to be a Sapphire Rapids workstation CPU and it won't have E-cores.

Yup, heard about that. It will be interesting if they release it as a top-tier performer. I thought it was just going to be a mid-level workstation chip though.

Kamen Rider Blade · Aug 20, 2022

alceryes said:
Yup, heard about that. It will be interesting if they release it as a top-tier performer. I thought it was just going to be a mid-level workstation chip though.

That's not what I heard. Sapphire Rapids is Intel's forray into Large Tile-Based Server chips with 4x Tiles interconnected and communicating via Foveros.

This is Intel's first attemp at Chiplets on the server side and they have ALOT of bugs. 500 Bugs to be exact.

That's forcing it to be post poned EVEN FURTHER away from launch.

And their entire future server line is based on this Tile Form Factor moving forewards for the forseeable future.

This will make it hard for Intel to compete with AMD, much less the ARM folks who are chomping at the bits for BIG Core Count #'s if they depend on P-core only.

That's why I take MLiD's rumor of a Sapphire Rapids with ALL E-cores very seriously.

It might not be the fastest, but it'll get Intel within the ball park of everybody else even if each core is barely faster than Skylake.

Like I said before, "There's a certain quality with quantity".

56 P-core slots x 4x E-core per slot = 224 E-core's.

That would be a mighty Server Chip for Intel to have in their portfolio.

and yes, each core would be weaker than it's competitors by being "Skylake-level" of performance, but with enough iterations and revisions, it can catch back up as long as you don't need to saddle the cores with AVX3-512. Leaving it at AVX2-256 level of extended instructions should be good enough for many work loads.

bit_user · Aug 20, 2022

Kamen Rider Blade said:
Given what we've seen so far, NOPE! All the extra effort that many have to use Process Lasso to tie a program to specific cores.

I'm pretty convinced that's just due to games making now-invalid assumptions about cores, and naively spawning one thread per core instead of the number of worker threads they actually need.

Over the next couple years, we should see an evolution on all fronts:

Raptor & Meteor Lake will eliminate the hardware tradeoffs of having E-cores
OS schedulers will become more refined
Games will be smarter and more judicious about the use of worker threads

Even still, there'll be work to do, but the situation should be a lot better than now. I don't know if you're old enough to remember Hyper Threading, but it had significant downsides when Intel first rolled it out in the Pentium 4. However, when it came back in the Core i-series, the hardware and software were both much improved and the downsides were truly minimal and far outweighed by the advantages.

Kamen Rider Blade said:
You might as well not have Hybridized P/E cores on desktop. Leave the Hybridized setup for Mobile where they need it.

I've done the math. It's a clear win for desktop. I understand it's not without tradeoffs, but it's not that much wasted silicon (20% of cores; probably < 12% of the entire die) if you have to disable them.

Kamen Rider Blade said:
IBM isn't some no name goofy company. IBM managed to make SMT4 & SMT8 do extra work on the workloads they're targeting.

I didn't say they were, but POWER is a completely different ISA and they're targeting different applications. You can't just say that because something makes sense for POWER (i.e. server CPUs), that Intel and AMD should be doing it too.

The Anandtech analysis was from 2016, using CPUs with far fewer core counts than we see in modern servers. ARM has proven to be competitive with small cores that lack SMT altogether. So, I think you can't just presume that what was good for POWER in 2016 is best for everyone in 2022.

There's an additional issue with simply throwing more threads into a CPU. A 96-core Genoa would become a 768-thread CPU @ SMT8. A dual-CPU system would have 1536 threads. A lot of workloads don't scale well to so many threads.

Kamen Rider Blade said:
That's a feature that Intel & AMD can borrow and improve upon.

Do you think it simply didn't occur to them to add more threads? Do you think they don't know what they're doing? These CPUs are designed by teams of the best processor architects in the world, in a highly-competitive market, and with billions of dollars at stake. I'm just saying it takes a lot of hubris to second-guess them, but be my guest.

I know AMD optimized each parameter of their microarchitecture and computed power/performance/area metrics on every aspect included in (or omitted from) the design of Zen3. So, I have to believe they considered adding more threads and found it not to be a net-win for their objectives.

Kamen Rider Blade said:
Plus we've all seen the road to future performance is with larger L1 I/D$ that Apple's M1 Architecture proves that it works, despite being at lower clock frequency.

Again, you're talking about a completely different ISA.

However, the other thing that's different is Apple's cost model. Apple isn't designing all-purpose cores that need to be cost-effective in CPUs on the open market. They're designing cores that power a fairly narrow range of products - from phones to mini PCs, and they have the benefits of vertical integration and premium pricing which they can rely upon to build more expensive CPUs. Intel, AMD, and ARM simply can't afford to throw around die area like Apple does. They need to be more judicious about PPA (performance per area), whereas Apple is mostly optimizing just for power-efficiency.

Kamen Rider Blade said:
It's good enough for now, but moving foreward, they need to compete with the big boys. And IBM POWER is one of the BIG BOYS.

No, it's not. POWER has < 1% of the server/cloud marketshare. x86-64 already beat it almost out of existence, and now most deployments of non-x86 are ARM. The OpenPOWER Foundation was created out of desperation.

Kamen Rider Blade said:
You won't know if some customers won't use it if you don't put it out there.

Yes, you would, because:

They do modeling and simulation before committing to a design, so they can see if it pulls its weight.
They actually talk to big customers to see what they want and what are their priorities.

Kamen Rider Blade said:
The same was with SMT2 originally. Nobody knew how useful it was until they made it and people came in and used it.

That's not true. SMT wasn't a new concept, when Intel came around to it. They certainly didn't do it on a whim.

Kamen Rider Blade said:
That's where their security division/group in both AMD/Intel will have to go in and fix. That's on them.

You absolutely cannot do security as an afterthought! The only hope of getting it right is from the ground up. If we haven't learned this by now, shame on us.

Kamen Rider Blade said:
Maybe they can find a way to reuse the Vector Registers for regular FP work

Post-SSE, they are one in the same.

x87 is a legacy adjunct, and I wouldn't be surprised if it didn't even have shadow registers.

Kamen Rider Blade said:
I want E-cores for Desktop as well. You're not reading what I want correctly.

I want a 13P900KS (12x P-cores)
I want a 13E900KS (40x E-cores)

They're seperate CPU SKU's coming from different dies, both will co-exist and the customer can choose the CPU they need for the workload they intend to do.

The reason that will never happen is that there are too many lightly-threaded workloads and cases involved in everyday desktop computing. And that's where P-cores shine. So, for the best computing experience per-W and per-$, you want a mix of P- and E- cores.

Kamen Rider Blade said:
You want a compile box, render box, encoder box. Go with E-core version.

Those are niche use cases, and can be addressed by Intel's E-only server CPUs (Sierra Forrest, due in 2024).

Kamen Rider Blade said:
If the Ringbus has "Outlived it's usefulness", Intel didn't get the memo. They're still using the Ringbus instead of Mesh on consumer side or direct connections like AMD.

Maybe they were changing so much with Alder Lake that they deemed a change to the interconnect fabric added too much risk. Anyway, it's mainly an issue for the big die models (i.e. the i7, i9, and maybe a couple of the i5's).

I'm just saying that, as they continue to scale core counts, the Ring bus will have to go. And that's why I don't care so much about Ring Bus issues, because I'm looking more at the concept of E-cores more than specifically Alder Lake's implementation of them.

Kamen Rider Blade said:
Intel and other engineers have stated that Scaling sucks past 12x stops. Anything ≤ 12 stops is very much viable.

FWIW, that's also from like 2015-2016. In the modern era of DDR5 and faster, wider cores, it could be more of a liability. Or maybe it's not, but either way Intel won't be able to keep scaling on the Ring bus, so it should disappear from the higher core-count models.

Kamen Rider Blade said:
And they wouldn't need to worry about this if they were a Pure Core setup on Desktop.

They wouldn't have to suffer a performance penalty because of the Hybrid architecture.

Note how that article says "teething trouble"? That's because it's new tech. New stuff always has rough edges. Ever heard the phrase "bleeding edge"?

Anyway, they're not saying it's an outright bad idea. Read their conclusion.

bit_user · Aug 20, 2022

alceryes said:
I don't think this test is indicitive of real-world results. They had to manually set core affinities?! That's almost like cheating.

That was just to test the E-only use case, because there's no way to disable the P-cores while leaving the E-cores active.

If they set affinities in any other case - especially in the P+E scenario - it would only hurt the results, because the benchmark would be sitting around, waiting for the E-cores to finish. So, your point is moot either way.

alceryes said:
Once OSs have the built-in ability to automatically direct workloads, dynamically, with these hybrid cores AND we see this kind of performance uplift, then I'll give Intel the kudos.

They already do. Windows 11 had it at Alder Lake's launch, and Linux got it earlier this year.

bit_user · Aug 20, 2022

Kamen Rider Blade said:
That's why I take MLiD's rumor of a Sapphire Rapids with ALL E-cores very seriously.

It's not a rumor. They officially announced it on Feb 17th of this year. It's called Sierra Forest and will use a newer generation of E-core than Gracemont.

https://www.anandtech.com/show/1725...scalable-roadmap-new-ecore-only-xeons-in-2024

alceryes · Aug 20, 2022

bit_user said:
That was just to test the E-only use case, because there's no way to disable the P-cores while leaving the E-cores active.
If they set affinities in any other case - especially in the P+E scenario - it would only hurt the results, because the benchmark would be sitting around, waiting for the E-cores to finish. So, your point is moot either way.

My point is that they shouldn't have to set any type of CPU affinity in the first place. This should be automatically, intelligently done but it isn't, which brings me to my second point below.

bit_user said:
They already do. Windows 11 had it at Alder Lake's launch, and Linux got it earlier this year.

So, in Windows 11, either the Thread Director is doing the best job that can be done but the E-cores are just that bad, OR there can be improvements to the Thread Director? Which one do you think is the case?

I think both the Director AND the E-core arcitecture can be improved and only then will the E-cores bring something to the table for gamers.

bit_user · Aug 20, 2022

alceryes said:
My point is that they shouldn't have to set any type of CPU affinity in the first place. This should be automatically, intelligently done but it isn't, which brings me to my second point below.

How else are you going to benchmark "E-cores only" case, then? The OS is obviously not going to keep those threads exclusively on the E-cores!

Yes, it was an artificial experiment, but if you read the page you'll understand what they were trying to measure and why.

alceryes said:
So, in Windows 11, either the Thread Director is doing the best job that can be done but the E-cores are just that bad, OR there can be improvements to the Thread Director? Which one do you think is the case?

There are two further scenarios beyond what you've said:

Software is naively assuming all cores are equal, spinning up a worker thread per core, and evenly dispatching work items to those worker threads. This leads to more work going to the E-cores than you want, and the OS can't really do much about it. The problem that the concurrency APIs provided by operating systems were already deficient and a big.little architecture just makes the situation even worse.
The hardware has some snags, like has been pointed out with the E-cores throttling their Ring Bus stops.

There's even a further problem with the dominant model of multi-core load balancing, which is that a game engine is parceling out chunks of work and indiscriminantly handing them off to worker threads. Some of the work items could involve I/O or integer workloads, which the thread director detects as a good candidate for migrating to an E-core. So, that worker thread gets moved at the next opportunity, but then maybe the next packet of work it gets involves some heavy vector computation that would better be run on a P-core. That migration means you have work happening on cores not ideal for it, and there's extra overhead from migration. The underlying problem is the invalidation of an assumption made by the thread director & OS scheduler - that each thread will continue doing the same sort of work as before.

So, without changing any underlying OS APIs, what the game developers will probably do is separate their worker threads into E-core and P-core, and start dispatching work items to the appropriate pool. They'll probably also set affinities, accordingly. This is still a bit short-term, but will probably compensate for the deficits we see... and maybe then some.

Anyway, the big picture view of the problem is an abstraction failure. A lot of this could be avoided, if operating systems provided better concurrency APIs and then if games actually used them. It would also benefit more than just hybrid CPUs.

Kamen Rider Blade · Aug 20, 2022

bit_user said:
I'm pretty convinced that's just due to games making now-invalid assumptions about cores, and naively spawning one thread per core instead of the number of worker threads they actually need.

Over the next couple years, we should see an evolution on all fronts:

Raptor & Meteor Lake will eliminate the hardware tradeoffs of having E-cores

OS schedulers will become more refined

Games will be smarter and more judicious about the use of worker threads

Even still, there'll be work to do, but the situation should be a lot better than now. I don't know if you're old enough to remember Hyper Threading, but it had significant downsides when Intel first rolled it out in the Pentium 4. However, when it came back in the Core i-series, the hardware and software were both much improved and the downsides were truly minimal and far outweighed by the advantages.

Some software will get patched and changed to deal with Intel's hybrid architecture, others will not.
It's going to be incredibly patchwork across the software spectrum of x86.

I've done the math. It's a clear win for desktop. I understand it's not without tradeoffs, but it's not that much wasted silicon (20% of cores; probably < 12% of the entire die) if you have to disable them.

That's great for Mobile usage, I wholy approve of it in Mobile devices.
But on DeskTop, I want Monolithic CPU setups.

Give me ALL P-cores
OR
Give me ALL E-cores

I'll gladly take either options for different usage scenarios.

I didn't say they were, but POWER is a completely different ISA and they're targeting different applications. You can't just say that because something makes sense for POWER (i.e. server CPUs), that Intel and AMD should be doing it too.

From everything I've heard on the rumor bit; AMD & Intel were both researching SMT4, they just haven't gotten it working with their architectures yet, so it's set on the backburner and will be brought up later once they get it working.

The Anandtech analysis was from 2016, using CPUs with far fewer core counts than we see in modern servers. ARM has proven to be competitive with small cores that lack SMT altogether. So, I think you can't just presume that what was good for POWER in 2016 is best for everyone in 2022.

But SMT2 on x86 has proven itself to be valuable in many workloads. Adding more could be useful in those scenarios.
And that's why I want Dynamic SMT where the Core auto-adjusts how many threads it processes based on it's own internal understanding of what it can handle.

There's an additional issue with simply throwing more threads into a CPU. A 96-core Genoa would become a 768-thread CPU @ SMT8. A dual-CPU system would have 1536 threads. A lot of workloads don't scale well to so many threads.

True, some workloads won't scale well at all. But there are other workloads that scale nearly linearly with the # of cores you throw at it.
I'm focused on helping those workloads that do benefit from massively multi-threading.

Those that don't, their authors/code maintainers need to figure out how to scale their apps to many many threads if possible at all.

Do you think it simply didn't occur to them to add more threads? Do you think they don't know what they're doing? These CPUs are designed by teams of the best processor architects in the world, in a highly-competitive market, and with billions of dollars at stake. I'm just saying it takes a lot of hubris to second-guess them, but be my guest.

Damn straight we have hubris, that's what we in the fan community have plenty of.

I'm more than proud enough to admit it.

I'm also proud enough to call them out on missing out on key opportunities / technologies like when Intel canceled Optane unceremoniously.

That really ticked me off because it was damn good technology that had it's niche, but Intel didn't use it's monopoly status to push it into dominance and acceptance.

I know AMD optimized each parameter of their microarchitecture and computed power/performance/area metrics on every aspect included in (or omitted from) the design of Zen3. So, I have to believe they considered adding more threads and found it not to be a net-win for their objectives.

AMD is about iterating, and SMT-n might not be ready yet. So give it time. I'm willing to take the long view on that.

Again, you're talking about a completely different ISA.

However, the other thing that's different is Apple's cost model. Apple isn't designing all-purpose cores that need to be cost-effective in CPUs on the open market. They're designing cores that power a fairly narrow range of products - from phones to mini PCs, and they have the benefits of vertical integration and premium pricing which they can rely upon to build more expensive CPUs. Intel, AMD, and ARM simply can't afford to throw around die area like Apple does. They need to be more judicious about PPA (performance per area), whereas Apple is mostly optimizing just for power-efficiency.

If you noticed with Intel architectures moving foreward, they're getting wider with larger L1 I/D $ and larger L2 I/D $ along with more decoders.

The only difference is that Apple went directly in that direction first, at the cost of frequency & focused on Power Savings first.
Since Apple has full control of their own Tech stack, they can optomize for their setup to the n-th degree.

Something Intel/AMD can't do on x86 since they have to be generic to some degree.

But both AMD/Intel are gradually going in the same route that Apple has hit right now with the M1.

It's just at a MUCH slower pace to get to the same point while battling each other and maintaing that performance lead & leap frogging each other.

Ergo, they have to optimize for PPA & Performance.

No, it's not. POWER has < 1% of the server/cloud marketshare. x86-64 already beat it almost out of existence, and now most deployments of non-x86 are ARM. The OpenPOWER Foundation was created out of desperation.

Yet they aren't dead and still exist. And the technology they have is still spectacular and some of it can be copied / borrowed IMO.

Yes, you would, because:

They do modeling and simulation before committing to a design, so they can see if it pulls its weight.

They actually talk to big customers to see what they want and what are their priorities.

Remember Steve Job's adage of:

"Some people say give the customers what they want, but that's not my approach. Our job is to figure out what they're going to want before they do. I think Henry Ford once said, "If I'd ask customers what they wanted, they would've told me a faster horse." People don't know what they want until you show it to them. That's why I never rely on market research. Our task is to read things that are not yet on the page".

I believe Intel & AMD need a little bit of that mentality.

That's not true. SMT wasn't a new concept, when Intel came around to it. They certainly didn't do it on a whim.

But the vast majority of the Software market didn't know about SMT and wasn't ready for it. Intel predicted the useful-ness of SMT and pushed it into existence.

Effectively driving customers to demanding the benefits of SMT due to how it eventually was beneficial to performance.

You absolutely cannot do security as an afterthought! The only hope of getting it right is from the ground up. If we haven't learned this by now, shame on us.

Tell that to Intel, they seem to be more prone to Security Issues than AMD and Apple. Even then, AMD & Apple still mess up and have their own issues.

Post-SSE, they are one in the same.

x87 is a legacy adjunct, and I wouldn't be surprised if it didn't even have shadow registers.

IC, so they'll be fully used then.

The reason that will never happen is that there are too many lightly-threaded workloads and cases involved in everyday desktop computing. And that's where P-cores shine. So, for the best computing experience per-W and per-$, you want a mix of P- and E- cores.

I'd rather have one box with ALL P-cores.
Then have a secondary box with ALL E-cores doing what it does best, (Compiling, Encoding, Transcoding, Rendering, etc).

While the E-core box is doing it's heavy load super efficienctly and occupying almost all the cores, I can still use my P-core for work without being bogged down in any way.

Those are niche use cases, and can be addressed by Intel's E-only server CPUs (Sierra Forrest, due in 2024).

I think it's far less niche than you may believe. Especially with the rise of many content creators on Social Media and small studios who could use a seperate dedicated "P-Core" box & "E-core" box per person.

Maybe they were changing so much with Alder Lake that they deemed a change to the interconnect fabric added too much risk. Anyway, it's mainly an issue for the big die models (i.e. the i7, i9, and maybe a couple of the i5's).

I think they kept Ring-bus because the latency is faster than mesh interconnects when you're ≤ 12 stops.
That's why they stick with it on the consumer end.

Anandtech did the tests, and the only thing faster than Ring-bus is AMD's direct connections between cores.

But even with Direct Connections on both Intel & AMD setups, you can make a hybrid setup that can have the best of both worlds IMO.

I'm just saying that, as they continue to scale core counts, the Ring bus will have to go. And that's why I don't care so much about Ring Bus issues, because I'm looking more at the concept of E-cores more than specifically Alder Lake's implementation of them.

I see the E-core Quad-Core complex as a perfect Square setup w/ Direct Interconnects between all 4x Cores.
Any Traffic that leads outside of the Quad-Core Complex will fall onto whatever Interconnect that is used in the CPU Die.
Doesn't matter if it's Ringbus, Mesh, or some other Hybrid setup.

FWIW, that's also from like 2015-2016. In the modern era of DDR5 and faster, wider cores, it could be more of a liability. Or maybe it's not, but either way Intel won't be able to keep scaling on the Ring bus, so it should disappear from the higher core-count models.

From what I can tell, Intel doesn't want to sell Higher Core-Counts past 12 to the consumer. It's already trying to pull teeth to get them to give consumers 10-cores like in Comet Lake.
Their current Paradigm is INCREDIBLY stingy with the 8x P-Cores & Nx E-Cores as their solution.

I want them to change the Paradigm to looking at selling Dies/Tiles and not worrying about how many cores are on their.

Intel needs to stop being "Stingy". They're literally holding back the consumer side of the industry.

We were saddled with Quad-Core hell for a decade by Intel due to weak/non-existent competition from AMD.

Right now, it's just 8x P-core + Nx E-core Hell at the moment due to Intel's stingy-ness by their upper management.

That's the part that I don't like.

Note how that article says "teething trouble"? That's because it's new tech. New stuff always has rough edges. Ever heard the phrase "bleeding edge"?

Yeah, I've heard of it. Let the Mobile side bleed.
We on the DeskTop side shouldn't have to bleed just because they want to force their big.LITTLE paradigm across the product stack.

They sure as hell aren't forcing big.LITTLE on Enterprise. It's still largely Monolithic Core Types over there.

That's something DeskTop could use.

Anyway, they're not saying it's an outright bad idea. Read their conclusion.

I don't think it's a bad idea either, especially for Mobile products.
I just don't think it should be used on DeskTop computers where power constraints aren't really a thing where most people don't give a damn about how much power is being consumed on DeskTop in the US.

bit_user · Aug 21, 2022

Kamen Rider Blade said:
Some software will get patched and changed to deal with Intel's hybrid architecture, others will not.
It's going to be incredibly patchwork across the software spectrum of x86.

The solution will be better APIs which create less work for application developers, while delivering better performance. MacOS is actually ahead, in this department. And no, I'm not a Mac fan.

Kamen Rider Blade said:
But on DeskTop, I want Monolithic CPU setups.

Give me ALL P-cores
OR
Give me ALL E-cores

You mean homogeneous.

Kamen Rider Blade said:
From everything I've heard on the rumor bit; AMD & Intel were both researching SMT4, they just haven't gotten it working with their architectures yet, so it's set on the backburner and will be brought up later once they get it working.

That's plausible and maybe they'll do it. I just wouldn't second guess their decision not to do it yet.

And I don't believe for a second they "haven't gotten it working". If they wanted to do it, they would. It's as simple as that. Once you've got 2-way SMT working, the marginal effort to go 4-way is surely nothing for them.

Kamen Rider Blade said:
True, some workloads won't scale well at all. But there are other workloads that scale nearly linearly with the # of cores you throw at it.

Not when we're talking about thousands of threads. That's pretty much unexplored territory.

The way to tackle this is to lay the software groundwork, ensuring scalability in... well, Linux... and ensuring there's the necessary API support for applications to scale also. If you just build the hardware first, it has a limited sales window and you won't be able to make a dent in the software situation within its product cycle. So, that's obviously not the way to go.

However, what we're seeing is sort of the opposite. Instead of trying to get applications scaling up larger, AMD introduced partitioning (so-called NPS2 and NPS4 modes), which enable the CPU to be efficiently sub-divided so it can be treated as 2 or 4 smaller CPUs.

Also, it's worth noting that a lot of the applications that scale well to hundreds of threads would run even better on GPUs. So, that's one reason not to bother when people would be better off just porting to a GPU.

Kamen Rider Blade said:
Damn straight we have hubris, that's what we in the fan community have plenty of.

Okay, well at least that much is established.

Kamen Rider Blade said:
I'm also proud enough to call them out on missing out on key opportunities / technologies like when Intel canceled Optane unceremoniously.

Yup, sad to see it go. I can't say if they were up against an unsolvable physics problem, however. If the tech is fundamentally scalable, someone else will do it eventually. If not, then its days were numbered.

Kamen Rider Blade said:
Remember Steve Job's adage of: "... Our job is to figure out what they're going to want before they do. ..."

Yeah, good product managers do this. They look at the customer needs and where the underlying technology is going to be, and try to find the right set of solutions to address the future market needs, rather than what the market is saying it wants today.

AMD and Intel are building products with very large up-front investments and very long lead times. There's no question they are trying to predict where the market is going to be, by the time these things launch.

Kamen Rider Blade said:
But the vast majority of the Software market didn't know about SMT and wasn't ready for it. Intel predicted the useful-ness of SMT and pushed it into existence.

You're basing that statement on what, exactly?

When Intel first did SMT, it was actually pretty easy. Their CPUs had only one core, and most machines had only one CPU. So, there were few issues with "is this a real core or just a hyperthread?"

Kamen Rider Blade said:
Effectively driving customers to demanding the benefits of SMT due to how it eventually was beneficial to performance.

LOL. Yeah, they "demanded it" so much that Intel took it away in the Core 2 generation and nobody complained.

Kamen Rider Blade said:
Tell that to Intel, they seem to be more prone to Security Issues than AMD and Apple. Even then, AMD & Apple still mess up and have their own issues.

I think Intel is the prime example that it cannot be an afterthought.

Kamen Rider Blade said:
I'd rather have one box with ALL P-cores.

Okay, so either disable your E-cores or buy a Sapphire Rapids server or workstation.

Kamen Rider Blade said:
Then have a secondary box with ALL E-cores doing what it does best, (Compiling, Encoding, Transcoding, Rendering, etc).

Then get an Alder Lake N (rumored to have up to 8 E-cores) or wait for Sierra Forest.

Kamen Rider Blade said:
While the E-core box is doing it's heavy load super efficienctly and occupying almost all the cores, I can still use my P-core for work without being bogged down in any way.

If the OS is working as it should, then you can run your batch jobs at a lower priority and you should hardly notice. On Linux, you do this by setting its "nice" level. I don't know much about priorities in Windows, other than the few times I've tweaked things via the Task Manager.

In the past, one way you might notice was I/O. However, SSDs can do so many IOPS and scale to so many threads that even I/O probably isn't a bottleneck.

Kamen Rider Blade said:
I think it's far less niche than you may believe. Especially with the rise of many content creators on Social Media and small studios who could use a seperate dedicated "P-Core" box & "E-core" box per person.

It just sounds to me like a waste of money and yet another thing to manage.

Kamen Rider Blade said:
I think they kept Ring-bus because the latency is faster than mesh interconnects when you're ≤ 12 stops.
That's why they stick with it on the consumer end.

Maybe they kept it because it uses less die area than the mesh.

Kamen Rider Blade said:
Anandtech did the tests, and the only thing faster than Ring-bus is AMD's direct connections between cores.

Maybe I missed it, but I think they didn't actually say what topology Zen 3 uses within a CCD.

Kamen Rider Blade said:
I see the E-core Quad-Core complex as a perfect Square setup w/ Direct Interconnects between all 4x Cores.
Any Traffic that leads outside of the Quad-Core Complex will fall onto whatever Interconnect that is used in the CPU Die.

Okay, so then arrange the P-cores in quads and then you shrink your ring bus and its problems.

To me, the quad E-core tile seems like something they did because of deadline pressure. They had these quad E-core tiles for the E-CPUs and decided just to drop them into Alder Lake.

Kamen Rider Blade said:
From what I can tell, Intel doesn't want to sell Higher Core-Counts past 12 to the consumer. It's already trying to pull teeth to get them to give consumers 10-cores like in Comet Lake.
...
Right now, it's just 8x P-core + Nx E-core Hell at the moment due to Intel's stingy-ness by their upper management.

Raptor Lake is 24 cores, 32 threads. More raw compute power than 16 P-cores. And it's not because they're stingy.

Kamen Rider Blade said:
I just don't think it should be used on DeskTop computers where power constraints aren't really a thing

Yes, they are. Why do you think the single-core clock is higher than all-core? That's entirely due to limits on power delivery & cooling.

Kamen Rider Blade said:
where most people don't give a damn about how much power is being consumed on DeskTop in the US.

They care about how expensive, heavy, bulky, hot, and loud their cooling is, and probably how much heat their rig is churning out.

I take it you don't live in Texas, because electricity during a Texas summer ain't cheap. And it's not just powering your rig that you have to pay for, but also running the A/C to remove the heat. There were even predictions Texas might have rolling blackouts, this summer, due to excessive demand on the electrical grid.

TerryLaze · Aug 21, 2022

bit_user said:
but you can actually find the notion of hybrid CPUs in Intel's public presentations as far back as 2005. So, at least we can say it wasn't reflexive or impulsive.

Yeah but that was about the integration of GPU compute into the CPUs, which was why AMD freaked out and way overpaid for Ati in 2006 to get their hands on some GPU IP to not be left behind.

bit_user said:
Over the next couple years, we should see an evolution on all fronts:

Raptor & Meteor Lake will eliminate the hardware tradeoffs of having E-cores

OS schedulers will become more refined

Games will be smarter and more judicious about the use of worker threads

Games will "never" improve, as long as it runs on windows without them doing anything to the console code except for compiling it with the windows flag checked then that's the maximum the devs will do.
It's only when a game heavily fails to perform that devs will, maybe, do something like with batman arkham city or agents of mayhem.

They will not code for something specific to windows systems and they barely code for anything specific on the console they develop for since that would cause "issues" (additional cost) for porting it to other consoles and windows.

bit_user · Aug 21, 2022

TerryLaze said:
Games will "never" improve, as long as it runs on windows without them doing anything to the console code except for compiling it with the windows flag checked then that's the maximum the devs will do.

That might be true of XBox, but PS5 is definitely not a simple recompile.

Anyway, hybrid CPUs are here to stay, so I think game devs and engines will adapt.

Kamen Rider Blade · Aug 21, 2022

bit_user said:
The solution will be better APIs which create less work for application developers, while delivering better performance. MacOS is actually ahead, in this department. And no, I'm not a Mac fan.

Good luck getting MS to do much work, they've been pretty lazy as of late.

You mean homogeneous.

That works too.

That's plausible and maybe they'll do it. I just wouldn't second guess their decision not to do it yet.

And I don't believe for a second they "haven't gotten it working". If they wanted to do it, they would. It's as simple as that. Once you've got 2-way SMT working, the marginal effort to go 4-way is surely nothing for them.

I don't think it's nearly as easy to scale up from 2-way SMT to n-way SMT without extra resources on the transistor side.
Especially when you're trying to process multiple threads in parallel simultaneously in real time. You need to plan out how many resources can be split for each core into each Virtual core. And that requires alot more transistor area, especially when I want Dynamic SMT from SMT1 -> SMT8 and everything in between.

Not when we're talking about thousands of threads. That's pretty much unexplored territory.

The way to tackle this is to lay the software groundwork, ensuring scalability in... well, Linux... and ensuring there's the necessary API support for applications to scale also. If you just build the hardware first, it has a limited sales window and you won't be able to make a dent in the software situation within its product cycle. So, that's obviously not the way to go.

However, what we're seeing is sort of the opposite. Instead of trying to get applications scaling up larger, AMD introduced partitioning (so-called NPS2 and NPS4 modes), which enable the CPU to be efficiently sub-divided so it can be treated as 2 or 4 smaller CPUs.

As for NUMA models, that's more to deal with which sections of the CPU have closer/faster access to RAM and which groups of cores are clustered so that you can assign your process / threads appropriately. It may not be common on desktop usage because we don't have to deal with it very often, but it is dealt with on the enterprise side.

Also, it's worth noting that a lot of the applications that scale well to hundreds of threads would run even better on GPUs. So, that's one reason not to bother when people would be better off just porting to a GPU.

But not everything goes well with a GPU, especially things like Video Encoding when you want to get the smallest possible file size.
GPU's are great at speed, but not at compressing things down to the tiniest file possible. That's something that CPU's are just better at currently.
GPU's aren't great for branching logic. They're great for raw # crunching in parallel.

Yup, sad to see it go. I can't say if they were up against an unsolvable physics problem, however. If the tech is fundamentally scalable, someone else will do it eventually. If not, then its days were numbered.

It's not a physics problem, they were solving that part just fine. It's a economies of scale problem.
Remember how FireWire died because it was too proprietary and Apple didn't want to share.
ThunderBolt was given to the USB-IF group so that the tech wouldn't die.
Now ThunderBolt is viable and will live on through USB and all future implementations.

Optane suffered management stupidity and they didn't use Intel's dominance to force Optane to be the "OS drive" for ALL Intel and non-Intel based CPU's in big vendors partnered with Intel and it's ODM's / OEM's / Large SI's.

If Intel did that, they would drive enough demand to get Optane to be licensed by ALL the major memory manufacturers at a reasonable licensing deal.

But they didn't, they wanted to go straight for enterprise and failed because they couldn't scale down cost because they didn't have enough initial demand.

Demand that would've been created by the consumer end that would've justified mass production from all memory vendors, instead of only Micron.

Yeah, good product managers do this. They look at the customer needs and where the underlying technology is going to be, and try to find the right set of solutions to address the future market needs, rather than what the market is saying it wants today.

AMD and Intel are building products with very large up-front investments and very long lead times. There's no question they are trying to predict where the market is going to be, by the time these things launch.

They could do a bit more and be more experimental, but that would involve a bit more risk, and we all know how risk averse they are.

You're basing that statement on what, exactly?

When Intel first did SMT, it was actually pretty easy. Their CPUs had only one core, and most machines had only one CPU. So, there were few issues with "is this a real core or just a hyperthread?"

Living through the times when SMT was introduced and seeing how software adjusted to it. It took some time, but it got there.

LOL. Yeah, they "demanded it" so much that Intel took it away in the Core 2 generation and nobody complained.

Intel was behind at the time when Athlon 64 was dominant, they had to do anything to catch up on performance.

I think Intel is the prime example that it cannot be an afterthought.

And yet Intel seems to run into the same problem over & over.

Okay, so either disable your E-cores or buy a Sapphire Rapids server or workstation.

Then get an Alder Lake N (rumored to have up to 8 E-cores) or wait for Sierra Forest.

Or we can allow DeskTop consumers to choose for themselves instead of trying to force them to buy Enterprise parts that they can't afford.
A Homogenous P-core SKU or Homogenous E-core SKU ready for DeskTop sockets.

If the OS is working as it should, then you can run your batch jobs at a lower priority and you should hardly notice. On Linux, you do this by setting its "nice" level. I don't know much about priorities in Windows, other than the few times I've tweaked things via the Task Manager.

In the past, one way you might notice was I/O. However, SSDs can do so many IOPS and scale to so many threads that even I/O probably isn't a bottleneck.

I'd rather just have one machine to handle heavy jobs running at full tilt using every resource possible.
Then I can be on the other machine doing my job. That's how I do things.

It just sounds to me like a waste of money and yet another thing to manage.

Not really, this kind of setup is pretty common.
One Render / Encoder / Compile box.
One Editor machine.

Maybe they kept it because it uses less die area than the mesh.

I think they kept it for latency reasons because gaming depends more on latency and the latency penalty from mesh isn't worth it.

Maybe I missed it, but I think they didn't actually say what topology Zen 3 uses within a CCD.

They did, they claimed they added Ring-Bus. But when they tested the core to core latency, it operated just like direct connections.
That's why I'm suspicious that AMD isn't telling the full truth and is using a Hybrid Interconnect topology of Direct Connect + Ring-Bus.

Okay, so then arrange the P-cores in quads and then you shrink your ring bus and its problems.

To me, the quad E-core tile seems like something they did because of deadline pressure. They had these quad E-core tiles for the E-CPUs and decided just to drop them into Alder Lake.

I don't see the E-core Quad-core Cluster as a "Deadline Pressure" thing. I think it's a brilliant design decision that leveraged fundamental networking principles.
When you have 4 Objects that need to communicate together, what's the fastest way to communicate with every object in that group?
A Square with diagonal connections for direct connections between every member.
Everything is based on fundamental networking principles and the Square Quad-core is obeying one of the basic networking rules of minimal latency for cross core communication within it's cluster.
Zen did it previously with direct connects between 4 cores.
Intel just took that to the next extreme and created the E-core cluster and attached it to the larger Ring-bus or Mesh Network as one large Quad-Core cluster.

Raptor Lake is 24 cores, 32 threads. More raw compute power than 16 P-cores. And it's not because they're stingy.

Raptor Lake's 24C/32T is 8 P-cores & 16 E-cores.
Those 16 E-cores eats the die area of 4 P-cores.
Sorry to burst your bubble, but 16 P-cores would clobber 24C/32T any day of the week.

And Intel does it because they're stingy and doesn't want to give consumers "Too many P-cores".

It's been stated by Intel representatives that the 8-core limit is intentional market segmentation, not a technical limit.

Comet Lake gave us 10-core, but they backed away from 10-core and used those 2 P-core slots for 8 E-cores in it's place.

Yes, they are. Why do you think the single-core clock is higher than all-core? That's entirely due to limits on power delivery & cooling.

I don't think you get what I'm infering. I'm not talking about Single-core clock vs All-core Frequencies. I know that. That's basic 101.
The point is that giving Heterogenous Architecture onto DeskTop isn't nearly as necessary when Intel customers and many desktop customers don't care and are willing to OC or pump up the Wattage to get as much performance as the power delivery on their MoBo's allows.
Many users on DeskTop, especially enthusiasts unleash the power limits and see how far they can push their systems.

So hybridizing isn't really a priority or a necessity for DeskTop power savings IMO since the paradigm hasn't been focused on that for quite a while.

They care about how expensive, heavy, bulky, hot, and loud their cooling is, and probably how much heat their rig is churning out.

To some degree, but it's more about if they can cool it effectively and reliably. But we have plenty of modern solutions for that.

I take it you don't live in Texas, because electricity during a Texas summer ain't cheap. And it's not just powering your rig that you have to pay for, but also running the A/C to remove the heat. There were even predictions Texas might have rolling blackouts, this summer, due to excessive demand on the electrical grid.

I live in SoCal, electricity ain't cheap here either.

TerryLaze · Aug 21, 2022

bit_user said:
That might be true of XBox, but PS5 is definitely not a simple recompile.

Anyway, hybrid CPUs are here to stay, so I think game devs and engines will adapt.

PS5 exclusive titles maybe, like spiderman, everything that comes out on all consoles is the same code.

bit_user · Aug 21, 2022

TerryLaze said:
PS5 exclusive titles maybe, like spiderman, everything that comes out on all consoles is the same code.

That's not what I heard from someone well-placed to know. He said game developers outsource the console ports to "porting houses" that do pretty much nothing but console ports.

TerryLaze · Aug 21, 2022

bit_user said:
That's not what I heard from someone well-placed to know. He said game developers outsource the console ports to "porting houses" that do pretty much nothing but console ports.

Well that doesn't mean much without any context.
Sure if there are porting houses that have build an assembly line of making ports with the least amount of effort it is going to be much cheaper than having to reduce production on a new title by having your own coders deal with the porting, you know people that would actually care about the game and would want it to be good.

Also is this a recent comment he made? About recent going ons?
Because at least one big reason the consoles switched to x86 was that it would make it much easier to port games to and fro.
https://gamingbolt.com/ps4s-api-is-...from-the-nightmares-of-porting-to-the-ps3-dev

News Intel Raptor Lake Specs Allegedly Exposed: Up To 24 Cores, 32 Threads

Splendid

Splendid

Admirable

Admirable

Polypheme

Polypheme

Polypheme

Polypheme

Distinguished

Splendid

Splendid

Distinguished

Polypheme

Polypheme

Polypheme

Splendid

Polypheme

Distinguished

Polypheme

Polypheme

Polypheme

Distinguished

Polypheme

Polypheme

Polypheme

Share this page