News Intel's Core i7-14700K Benchmarked: More Cores, Higher Clocks


bit_user
You're going to be mad at me again for saying it, but you again didn't even look at the picture or the video.
Again, it's not true!

How else would I know the guy was Technical Marketing, if I didn't watch the video? I watched it the whole way through, just to be sure I didn't miss anything!

The only reason someone would be mad is if you make baseless accusations about them, so how about don't? If you think someone missed something, then just ask if they missed that part, instead of starting with the accusations.

If you are doing CPU rendering but have to keep your normal workflow running alongside it, then even if you manage priorities (which nobody would do every single time they switch from one app to another), you would lose either performance or responsiveness.
Again, this is really a reflection of poor thread scheduling on the part of Windows 11. If the backgrounded program merely runs at a lower priority, then it should have minimal impact on system responsiveness or the performance of higher-priority foreground tasks.

This is about putting your CPU rendering in the background ON PURPOSE, so it will only run on the E-cores and you can continue doing your game development or video editing or whatever it is you are doing in the foreground. The CPU rendering will take a while longer, but it won't cut into your workflow.
I have firsthand experience of this, on Linux. At my job, we'd run long, all-core test runs. Once I started running them at nice 8, they still finished in about the same amount of time, but the system felt about as responsive as if it were idle. So the problem here is apparently Windows' scheduler.
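For anyone who wants to reproduce that on Linux, here's a minimal sketch in Python (the Blender command line is just a hypothetical stand-in for whatever long-running, all-core job you have):

```python
# Minimal sketch (Linux/POSIX): launch a long, all-core job with its
# niceness raised, so the desktop stays responsive while it runs.
import os
import subprocess

def run_niced(cmd, niceness=8):
    """Run cmd as a child process with its niceness raised by `niceness`."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(niceness),  # applies only to the child
        check=True,
    )

if __name__ == "__main__":
    # Hypothetical render job; substitute your own all-core workload.
    run_niced(["blender", "-b", "scene.blend", "-a"], niceness=8)
```

Same effect as prefixing the command with `nice -n 8` in a shell.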

And while that's something I had to do explicitly, if Windows is adjusting priorities based on foreground and background focus, then all they should have to do is tune the impact that has on thread priorities.
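For comparison, here's what doing that explicitly looks like on Windows, as a rough sketch using the third-party psutil package (the PID is a hypothetical placeholder for whatever render process you want to push into the background):

```python
# Rough sketch (Windows), using the third-party psutil package: drop a
# running render's priority class so foreground work keeps the CPU.
import psutil

def push_to_background(pid):
    """Move an already-running process to a low priority class."""
    p = psutil.Process(pid)
    p.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # or psutil.IDLE_PRIORITY_CLASS

# Usage (hypothetical PID): push_to_background(12345)
```

In principle, that's all the OS would need to do automatically when a window loses focus.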

You choose your poison: either hybrid, or you have two systems (one for working and a render box), or your nerves are going to die playing thread director instead of focusing on your work.
Option 3: just run Linux.

But only if there is limited space and/or a limit on acceptable cost.
What I said is true for a given or even slightly higher cost (see the 12 P + 0 E data).

If you can just throw money at them, a lot of problems go away. However, last I checked, Intel was a for-profit business. So, they need to protect their margins or shareholders will revolt and toss the executives out on the street. That makes your suggestion of significantly more P-cores a non-starter. They are cost-constrained and power-constrained. The addition of E-cores addresses both.
 
Again, it's not true!

How else would I know the guy was Technical Marketing, if I didn't watch the video? I watched it the whole way through, just to be sure I didn't miss anything!

The only reason someone would be mad is if you make baseless accusations about them, so how about don't? If you think someone missed something, then just ask if they missed that part, instead of starting with the accusations.
Ok, did you miss the part where they minimize the Unreal development render to play the Unreal game (for bug testing, I'm sure)?
Again, this is really a reflection of poor thread scheduling on the part of Windows 11. If the backgrounded program merely runs at a lower priority, then it should have minimal impact on system responsiveness or the performance of higher-priority foreground tasks.
No it's not!
The only way to have no effect at all on your workflow would be to put the render on idle and the foreground on real-time, which would stop your render ice cold.
We are not talking about just ordinary application windows here, but a second program that can also use up all available cores.
Option 3: just run Linux.
Same as above.
What I said is true for a given or even slightly higher cost (see the 12 P + 0 E data).

If you can just throw money at them, a lot of problems go away. However, last I checked, Intel was a for-profit business. So, they need to protect their margins or shareholders will revolt and toss the executives out on the street. That makes your suggestion of significantly more P-cores a non-starter.
That is called cost-efficiency, not power-efficiency.
They are cost-constrained and power-constrained.
No, they are only cost- and/or size-constrained, and that only in theory, because we have no idea what acceptable sizes or costs are for a final CPU.
The same amount of power the 12900 uses would give us more performance if all 16 cores were P-cores, so power is not the constraint.
 

bit_user
That is called cost-efficiency, not power-efficiency.
If your proposed solution is too expensive, then it's not a solution. Their solution space is constrained both by cost and power.

we have no idea what acceptable sizes or costs are for a final CPU.
Not knowing their precise costs doesn't make the issue go away. Furthermore, it doesn't let us disregard the undeniable fact that a bigger die costs more. For a business, you win by maximizing margins, which necessarily means controlling costs.

The hybrid solution enabled Intel to provide more absolute performance in the same area as a 10P + 0E would've. Furthermore, it provides more performance at a given power level than the 10P + 0E. That qualifies it as a win on both space efficiency and power efficiency!
 
That's what I thought, but then I had to edit an Excel worksheet with a couple thousand rows and was amazed at how it bogged down. No fancy formulas, even. Of course, it only bogs down when I'm doing something in it.

Same for MS Word. I haven't edited huge docs recently, but when you get into editing multi-hundred-page technical documents, it can get slow at points.

Almost everything you were talking about is heavily single-thread limited, and more IPC / clock speed is what makes it faster, not more cores. As for Intel's E-cores, they are heavily cut down from the normal P-cores, especially in supporting silicon; take a look at a die shot and you'll see what I mean. Things like ALUs/AGUs don't take up much space; the cache and cache-supporting elements are what gobble die space, but those are necessary to prevent stalling, which is where the core spends cycles doing nothing while waiting on the I/O subsystem. The higher the clock speed, the better you need to be at feeding it data. This is also where memory latency, measured in nanoseconds, plays a big role.

This is actually what was wrong with the old NetBurst architecture: they cranked up the speed and deepened the pipeline, but the cache and memory system could not keep up, and it spent too much time stalling out. The existing Pentium III design was simply better at keeping the machine fed, so Intel eventually dumped NetBurst and created a new uArch from the Tualatin / Pentium-M design and called it Core.

What I'm pointing out is that you can get to 60~80% peak performance without needing those heavy instruction caches / optimizations, but then you slam into a brick wall. You need to then heavily invest power / silicon budget into branch prediction, loop unrolling, deep caches and instruction reordering to carry that final 20~40%. That is the difference between a P core and an E Core.
 

bit_user
What I'm pointing out is that you can get to 60~80% peak performance without needing those heavy instruction caches / optimizations, but then you slam into a brick wall.
As the article says, its instruction cache is first-rate:
"With more common instruction lengths, Gracemont maintains a steady 5 instructions per cycle throughout the entire 64 KB L1 instruction cache. For test sizes between 32 KB and 64 KB, it even beats Golden Cove and Zen 3. This larger L1i a welcome improvement over the 32 KB designs we’ve seen in recent x86 big cores."​

You need to then heavily invest power / silicon budget into branch prediction,
Some key quotes about its branch performance:
"Gracemont’s direction predictor looks surprisingly similar to Golden Cove’s, except for a gentle increase in time per branch after the pattern length exceeds 1K which points to an overriding predictor, like the one in Zen 2. ... Gracemont’s predictor is better than Skylake’s, so 'Core class branch prediction' and 'deep branch history' are both accurate descriptions."​
"Gracemont’s 5K entry BTB may seem small next to Golden Cove’s jaw dropping 12K branch target cache but Golden Cove is an exception. 5K entries is right in line with Sunny Cove and Zen 3’s BTBs, while being larger than the 4K BTBs in Skylake and before. And size isn’t everything – speed matters too. Like Zen 3, Gracemont can do zero bubble predictions with up to 1024 branches however unlike Zen 3, Gracemont’s L2 BTB adds two cycles of latency instead of three."​
"On the integer side, Gracemont has impressive resources. 77% of in-flight instructions can write to integer registers – a ratio that compares favorably to that of recent big cores from Intel/AMD. Reordering capacity for branches is insane. Almost half of the instruction stream can be taken branches."​

loop unrolling, deep caches and instruction reordering to carry that final 20~40%. That is the difference between a P core and an E Core.
Here's the article's conclusion:
Gracemont’s strengths:​
  • Well balanced out of order execution resources
  • Excellent branching performance (high reordering capacity, large zero bubble BTB, good prediction capabilities)
  • Large 64 KB L1 instruction cache
  • Low latency L1 data cache
  • Modest power consumption
Weaknesses:​
  • L3 access suffers from high latency and low bandwidth
  • Lower vector throughput than other desktop x86 cores

You'd probably enjoy the article. Sounds like it might challenge some of your assumptions.

 
You'd probably enjoy the article. Sounds like it might challenge some of your assumptions.

These aren't assumptions; actual benchmarking refutes that article's statements, though Intel really didn't want people to do that.

With Process Lasso it's really easy to force processes to run on specific cores, so it's easy to test 4 P-cores vs. 4 E-cores and see the actual performance result; it's not pretty. E-cores substantially underperform P-cores when doing like-for-like tasks, but use a fraction of the power doing so. Look at a die shot that someone has marked up and see how little space the E-cores take up vs. the P-cores and their supporting infrastructure.
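If you want to try that experiment without Process Lasso, here's a rough sketch using the third-party psutil package. The logical CPU indices are assumptions for an i9-12900K (0-15 = P-core threads, 16-23 = E-cores), so check your own topology first:

```python
# Rough sketch: pin this process to a chosen set of logical CPUs and time a
# CPU-bound loop, to compare P-core vs. E-core performance like-for-like.
import time
import psutil  # third-party; cpu_affinity works on Windows and Linux

def busy_work(n=20_000_000):
    s = 0
    for i in range(n):
        s += i * i
    return s

def timed_on(cpus):
    psutil.Process().cpu_affinity(cpus)  # restrict this process to `cpus`
    t0 = time.perf_counter()
    busy_work()
    return time.perf_counter() - t0

print("P-core:", timed_on([0]))   # assumed P-core thread index
print("E-core:", timed_on([16]))  # assumed E-core index
```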

Or to put it another way: if anything in that article or what you said was actually true, there would be 0 "P" cores and just 24+ of these amazingly powerful processing cores going 5+ GHz.
 

bit_user
These aren't assumptions; actual benchmarking refutes that article's statements, though Intel really didn't want people to do that.
Pretty much everything in that article is based on their own microbenchmarks, which you can see here:

You can also reach out to them on Discord, although I think access to the group requires a Patreon membership. I think anyone can leave a comment on the articles themselves.

If you're aware of benchmarks or information which contradicts their data or conclusions, please share them! So far, ChipsAndCheese is providing the best in-depth analysis I've seen. If there's anywhere as good or better, I'd sure like to know about it!

With Process Lasso it's really easy to force processes to run on specific cores, so it's easy to test 4 P-cores vs. 4 E-cores and see the actual performance result; it's not pretty.
It's one thing to look at & discuss the E-cores in the abstract. It's another thing to look at how they perform as part of Alder Lake, and that's where we tend to see bottlenecks. They're disadvantaged in their access to L3 cache (as mentioned in the article, and my quote thereof), and they also face L2 cache contention issues that the P-cores, with their exclusive L2 caches, don't.

E-cores substantially underperform P-cores when doing like-for-like tasks, but use a fraction of the power doing so.
The empirical data I've seen on E-cores suggests their overall comparison vs. P-cores in Alder Lake is:

| Test | int | float |
| --- | --- | --- |
| Single-Threaded | 64.5% | 54.1% |
| One Thread per Core | 55.8% | 52.6% |
| All Threads per Core | 47.5% | 51.5% |

Of course, those are just averages. Depending on the task, number of threads, and the size of its working set, some things will perform better and others will perform worse.

Or to put it another way: if anything in that article or what you said was actually true, there would be 0 "P" cores and just 24+ of these amazingly powerful processing cores going 5+ GHz.
I don't think that's an accurate characterization of the article. At numerous points, they show where it's narrower or shallower than Golden Cove. Where they really draw parity (and this should come as no surprise) is with Skylake. It also sometimes compares favorably to Zen 2 and even Zen 3.

My "take" on the E-cores is that they:
  • Are narrower, especially in the FPU area.
  • Aren't designed to clock as high, which also makes it easier for them to achieve a good IPC level.
  • Have shared L2 (and less per core) and shared ring bus access port.

All of which translates into the performance discrepancies noted above. None of it contradicts that article's conclusions, either.
 
I swear you like to argue in every thread just to be contrarian. If someone said the sky was blue, would you argue that it wasn't really blue but a shade of azure, and that azure makes it totally not blue?

What I'm pointing out is that you can get to 60~80% peak performance without needing those heavy instruction caches / optimizations, but then you slam into a brick wall. You need to then heavily invest power / silicon budget into branch prediction, loop unrolling, deep caches and instruction reordering to carry that final 20~40%. That is the difference between a P core and an E Core.
 

bit_user
I swear you like to argue in every thread just to be contrarian. If someone said the sky was blue, would you argue that it wasn't really blue but a shade of azure, and that azure makes it totally not blue?
I'm not really sure why you say that, because:
  • I never argue a point I don't sincerely believe is or could be true.
  • I always try to check & cite the best sources I can find.
  • I always invite others to share their sources, and try to look at them with an open mind.

On occasion, where I can see both sides of an issue, I might take a counterpoint. This is as much to make up my own mind about a subject as to help round out the discussion. That's not something I do very often, because I usually do form a clear stance on most topics, after I've acquired enough information. The only time I recall doing that in a discussion with you was around dividends, or something like that.

Another example that comes to mind is around GPU pricing, where I'm simply trying to understand the current pricing phenomenon rather than defending it. I dislike shallow and simplistic explanations like "corporate greed", so I try to see if we can get any deeper insight into the matter and whether it seems to hold up.

My reason for participating in these forums is to learn and share knowledge. I tend to learn more in debates than echo chambers, as they challenge me to check my own assumptions and expose me to new information. I feel strongly about correcting misinformation being spread, whether knowingly or not. I certainly don't seek conflict, but I don't shy away from it, either.

So, I invite you to share your best sources & information on the topic of E-core performance and utilization. I promise I will give them fair consideration. But, I'm a "details" person, so I will point out anything that's inconsistent with other sources on the matter or doesn't seem to add up.


BTW, on this very subject, I've probably spent about a day, over the past week, hashing out various details with Terry and @thestryker , in the other thread I linked, plus scraping Chips & Cheese's data and building my clockspeed algorithm, to produce the estimated performance graphs I posted on the previous page. And, believe me, I didn't do all of that out of boredom! So, I definitely have made a concerted investment not only to learn about the subject, but also to test my own assumptions and share what I've found. #originalresearch
 
If your proposed solution is too expensive, then it's not a solution. Their solution space is constrained both by cost and power.
If your only problems are size and/or cost but not power, then power is the only thing that doesn't need a solution.

If Intel kept the "solution space" small to make more profit, then it's PURELY a matter of profit (cost).
 
Hmm, I think they are going a little too nuts with the crappy "e-cores". Like, having four of those around for web browsing / watching media or just generally putzing around the desktop is fine. Stuffing them in there to inflate MC benchmarks is kinda dumb. Like, nobody, and I mean nobody, is going to actually be using 12 e-cores; heck, 8 is pushing it.
I'm curious what your alternative is? Intel to just not compete with AMD on multithreaded workloads for desktop? Go back in time and make AMD never release 12/16 core desktop parts while also killing HEDT?

The reality is that Intel couldn't afford to just let AMD keep winning the multithreaded game. HEDT is absolutely dead, and only workstation exists, which carries a significant price premium compared to what HEDT was. Now we're left with the middle ground, which is where the 12- and 16-core AMD parts come in, and while it's still bad because there's no extra memory bandwidth or PCIe, it does bring the CPU power. Intel's chosen path for competing was to leverage an Atom-class core designed for parallel workloads rather than massively increasing die size. This gave them the best of both worlds: comparative performance and more dies per wafer than going with equal-performing P-cores (then with RPL, a big bump in efficiency).
What I'm pointing out is that you can get to 60~80% peak performance without needing those heavy instruction caches / optimizations, but then you slam into a brick wall. You need to then heavily invest power / silicon budget into branch prediction, loop unrolling, deep caches and instruction reordering to carry that final 20~40%. That is the difference between a P core and an E Core.
This isn't a particularly accurate statement (maybe just gross oversimplification?) when applied to the difference in design between p-cores and e-cores for Intel. The sacrifices made were at the altar of parallelization so the micro-op cache was dropped, shared cache features fewer cycles (limits peak clocks) to help mitigate the problematic nature of shared L2 (which saves space), FP/Vector units are good but only 128bit, smaller buffer sizes, and tons of power saving optimizations.

Long story short, they cut out everything they could to save space and optimize for power efficiency, but they kept the core of what makes a performant part, like the branch prediction and int/FP/vector capability. Most of what was left on the table is capacity-related, which does directly impact peak performance. This is why 8 E-cores can still deliver 55-60% of the performance of 8 P-cores (with HT) that are clocked a third higher in highly parallel workloads. When you compare single or lightly threaded workloads, though, they get massively outclassed because they cannot leverage the core design.
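A quick back-of-envelope check on those figures, taking the 55-60% midpoint and the roughly 4:3 clock ratio as stated assumptions:

```latex
% Assumed: T_E / T_P ~ 0.575 (cluster throughput ratio), f_P = 1.33 f_E.
\[
  \frac{f_E}{f_P} \approx \frac{1}{1.33} \approx 0.75, \qquad
  \frac{T_E/f_E}{T_P/f_P} \approx \frac{0.575}{0.75} \approx 0.77
\]
% i.e., clock-for-clock, the 8 E-cores land at roughly three quarters of the
% throughput of the 8 hyperthreaded P-cores in highly parallel workloads.
```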
 
I'm curious what your alternative is? Intel to just not compete with AMD on multithreaded workloads for desktop? Go back in time and make AMD never release 12/16 core desktop parts while also killing HEDT?

The reality is that Intel couldn't afford to just let AMD keep winning the multithreaded game. HEDT is absolutely dead, and only workstation exists, which carries a significant price premium compared to what HEDT was. Now we're left with the middle ground, which is where the 12- and 16-core AMD parts come in, and while it's still bad because there's no extra memory bandwidth or PCIe, it does bring the CPU power. Intel's chosen path for competing was to leverage an Atom-class core designed for parallel workloads rather than massively increasing die size. This gave them the best of both worlds: comparative performance and more dies per wafer than going with equal-performing P-cores (then with RPL, a big bump in efficiency).

This isn't a particularly accurate statement (maybe just gross oversimplification?) when applied to the difference in design between p-cores and e-cores for Intel. The sacrifices made were at the altar of parallelization so the micro-op cache was dropped, shared cache features fewer cycles (limits peak clocks) to help mitigate the problematic nature of shared L2 (which saves space), FP/Vector units are good but only 128bit, smaller buffer sizes, and tons of power saving optimizations.

Long story short, they cut out everything they could to save space and optimize for power efficiency, but they kept the core of what makes a performant part, like the branch prediction and int/FP/vector capability. Most of what was left on the table is capacity-related, which does directly impact peak performance. This is why 8 E-cores can still deliver 55-60% of the performance of 8 P-cores (with HT) that are clocked a third higher in highly parallel workloads. When you compare single or lightly threaded workloads, though, they get massively outclassed because they cannot leverage the core design.

Intel needs to move on to another process node; AMD is kicking their butts due to having access to TSMC's more advanced fabrication process. All Intel did was bolt on four more cheap E-cores that will never be used, because there is no use case for them. There is zero difference between having an 8-core CPU and a 256-core CPU if the user never actually uses more than 8 cores. This is what I think everyone is missing: they just assume MOAR CORE = better, when that is simply not true. All they did was waste silicon.

As for my statement, it's precisely true and proven with benchmarks. Having faster / wider processing resources doesn't matter if you can't keep them fed with instructions, and to do that you end up needing a ton of additional support, which quickly bloats the transistor count and therefore the power budget. Cutting those components down allows for lower power utilization at the expense of top-end performance. Or in other words, they could add another set of ALUs/AGUs/FPUs and HT to the "E Cores" and you wouldn't get any more performance, because they lack the supporting infrastructure to keep them fed.

This is the kind of size difference we are talking about.



(resized it below cause that is a massive jpg)

[Image: TkRlvUi.jpg (die shot size comparison)]



Couldn't get a better resolution of the individual core components; the decode and load/store units alone are almost the size of an entire Gracemont core, and are larger than the actual Integer Execution Unit (ALU/AGU/MMU), which does most of the raw processing.

[Image: us33eqO.png (core component breakdown)]
 

bit_user
All Intel did was bolt on four more cheap e-cores that will never be used because there is no use case for them. There is zero difference between having an 8 core CPU and a 256 core CPU if the user never actually use's more then 8 cores.
At work, our software development machines have the i9-12900, and we love the extra cores for speeding up our software builds, even while we have a couple of VMs running tests in the background. Our build system uses CMake and Ninja, and has no problem saturating all the cores/threads available to it.

Or in other words, they could add another set of ALUs/AGUs/FPUs and HT to the "E Cores" and you wouldn't get any more performance, because they lack the supporting infrastructure to keep them fed.
Chips & Cheese thought they were pretty well-balanced, though. Since they're not as wide as the P-cores, they don't need to be as "deep".

This is the kind of size difference we are talking about.
I think we've all seen the die shots.

Here's a fun fact: as best I can ascertain, Gracemont (without L2) has 93.3% as many transistors as a Skylake core (with L2)! So, why should it be surprising that Gracemont is comparable in IPC to Skylake?

Couldn't get a better resolution of the individual core components,
Here's the best breakdown of Golden Cove that I've seen:

[Annotated die shot images: Golden Cove core breakdown]

Source: https://locuza.substack.com/p/die-walkthrough-alder-lake-sp-and
(It has an annoying floating dialog, but you can clear it if you scroll down and then back up)

Gracemont doesn't need to be as deep, because it's not as wide. Because it doesn't clock as high, they could use fewer pipeline stages, as well. Those are aspects which helped save some area.

[Annotated die shot image: Gracemont core breakdown]


So, that gives us the following size comparisons:

| Core | Size Relative to Golden Cove (sans L2) |
| --- | --- |
| Gracemont | 30.4% |
| Zen 2 | 57.5% |
| Zen 3 | 60.5% |

And comparing Gracemont to AMD:

| Core | Size Relative to Gracemont (sans L2) |
| --- | --- |
| Zen 2 | 189.6% |
| Zen 3 | 198.7% |

So, a Gracemont core is actually about half the area of Zen 3 and nearly 1/3rd the area of Golden Cove (excluding L2, in both cases). That's small, but not tiny.

IMO, what's more striking, especially when comparing to a core like Skylake, isn't so much how small Gracemont is, but rather how huge Golden Cove is! Intel was really swinging for the fences, when they designed that core!
 
IMO, what's more striking, especially when comparing to a core like Skylake, isn't so much how small Gracemont is, but rather how huge Golden Cove is! Intel was really swinging for the fences, when they designed that core!
As I've said multiple times and provided support for, you can reach 60~80% of peak performance without heavy infrastructure, but to reach the top of potential performance you need to invest heavily in supporting infrastructure. The Golden / Raptor Cove decoder and load/store units alone are almost as big as the entire Gracemont core, and those are just some of the components used to handle data for the core. The processing actually happens in the Integer Execution Unit, with the FPU stepping in for certain instructions.

Anyhow my point has been made even though you insist on being contrarian.
 

bit_user
Anyhow my point has been made even though you insist on being contrarian.
I don't know how I'm being contrarian, exactly. Some of the images and data I posted even support your points. I thought you might appreciate having better dieshot breakdowns, for instance.

The main issue I was trying to address was the notion that Gracemont is too small and simple to offer good IPC. All I've tried to show is that it's surprisingly sophisticated and refined, in spite of its size.

I've repeatedly acknowledged that it's not as deep, wide, or performant as Golden Cove. However, in historical terms it's also not small. The main reason it looks so small is that it's being compared to Golden Cove, which is huge.

Something else that might not have gotten much consideration is this comparison I posted of the SPEC2017 results on Gracemont vs. Golden Cove:


| Test | int | float |
| --- | --- | --- |
| Single-Threaded | 64.5% | 54.1% |
| One Thread per Core | 55.8% | 52.6% |
| All Threads per Core | 47.5% | 51.5% |

Source: https://www.anandtech.com/show/1704...w-hybrid-performance-brings-hybrid-complexity

If you think about what it's telling us, one significant point is that Gracemont isn't scaling up to 8 threads as well as Golden Cove, especially in integer workloads. I think that's largely down to the shared L2 and ring bus bottlenecks. If not for those, it might get a bit more respect. Particularly when you consider how much more performance 8 Gracemont cores add than simply enabling Hyper Threading on the P-cores.
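For anyone curious, that kind of scaling behavior is easy to probe with a crude sketch like this (pure-Python busy-work standing in for a real benchmark, and ideally pinned to P-cores or E-cores first, e.g. with Process Lasso or the affinity sketch earlier in the thread):

```python
# Crude scaling probe: time a fixed per-worker job at increasing worker
# counts; where throughput stops scaling linearly, you're hitting shared
# resources (L2, ring bus, memory) rather than running out of cores.
import time
from multiprocessing import Pool

def worker(n):
    s = 0
    for i in range(n):
        s += i * i
    return s

if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        t0 = time.perf_counter()
        with Pool(workers) as pool:
            pool.map(worker, [10_000_000] * workers)
        dt = time.perf_counter() - t0
        print(f"{workers} workers: {workers / dt:.2f} jobs/s")
```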
 
Intel needs to move on to another process node; AMD is kicking their butts due to having access to TSMC's more advanced fabrication process. All Intel did was bolt on four more cheap E-cores that will never be used, because there is no use case for them. There is zero difference between having an 8-core CPU and a 256-core CPU if the user never actually uses more than 8 cores. This is what I think everyone is missing: they just assume MOAR CORE = better, when that is simply not true. All they did was waste silicon.
What are you even talking about here? So do you think AMD's 12- and 16-core parts also just shouldn't exist? Nobody is allowed to have workloads that use over 8 cores? This honestly just sounds like you're either being willfully ignorant or arguing for the sake of arguing.
 
Intel needs to move on to another process node; AMD is kicking their butts due to having access to TSMC's more advanced fabrication process.
The only thing a new node provides is to make the area of the core smaller, which increases the heat per area, and/or to allow for more transistors in the same area, which also increases the heat per area. That is why AMD needs, just to reach its advertised numbers, the same liquid cooling that Intel needs to go 30% above its advertised numbers, with AMD still running hotter.
 

bit_user
The only thing a new node provides is to make the area of the core smaller, which increases the heat per area, and/or to allow for more transistors in the same area
This is not accurate. In general, smaller nodes also deliver power savings at the same frequency, or frequency improvements at the same power.

Here's how TSMC characterized several of their process nodes, back in 2019:

| | 16FF+ vs 20SOC | 10FF vs 16FF+ | 7FF vs 16FF+ | 7FF vs 10FF | 7FF+ vs 7FF | 5FF vs 7FF |
| --- | --- | --- | --- | --- | --- | --- |
| Power | 60% | 40% | 60% | <40% | 10% | 20% |
| Performance | 40% | 20% | 30% | ? | same (?) | 15% |
| Area Reduction | none | >50% | 70% | >37% | ~17% | 45% |
Source: https://www.anandtech.com/show/1417...chnology-pdk-drm-eda-tools-3rd-party-ip-ready
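The first-order physics behind those power numbers is the standard CMOS dynamic-power relation; here's a sketch with purely illustrative numbers (not TSMC's):

```latex
% Dynamic switching power: activity factor a, switched capacitance C,
% supply voltage V, clock frequency f.
\[
  P_{dyn} \approx a \, C \, V^{2} f
\]
% Illustrative only: if a shrink cuts switched capacitance by 20% and lets
% the supply voltage drop from 0.80 V to 0.75 V at the same frequency, then
\[
  \frac{P_{new}}{P_{old}} \approx 0.8 \times \left(\frac{0.75}{0.80}\right)^{2}
  \approx 0.70
\]
% i.e., ~30% less power at ISO-frequency, with no change to the design.
```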

As noted several times before, this is not accurate.

First, you generalize across all AMD processors, when you really mean just the Ryzen 9 7950X. Second, TechPowerUp showed air cooling is adequate to unleash the processor's potential across a wide diversity of workloads.
That's not even using Noctua's best air cooler.
 
As noted several times before, this is not accurate.
The heat in a smaller area part is accurate, but even so, that's not why AMD's Zen 4 parts are hard to cool on desktop; it's the bad IHS.
First, you generalize across all AMD processors, when you really mean just the Ryzen 9 7950X. Second, TechPowerUp showed air cooling is adequate to unleash the processor's potential across a wide diversity of workloads.
[Chart: fan-scaling-noctua.png (TechPowerUp fan-scaling results)]
That's not even using Noctua's best air cooler.
AMD is a lot less transparent about boosting behavior than Intel, which makes these comparisons a lot harder. I think the TDP on the 7950X is 170 W, with 230 W max socket power. The only cooler that actually hits these numbers is the AIO in the TPU testing (Blender is the only workload they use which maxes the CPU). Of course, I firmly believe that without the junk IHS the U14S would keep up in every test.

That doesn't change the fact that pushing these power levels through smaller chips is a huge forthcoming problem.
 

bit_user

Titan
Ambassador
The heat in a smaller area part is accurate,
It's strictly true only if you simply fab the same design on a smaller node and run the same or more power through it.

Otherwise, there are other factors that could come into play. Techniques like more aggressive clock-gating, and perhaps even layout-level and material choices could mitigate increased thermal density.

The only cooler that actually hits these numbers is the AIO in the TPU testing (Blender is the only workload they use which maxes CPU). Of course I firmly believe without the junk IHS the U14S would keep up in every test.
Even as is, the NH-U14S @ 100% fan speed gets within a couple %, on the most demanding workloads (Blender & Cinebench nT).

That doesn't change the fact that pushing these power levels through smaller chips is a huge forthcoming problem.
The power levels are, themselves, somewhat of a problem. At least the 7950X doesn't need full power to deliver nearly all of its performance.
 
This is not accurate. In general, smaller nodes also deliver power savings at the same frequency, or frequency improvements at the same power.

Here's how TSMC characterized several of their process nodes, back in 2019:
Yes, you can make the exact same chip with zero improvement in performance or even in clocks... now who's suggesting products that are not viable in the market?!
As noted several times before, this is not accurate.

First, you generalize across all AMD processors, when you really mean just the Ryzen 9 7950X. Second, TechPowerUp showed air cooling is adequate to unleash the processor's potential across a wide diversity of workloads.
That's not even using Noctua's best air cooler.
I do not generalize any more than you, or anybody else I'm answering to, do.
Your only data is about the 7950X as well, so why am I at fault here and not you, to the same degree?
Also, your link doesn't disprove anything I said. As you realised yourself, without liquid cooling you already lose a couple of % of the advertised performance; not the 30%-higher-power-draw performance, the advertised one.
Even as is, the NH-U14S @ 100% fan speed gets within a couple %, on the most demanding workloads (Blender & Cinebench nT).


The heat in a smaller area part is accurate, but even so, that's not why AMD's Zen 4 parts are hard to cool on desktop; it's the bad IHS.
Might be so, but more compact heat is still more difficult to cool.
The lower we go in process nodes, the denser the heat is going to be.
There is no benefit for the consumer with smaller nodes except if your argument is that CPUs are running out of space they could use.
 

bit_user
I do not generalize any more than you, or anybody else I'm answering to, do.
Your only data is about the 7950X as well, so why am I at fault here and not you, to the same degree?
In your text, you said "... AMD needs the same liquid cooling ...", but your link was to data only for 7950X. So, I was doing two things:
  1. Pointing out the mismatch between your words and data.
  2. Responding to the point you were apparently just making about the 7950X, by providing further data about the 7950X.

Although, what's funny is that nothing in that link directly supports your claim.

Also, your link doesn't disprove anything I said. As you realised yourself, without liquid cooling you already lose a couple of % of the advertised performance; not the 30%-higher-power-draw performance, the advertised one.
They note that the Blender benchmark was measured to use up to 235 W, which is just over the 230 W PPT I assume you're referencing. That's where it loses a couple %. The average performance for the Noctua solution is 0.2% less than AIO. That's not significant.

Somebody spending all their time doing CPU rendering with Blender or Cinema 4D, on a 7950X, might decide to invest in an AIO, but for everyone else it makes essentially no difference vs. air coolers of the NH-U14S' caliber or better.

There is no benefit for the consumer with smaller nodes except if your argument is that CPUs are running out of space they could use.
I already showed this statement is inaccurate (see table). TSMC claims their 5 nm node is 20% more efficient than their 7 nm node, at ISO-frequency, and offers 15% more performance at ISO-power. I know you saw the post, because your reply quoted it.
 
... wow this is still going.

Node size is really just the size of the transistor; the smaller the transistor, the less electrical resistance it has, and therefore the less power gets turned into waste thermal energy every time it switches. Less waste heat means you can do the exact same work for less power, or do more work for the same power consumption. A 65W CPU at 7nm will do more work than a 65W CPU at 12nm. Various uArchs can get more or less performance out of those limits, but ultimately everything comes down to how much thermal energy you can deal with, and Intel has slammed into the wall of how much performance it can squeeze out of its current transistor node.

This is basic stuff Toms went over back in the early 2000's comparing old AMD and Intel CPUs.

As for why Intel is so far behind TSMC / Samsung: many years ago, Intel decided against pursuing Extreme UltraViolet Lithography (EUVL), as it was deemed too difficult. TSMC, on the other hand, embraced it full force and poured a ton of resources into making it viable; Samsung followed suit. Intel has since realized its mistake and is sprinting hard to play catch-up.

I remember back when this was the big talk of the community; team blue fanbois were insisting EUVL was a pipedream and that Intel was super wise not to waste its time and energy going after it, and to instead focus on miniaturizing its existing fabrication processes.
 