News Intel's Core i9-14900K Shows Increased Multi-Threaded Performance in Leaked Benchmarks

Wrong!


OMG? Using what kind of math?

Here's the relative speedup between the i9-14900K and the others:

Performance of Intel's Core i9 in CPU-Z Built-In Benchmark

CPU                 Single Thread (vs. i9-14900K)    Multi Thread (vs. i9-14900K)
Core i9-14900K      100.00%                          100.00%
Core i9-13900KS     96.68%                           94.91%
Core i9-13900K      100.00%                          94.75%


On single-threaded performance, it managed a 3.4% win vs. the i9-13900KS, which exactly corresponds to the difference you'd expect between 5.8 GHz and 6.0 GHz (presumably, that i9-13900K was overclocked). On multi-threaded, it only manages a bit over 5%, which sounds like what I'd expect from the rumored bump in PL2.

However, where does the author get 10% ??? Seems very sloppy, to me. We need answers!
The author's article is correct-ish; the author's chart is incorrect. And that's the problem.

You're doing the wrong math, because you can't be bothered to look at the actual images in the article which show the correct numbers. If you look at those, you get the 14900K with a single-threaded (ST) score of 978, the 13900K with 901.8 ST, and the 13900KS with 945.5 ST. That gives the 14900K an 8.5% ST increase over the 13900K and roughly a 3.5% ST increase over the 13900KS.
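If you want to check the arithmetic yourself, here's a quick sketch using those same scores (Python purely for illustration; the only inputs are the numbers quoted above):

```python
# Relative single-thread speedups from the CPU-Z scores quoted above.
scores = {"i9-14900K": 978.0, "i9-13900K": 901.8, "i9-13900KS": 945.5}

base = scores["i9-14900K"]
for cpu, st in scores.items():
    print(f"14900K vs {cpu}: {base / st - 1:+.1%}")
# -> +0.0% vs itself, roughly +8.5% over the 13900K and +3.4% over the 13900KS
```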
 

bit_user

Titan
Ambassador
They could have tried 3D cache with their 14th gen and priced them accordingly. Missed opportunity by Intel...
3D cache is not a trivial undertaking. AMD specifically designed Zen 3 to accommodate it, from the very beginning, and then look how long it took them to actually release the 5000X3D versions! The design changes needed to support stacked L3 cache go well beyond what would normally be considered a "refresh".
 

bit_user

Titan
Ambassador
The author's article is correct-ish; the author's chart is incorrect. And that's the problem.
As I said in a follow-up reply, it's the inconsistency that bothers me. As for the raw numbers, I mean who the heck knows what those are actually measuring! It could be some weird overclocking setup or something. That's why I said I'm holding out for the actual reviews and benchmarks performed in a controlled setting.

You're doing the wrong math, because you can't be bothered to look at the actual images in the article which show the correct numbers.
You're right. It's because I did not see the arrow. I did look at that CPU-Z screenshot long enough to know that it didn't have any benchmark results in it, and yet I still did not see the arrow. I guess they should make it higher-contrast or something.

Please don't accuse me of "can't be bothered". If I couldn't be bothered to look at an image, I certainly wouldn't have bothered to compute the relative speedup from the data presented in the table. I normally look through all the images in articles, even when it's a big slide deck from AMD or Intel. I simply didn't see that there was more than one.

Thanks for taking the time to point out what I (and multiple others!) missed. IMO, that still doesn't let the author off the hook.
 
3D cache is not a trivial undertaking. AMD specifically designed Zen 3 to accommodate it, from the very beginning, and then look how long it took them to actually release the 5000X3D versions! The design changes needed to support stacked L3 cache go well beyond what would normally be considered a "refresh".

From a lithography standpoint, they already have Foveros tech for 3D stacking.

It's about designing or tweaking the architecture to utilize the stacked cache, like you said, I guess. I am not an expert, so I don't know how much work goes into even the tweaking part, but with all the stuff Intel showcases and announces, I thought it was definitely doable for Intel...

But what I wanted to say was that Intel would have benefited a lot from doing some work and releasing a 3D cache lineup instead of just a refresh...
 
From a lithography standpoint, they already have Foveros tech for 3D stacking.

It's about designing or tweaking the architecture to utilize the stacked cache, like you said, I guess. I am not an expert, so I don't know how much work goes into even the tweaking part, but with all the stuff Intel showcases and announces, I thought it was definitely doable for Intel...

But what I wanted to say was that Intel would have benefited a lot from doing some work and releasing a 3D cache lineup instead of just a refresh...
Isn't cache transparent to the system/cpu?
It's the software that will either be able to use more cache, or not.
That's why x3d only does anything for some of the software, right?
 
As I said in a follow-up reply, it's the inconsistency that bothers me. As for the raw numbers, I mean who the heck knows what those are actually measuring! It could be some weird overclocking setup or something. That's why I said I'm holding out for the actual reviews and benchmarks performed in a controlled setting.


You're right. It's because I did not see the arrow. I did look at that CPU-Z screenshot long enough to know that it didn't have any benchmark results in it, and yet I still did not see the arrow. I guess they should make it higher-contrast or something.

Please don't accuse me of "can't be bothered". If I couldn't be bothered to look at an image, I certainly wouldn't have bothered to compute the relative speedup from the data presented in the table. I normally look through all the images in articles, even when it's a big slide deck from AMD or Intel. I simply didn't see that there was more than one.

Thanks for taking the time to point out what I (and multiple others!) missed. IMO, that still doesn't let the author off the hook.
That's fine. I will admit it just irked me that it seemed like a rush to jump to a preconceived conclusion.

The possibility was that the article was off and/or the chart was off (because something was very obviously off). It bothered me that most people just took the chart as the thing that was right and the article as the part that was off. And it seemed that a large part of that was because the chart error confirmed a lot of preconceived notions about the CPUs. That just didn't seem like doing due diligence and keeping an open mind to me, or very fair to the author. I will say the arrow and the small "1 of 5 images" header over the image area were really obvious to me, so I didn't take into account that other people would have different setups that might obscure them.

That said, I'm very much with you in waiting for proper benchmarks before reaching a conclusion. But I can also understand why the article was written given that it's a leak with 'evidence' that contradicts the prevailing narrative. That's news in and of itself. Especially given that, while we know there have not been any direct structural changes, we do know that Raptor Lake Refresh will have features turned on that Raptor Lake did not, as well as new masks and other process improvements, as Intel moves its Intel 7 process over to making certain kinds of tiles for future designs before the machines in the line are retuned further or retired to make room for more EUV machines.

It wouldn't be entirely out of line for them to get ~10% additional performance out of doing that kind of thing, especially given that we've seen Intel process improvements themselves give performance improvements to the same cache designs. That would be needed to keep the cores fed with another 200 MHz bump in clock speeds, given that the cache is what's holding back Raptor Lake and Alder Lake cores right now.

That said, I'm really doubtful they did better than a 6 or 7 percent ST performance increase, and even that only in certain scenarios. But who knows. Maybe after years of bad luck, Intel's foundry teams pulled a rabbit out of their hat this time. They are certainly due for something other than yet another cock-up.
 
Isn't cache transparent to the system/cpu?
It's the software that will either be able to use more cache, or not.
That's why x3d only does anything for some of the software, right?

The software does not decide what data needs to be stored in the 3D cache. The CPU-specific scheduler decides which data needs to be moved to which cache level - L1, L2, L3 & 3D cache. It's called prefetching and is controlled by the CPU. There is a difference between caching and prefetching, but here with 3D cache, I think it's more about prefetching.

@bit_user considering Intel came up with the hybrid arch and specific schedulers, I think they would be able to implement 3D cache on their own quickly and fine-tune the schedulers.

Software prefetch is also used, but it's used selectively and not by all software.

So to answer your question, it's not controlled by the program software; it's CPU controlled. It's just that not all programs are memory sensitive; games just happen to be so.

Maybe my understanding is not perfect; others can explain this better...
 

bit_user

Titan
Ambassador
from lithography standpoint, they already have foveros tech for 3d stacking.
I didn't say they couldn't do it - just that it's more of a redesign than a refresh!

But what i wanted to say was that Intel would have benefitted much to do some work and release 3D cache line up instead of just a refresh...
No kidding. However, their plan was to release a desktop version of Meteor Lake. By the time they realized that wasn't a viable option, the ship had probably long since sailed on doing more than a refresh of Raptor Lake. It takes years to get a chip from the design phase to full production.

Isn't cache transparent to the system/cpu?
Cache is part of the CPU. So, no. It's not transparent, from a hardware perspective.
 
The software does not decide what data needs to be stored in the 3D cache. The CPU-specific scheduler decides which data needs to be moved to which cache level - L1, L2, L3 & 3D cache. It's called prefetching and is controlled by the CPU. There is a difference between caching and prefetching, but here with 3D cache, I think it's more about prefetching.
The CPU decides, but it can only store data that is there...
the software has to have enough data that needs to be, or benefits from being, stored in cache.
My question is whether the CPU fills the cache up until the cache is full or not. (That's what I would call transparent: it doesn't care how much there is, it just fills it until it gets a "cache full" reply.)
If the CPU has a hardcoded limit on how much cache to use, that would be pretty stupid, and it's the only reason I could see why a larger cache wouldn't be used by a CPU.
 
the software has to have enough data that needs to be, or benefits from being, stored in cache.


Hence there being a benefit for most games and little for productivity apps. Probably due to too much data? I don't know.

I have no idea how the data is prioritized into which cache. That is the job of the scheduler.

Like I said, I am not the expert :(
 

bit_user

Titan
Ambassador
The CPU decides, but it can only store data that is there...
the software has to have enough data that needs to be, or benefits from being, stored in cache.
Yes. The benefit of more cache is software-dependent. It has to do with how much data the software is accessing (sometimes referred to as its "working set") and the access patterns. Modern CPUs also have hardware prefetchers, which try to detect access patterns and pull data into the cache hierarchy before a thread even requests it.
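To make the prefetcher point concrete, here's a toy sketch of a stride prefetcher. The line size, history depth, and prefetch distance are made-up assumptions for illustration; no real core works exactly like this:

```python
# Toy stride prefetcher: watch recent demand addresses and, if they advance by
# a constant stride, speculatively fetch a couple of lines ahead.
LINE = 64  # assumed cache-line size, in bytes

def prefetches(addresses, depth=2):
    history = []
    for addr in addresses:
        history.append(addr)
        if len(history) >= 3:
            s1, s2 = history[-1] - history[-2], history[-2] - history[-3]
            if s1 == s2 and s1 != 0:              # constant stride detected
                yield [addr + d * s1 for d in range(1, depth + 1)]

# A sequential scan is easy prey for the prefetcher...
print(list(prefetches([i * LINE for i in range(6)])))
# ...while an irregular, pointer-chasing pattern gives it nothing to latch onto.
print(list(prefetches([0x0, 0x4C0, 0x40, 0x1280, 0x100, 0x900])))
```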

My question is whether the CPU fills the cache up until the cache is full or not. (That's what I would call transparent: it doesn't care how much there is, it just fills it until it gets a "cache full" reply.)
Cache is usually full. What happens is that you request something new, and if the cache doesn't contain it, an algorithm in the cache hardware selects a cache line to be "evicted", in order to make room for the line that's about to get fetched.

If the victim line is "dirty" (i.e. has been modified while it's been resident in that block of cache), then it first needs to get written out. Otherwise, it can simply be overwritten. Whether a non-dirty cache line is simply overwritten depends on whether the next level of cache is a victim cache. If the next level is a victim cache, then the line is still transferred to that next level, even if it's not dirty.
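Here's a minimal sketch of that miss/evict/write-back flow, assuming a 4-way set with LRU replacement (real replacement policies and victim handling vary quite a bit):

```python
# Toy model of one set in a 4-way cache with LRU replacement. On a miss when
# the set is full: pick the LRU victim, write it back only if it's dirty, then
# install the new line. A clean victim might instead be handed to a victim
# cache on a real CPU; here it is simply dropped.
from collections import OrderedDict

class CacheSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = OrderedDict()            # tag -> dirty flag, LRU-ordered

    def access(self, tag, write=False):
        if tag in self.lines:                 # hit: update dirty bit + LRU order
            self.lines[tag] = self.lines[tag] or write
            self.lines.move_to_end(tag)
            return "hit"
        if len(self.lines) == self.ways:      # miss in a full set: evict the LRU line
            victim, dirty = self.lines.popitem(last=False)
            if dirty:
                print(f"writing back dirty line {victim:#x}")
        self.lines[tag] = write               # install the newly fetched line
        return "miss"

s = CacheSet()
for t in (0x100, 0x200, 0x300, 0x400):
    s.access(t, write=(t == 0x200))           # 0x200 gets dirtied
s.access(0x500)                               # evicts 0x100 (clean, just dropped)
s.access(0x600)                               # evicts 0x200 (dirty -> written back first)
```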

That's just your first taste of the wonderful complexity of caches and cache algorithms. CPU caches and cache hierarchies are intrinsic to CPU scalability and performance.

If the CPU has a hardcoded limit on how much cache to use, that would be pretty stupid, and it's the only reason I could see why a larger cache wouldn't be used by a CPU.
I'm not saying you can't add more cache. However, bigger caches usually involve tradeoffs. The process of determining whether requested data is resident in cache involves checking all of the places it might be held. This is known as the cache's associativity. You'll typically see caches with anywhere from 4-way to about 16-way associativity. It means that every time a lookup occurs, that many entries of "tag RAM" have to be fetched and tested to see if they have the data.

Anyway, to make a cache bigger, you must either increase its associativity or increase the number of sets. Either way, you're probably looking at increased latency. So, the size vs. latency tradeoff is part of the calculation that goes into deciding how big to make a cache, and the CPU core's target clock speed also factors into that latency calculation.
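And for a back-of-envelope feel for the sets x ways arithmetic, here's a sketch with made-up (but typical-looking) numbers; it's not the geometry of any specific Intel or AMD cache:

```python
# How size relates to sets * ways * line size, and how a lookup maps an
# address onto a set and a tag. All numbers are assumptions for illustration.
LINE_BYTES = 64
WAYS       = 16
SIZE_BYTES = 2 * 1024 * 1024                  # e.g. one 2 MiB L3 slice

SETS = SIZE_BYTES // (WAYS * LINE_BYTES)      # size = sets * ways * line size
print(f"{SETS} sets x {WAYS} ways x {LINE_BYTES} B = {SIZE_BYTES // 1024} KiB")

def split(addr):
    """Break an address into (tag, set index, byte offset within the line)."""
    offset  = addr % LINE_BYTES
    set_idx = (addr // LINE_BYTES) % SETS
    tag     = addr // (LINE_BYTES * SETS)
    return tag, set_idx, offset

# Each lookup indexes exactly one set, then compares the tag against all 16
# ways of that set's tag RAM - more ways means more comparisons per lookup.
print(split(0x12345678))
```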

As for 3D cache, I've read enough about the structures and design elements in the Zen 3 and Zen 4 cores that enable it to say it looks very much like something you need to design for, from day 1. Stacked cache isn't something you can just bolt on as an afterthought.

A corollary of this point is that eliminating support for the 3D cache is one of the more prominent area optimizations that enabled AMD to achieve so much size reduction in the Zen 4c cores. It's not just eliminating the TSVs that support the die stacking, but also the extra tag RAM and the physical distribution of the cache in the base die.
 

bit_user

Titan
Ambassador
it seemed that a large part of that was because the chart error confirmed a lot of preconceived notions about the CPUs.
The chart didn't exactly align with "preconceptions", but understand that these performance expectations were based on previous leaks of performance data and design changes that seemed more authoritative than some random CPU-Z scores collected under unknown circumstances. Therefore, as we struggle to achieve a consistent idea of what to expect, the outlier claims of this article are immediately suspect. That the data wasn't clearly presented to us meant it was going to be called out - at least, by anyone who actually cares about data.

That just didn't seem like doing due diligence and keeping an open mind to me, or very fair to the author.
The very fact that I have an open mind is why I'm willing to look at the data and see what it tells me. However, if you're going to contradict what I've heard from multiple other sources, then you'd better present some data that's compelling in both quality and consistency. If not, it will get treated as an outlier.

I will say the arrow and the small "1 of 5 images" header over the image area were really obvious to me,
I didn't see them. I'm using Firefox on Windows, with the default color scheme. Perhaps my eyesight isn't what it used to be, but even now it doesn't jump out at me.

Do you read Chinese? Maybe the fact that I'm not used to seeing Chinese screenshots just caused my brain to treat it as "visual clutter", which kept me from picking up on it. The "1 of 5" seems obvious enough, now. I can't explain why I didn't see it, but I usually look for the arrow - not that.

But I can also understand why the article was written given that it's a leak with 'evidence' that contradicts the prevailing narrative.
Yes, I'd like to know what data is out there, even if there are some inconsistencies. It's only by looking at the totality of data points that we can start to decide which are truly the outliers.

It wouldn't be entirely out of line for them to get ~10% additional performance out of doing that kind of thing, especially given that we've seen Intel process improvements themselves give performance improvements to the same cache designs.
I assume it's already well-optimized, since the original Raptor Lake was effectively made on Intel's 10 nm ++++ node. Yes, 4 +'s = 5th generation:
  1. Cannon Lake (original 10 nm)
  2. Ice Lake (improved 10 nm)
  3. Tiger Lake (SuperFin)
  4. Alder Lake (Enhanced SuperFin)
  5. Raptor Lake (further, unspecified improvements)

Was there really that much room for remaining optimizations? I'm no process engineer, but it seems like most Intel nodes converge towards a performance level and don't experience big jumps in a late iteration.

Anyway, I'll await more data. I believe more official information should be made available at an Intel event, in about 2 weeks.

P.S. thanks for signing up. I hope we can have more productive exchanges about CPUs and more.
: )
 
As for 3D cache, I've read enough about the structures and design elements in the Zen 3 and Zen 4 cores that enable it to say it looks very much like something you need to design for, from day 1. Stacked cache isn't something you can just bolt on as an afterthought.

By 'designed for' you mean, apart from the 3D stacking, the physical connections to the CPU logic die itself? That's about the only major design challenge I see. They have the manufacturing capability (Foveros), connecting dies with EMIB, and implementation with their prefetchers and schedulers...

Not to downplay it, but I think Intel can definitely do this, and it's not a complete architectural change.

Who knows, maybe we will see this in LGA 1851...
 
The chart didn't exactly align with "preconceptions", but understand that these performance expectations were based on previous leaks of performance data and design changes that seemed more authoritative than some random CPU-Z scores collected under unknown circumstances. Therefore, as we struggle to achieve a consistent idea of what to expect, the outlier claims of this article are immediately suspect. That the data wasn't clearly presented to us meant it was going to be called out - at least, by anyone who actually cares about data.


The very fact that I have an open mind is why I'm willing to look at the data and see what it tells me. However, if you're going to contradict what I've heard from multiple other sources, then you'd better present some data that's compelling in both quality and consistency. If not, it will get treated as an outlier.


I didn't see them. I'm using Firefox on Windows, with the default color scheme. Perhaps my eyesight isn't what it used to be, but even now it doesn't jump out at me.

Do you read Chinese? Maybe the fact that I'm not used to seeing Chinese screenshots just caused my brain to treat it as "visual clutter", which kept me from picking up on it. The "1 of 5" seems obvious enough, now. I can't explain why I didn't see it, but I usually look for the arrow - not that.


Yes, I'd like to know what data is out there, even if there are some inconsistencies. It's only by looking at the totality of data points that we can start to decide which are truly the outliers.


I assume it's already well-optimized, since the original Raptor Lake was effectively made on Intel's 10 nm ++++ node. Yes, 4 +'s = 5th generation:
  1. Cannon Lake (original 10 nm)
  2. Ice Lake (improved 10 nm)
  3. Tiger Lake (SuperFin)
  4. Alder Lake (Enhanced SuperFin)
  5. Raptor Lake (further, unspecified improvements)

Was there really that much room for remaining optimizations? I'm no process engineer, but it seems like most Intel nodes converge towards a performance level and don't experience big jumps in a late iteration.

Anyway, I'll await more data. I believe more official information should be made available at an Intel event, in about 2 weeks.

P.S. thanks for signing up. I hope we can have more productive exchanges about CPUs and more.
: )
The answer to the last bit is unequivocally yes. There is more that can be gotten out of the DUV machines Intel is using for the entire Intel 7 line, mostly because it's my understanding that the new mask-making tools for EUV processes can be backported to DUV processes for better process control. The new mask equipment makes it possible to do Intel's multipatterning (I think they are doing quad patterning for the latest round of Intel 7, but they may have made it work with just double or triple patterning) with less 'fuzz', which can be used to further increase yields or further improve performance. There are also some new tricks that have been shown off for multipatterning in general that work for DUV, EUV, and High-NA EUV and can increase performance and yields. And given that Intel plans to keep its Intel 7 fabs going for the foreseeable future, for various tiles and other things they need to produce, there will almost certainly be further improvements to Intel 7. It's just a matter of how many of these things Intel has ready now.

My guess is that there's some fairly good new stuff involved in the Raptor Lake Refresh production, mostly because Intel was able to turn on the new power management stuff that was present in Raptor Lake but wasn't enabled: it worked amazingly when it worked, but they had issues making it work consistently, and if it didn't work it could fry the whole chip.

Those advancements are things that Samsung and TSMC have been using to their advantage for a good while now, because they have been using masks designed for EUV patterning for several nodes, while Intel (in one of the dumbest moves in a long list of dumb moves before Gelsinger took over) was stuck on DUV patterning and masks long after its rivals transitioned to EUV machines for primary pattern work.
 

bit_user

Titan
Ambassador
By 'designed for' you mean, apart from the 3D stacking, the physical connections to the CPU logic die itself? That's about the only major design challenge I see.
Again, it's not only about the physical layout, but that's kinda a big deal. The other thing is the tag RAM's location and size, as well as the cache tag lookups being designed to hit key targets for the frequency it's meant to run at.

Not to downplay it, but I think Intel can definitely do this, and it's not a complete architectural change.

Who knows, maybe we will see this in LGA 1851...
I didn't mean to suggest they need a different microarchitecture. Look at Zen 4c, which eliminates support for 3D Cache and halves the size of the base L3 cache. It has the same microarchitecture, while using a substantially different layout.

 

bit_user

Titan
Ambassador
The new mask equipment makes it possible to do Intel's multipatterning (I think they are doing quad patterning for the latest round of Intel 7, but they may have made it work with just double or triple patterning) with less 'fuzz', which can be used to further increase yields or further improve performance.
Well, that would align with the rumored 15% price increase, because adding more steps of multi-patterning increases wafer production time.

given that Intel plans to keep its Intel 7 fabs going for the foreseeable future, for various tiles and other things they need to produce, there will almost certainly be further improvements to Intel 7. It's just a matter of how many of these things Intel has ready now.
Once it's no longer their leading-edge node, there's probably less value in squeezing every last bit of performance or efficiency out of it, because the stuff where that tends to matter is made on the leading-edge node. However, the chiplets made on older nodes are more cost-sensitive, so there's probably less interest in adding multi-patterning steps.
 