> Uhhhh, do you not see in the graph you provided that the Ryzen 5 7600X's single-thread efficiency handily breaks open a can of whoop on the Core i5-13600K, which has the highest single-thread efficiency of all the Intel parts tested?
I'm sure he'll want the efficiency scores to be bracketed by performance tier. That's pretty fair, because it'd otherwise be too easy to game the 1T efficiency score by cutting clocks and/or cores.
> Just look at the single-thread efficiency ramp for Zen 4: the 7950X @ 5.7 GHz = 41.7 pts; the 7700X @ 5.4 GHz = 83 pts (i.e. a +100% single-thread efficiency gain from dropping frequency by 300 MHz); then the 7600X @ 5.3 GHz = 124.3 pts (i.e. a +50% efficiency gain from dropping frequency by another 100 MHz).
When you cross the threshold from 2 CCDs to 1, there's a big jump in efficiency. Can't miss it.
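For reference, here's a quick back-of-the-envelope check of those percentages (a minimal Python sketch; the scores are just the figures cited in the quoted post). It also shows the biggest step is indeed at the 2-CCD to 1-CCD crossing:

```python
# Sanity-check the efficiency-gain percentages cited above.
# Scores are the single-thread efficiency points quoted for each Zen 4 SKU.
scores = [
    ("7950X @ 5.7 GHz (2 CCDs)", 41.7),
    ("7700X @ 5.4 GHz (1 CCD)", 83.0),
    ("7600X @ 5.3 GHz (1 CCD)", 124.3),
]

for (prev_name, prev), (curr_name, curr) in zip(scores, scores[1:]):
    gain = (curr / prev - 1) * 100
    print(f"{prev_name} -> {curr_name}: {gain:+.1f}%")
# 7950X -> 7700X: +99.0%  (the "+100%" figure, across the CCD-count change)
# 7700X -> 7600X: +49.8%  (the "+50%" figure, same CCD count)
```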
> All these forums are basically bit_user vs Terry.
I'm not in every thread where Terry hangs out and vice versa. Plus, several others have disputed his claims, even in this very thread!
> All these forums are basically bit_user vs Terry.
It seems more like Terry vs everyone else.
> Jim Keller is awesome. For a long time, K10 was pretty much my favorite architecture.
I suspected he was over-hyped, until I read the interviews with him on AnandTech. Those won my respect.
> He's the reason I bought in on the Tenstorrent offering. So I'm pretty happy to finally see news releases on Tenstorrent. They need to keep using Keller as a tech celebrity to just provide expert views like this.
They IPO'd?
> I'm not in every thread where Terry hangs out and vice versa. Plus, several others have disputed his claims, even in this very thread!
Haha... no, just the two I've been in a lot lately. It's hard for me to see Terry's viewpoint, but it might be because I don't like Intel. I really dislike the way they treat customers, artificially locking features into performance tiers and changing sockets all the time.
I have nothing against Terry. If I see what I think is a misinterpretation or misrepresentation, I'm going to try and correct it. That's all.
> I suspected he was over-hyped, until I read the interviews with him on AnandTech. Those won my respect.
> They IPO'd?
Oops, I'm wrong. They didn't IPO. It was something else I bought when I was trying to buy them. I bought a company connected to a bunch of other funding that I hoped was related. So I don't think I've got a horse in that race.
"integer performance" is relevant to literally nothing
> I'm sure he'll want the efficiency scores to be bracketed by performance tier. That's pretty fair, because it'd otherwise be too easy to game the 1T efficiency score by cutting clocks and/or cores.
Single-thread power for 2-CCD CPUs is about 4 W higher than a single-CCD CPU at the same frequency (e.g. a Ryzen 7700X @ 5.4 GHz pulls 23 W in a single-threaded load, while a Ryzen 9 7900 @ 5.4 GHz pulls 27 W). But that doesn't disprove that 5.6-5.7 GHz is past the optimal voltage/frequency range, which is the real reason Ryzen 9 is so inefficient at single-thread; it has nothing to do with 2 CCDs beyond that extra 4 W.

Also, since the Ryzen 9 7950X at any of its Eco settings performs better in multi-threaded applications than the 13900K at the same wattage, that further reinforces that the Zen 4 architecture is more efficient than Intel's Raptor Lake architecture, in spite of the added interconnect power draw. It's the fact that AMD pushed the 7900X and 7950X beyond the optimal range of 5 nm's frequency/voltage curve that makes single-threaded efficiency so terrible on Ryzen 9 Zen 4 CPUs. It's like overclocking, where you have to add 0.1 V just to get 100 MHz more out of your CPU, then 0.15 V to get an additional 100 MHz; i.e. you've gone past the optimal range into diminishing returns. That's what AMD did, to approach roughly the same single-thread boost clocks as Intel on a process node built for mass compatibility, not all-out performance like Intel 7. In fact, I predict Intel 4, 3, etc. will lose upper-end frequency/voltage efficiency due to the mass-compatibility requirement for these nodes under IFS. We will see.
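To make that diminishing-returns argument concrete, here's a toy dynamic-power model (P proportional to V^2 * f, with the capacitance term folded into a constant). The voltage/frequency steps are illustrative assumptions in the spirit of the example above, not measured values:

```python
# Toy dynamic-power model: P ~ V^2 * f (capacitance folded into a constant).
# The voltage/frequency steps below are illustrative, not measured values.
def rel_power(voltage_v: float, freq_ghz: float) -> float:
    return voltage_v ** 2 * freq_ghz

steps = [
    (1.20, 5.2),   # hypothetical starting point near the knee of the curve
    (1.30, 5.3),   # +0.10 V for +100 MHz
    (1.45, 5.4),   # +0.15 V for another +100 MHz
]

base_power, base_freq = rel_power(*steps[0]), steps[0][1]
for v, f in steps:
    print(f"{f:.1f} GHz @ {v:.2f} V: {rel_power(v, f) / base_power:.2f}x power, "
          f"{f / base_freq:.3f}x performance")
# Each extra 100 MHz (~2% more performance) costs ~20-27% more power here,
# which is why per-core efficiency collapses past the knee of the V/f curve.
```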
He's found the biggest weakness of Zen 4 on the desktop, and he knows it. AMD won't have a proper answer for energy-efficiency of single-threaded tasks that can also deliver competitive performance, until they bring their monolithic APUs to the desktop. Otherwise, AMD's chiplet architecture seems to have overheads that you need multi-threaded tasks to amortize away.
Is it an important weakness? I'd argue not as much as Intel's crazy power consumption under highly-threaded loads.
That said, I think 1T efficiency isn't terribly germane to the discussion, because the CPU and system-level overheads get in the way of properly evaluating microarchitecture (i.e. core) efficiency. And that's what the article is really about.
> That potential for "Infinite Tiling" as a 2D flat torus or a set of nested rings is brilliant from a tiling/chiplet architectural standpoint.
There's a lot we don't know about the on-die interconnect of GPUs.
Just add more rings around your center if you want a bigger CPU =D
> Not what Intel and its ultra-simplistic quad tiles w/ mesh inside turned out to be. It doesn't even take advantage of the physical geometric 2D square shape for infinitely repeated tiling/tessellation.
In reality, these things don't need to scale infinitely. There's a fairly narrow window between what you can comfortably do with a single chiplet and a CPU so large, expensive, and hot that no one wants it. It seems likely Intel would've benefited from making their chiplets a little smaller, but increasing chiplet count beyond 8 or so probably would've been pointless.
> I predict Intel 4, 3, etc. will lose upper-end frequency/voltage efficiency due to the mass-compatibility requirement for these nodes under IFS.
Well, rumors of frequency-scaling woes on Meteor Lake (Intel 4) seem to align with that.
> In reality, these things don't need to scale infinitely. There's a fairly narrow window between what you can comfortably do with a single chiplet and a CPU so large, expensive, and hot that no one wants it. It seems likely Intel would've benefited from making their chiplets a little smaller, but increasing chiplet count beyond 8 or so probably would've been pointless.
Is that also how you feel about the new EPYC 9000 series w/ Genoa and its 12 CCDs & cIOD to unify them?
> I like the symmetry of Sapphire Rapids and the original Epyc, where each die has its own local memory controller and inter-processor bus slice. If you pair this with optimal software partitioning, it's also more efficient.
I'm more partial to the current hub & spoke model that AMD has chosen to adopt with its cIOD & chiplets.
> Is that also how you feel about the new EPYC 9000 series w/ Genoa and its 12 CCDs & cIOD to unify them?
Yes, I thought that would come up. We don't know if AMD chose to stick with the current 8-core CCD because it was optimal for Epyc or for desktops. The fact that the chiplets are shared has the potential to force some compromises on each side.
> I bet you that they'll be scaling beyond 12x CCDs in the future as well.
Who knows, but I wouldn't count on it. I think the package will start to get crowded, as they cram in DRAM and other things.
> I'm more partial to the current hub & spoke model that AMD has chosen to adopt with its cIOD & chiplets. It's a classic model that works reliably well and is very well understood, along with its pros & cons.
It doesn't scale for things like GPUs, however. Even for CPUs, it introduces avoidable compromises, if you can do some software partitioning around a more heavily NUMA approach.
> I'm not counting on "optimal software partitioning".
Yes, especially in the general-purpose CPU world.
> I'm not in every thread where Terry hangs out and vice versa. Plus, several others have disputed his claims, even in this very thread!
> I have nothing against Terry. If I see what I think is a misinterpretation or misrepresentation, I'm going to try and correct it. That's all.
I think you are in many more threads than Terry (or at least more of the threads I am interested in). But when you make it a bit personal just due to differing opinions, it sure does seem like you have something against Terry other than his contrarian opinions.
> So far, I can't figure out what your agenda is, other than to try and minimize or distract from anything that's embarrassing to Intel. I've got to say that every one of your posts that tries to put some positive spin on Intel or paint AMD in a negative light makes me feel just a little worse towards Intel. If you were a brand ambassador for team Intel, I'd say you're scoring some own-goals.
I took Terry's above assessment as nuanced: stating that the P-core in its efficiency band (the best clock on the power/performance curve) was better than Zen 4 under the same criteria. I'm not sure that's true, but they're both very close competitively. AMD seems much more competitive on productivity if you keep the power thresholds low; if you ignore power limits, Intel ekes out a win. Your die picture comparing both really shows AMD has a large performance-per-die-area win, and maybe with disaggregation AMD has found new ways to shift circuitry off the core die to the I/O die or the fabric, making the most of expensive litho. I find it a fascinating discussion, and it's the crux of who is going to be more competitive next gen. What is Meteor Lake's IPC on Intel 4, and how will its transistor density compare to Zen 4/5? Execution from Intel will determine who it competes with.
> I took Terry's above assessment as nuanced: stating that the P-core in its efficiency band (the best clock on the power/performance curve) was better than Zen 4 under the same criteria...
And I addressed why I think that's not accurate. He started out by saying we should separate out "core" power from "CPU" power, and then picked a metric which went almost as far as possible in the opposite direction.
> I'm not sure that's true, but they're both very close competitively.
Not in terms of efficiency, which was the point.
> Your die picture comparing both really shows AMD has a large performance-per-die-area win, and maybe with disaggregation AMD has found new ways to shift circuitry off the core die to the I/O die or the fabric, making the most of expensive litho.
It was comparing core + L2 only. That's about as functionally identical as you can get. There are two variables at play: lithography and micro-architecture.
> I find it a fascinating discussion, and it's the crux of who is going to be more competitive next gen. What is Meteor Lake's IPC on Intel 4, and how will its transistor density compare to Zen 4/5? Execution from Intel will determine who it competes with.
While I'm aware this is an article about Zen 5, I'm making no predictions about either it or Meteor Lake.
> Can one have a contrarian opinion and not be labeled as a brand ambassador?
First, a contrarian is one who disputes the dominant opinion, whatever it is. That's not what I've been seeing. Find me contrarian positions Terry has taken that don't involve Intel or AMD. Find me a single negative thing Terry has ever said about Intel, even when the majority opinion tended to favor Alder Lake over Zen 3.
> How about agreeing to disagree, or admitting perhaps we don't understand how they came to such a contrarian conclusion?
Because this is a place where we share information and debate ideas. When someone is consistently misrepresenting or misinterpreting information, it can do harm by spreading misinformation (whether intentional or not). Misinformation should be countered and corrected, and it's not just me who's taking issue with many of Terry's claims. If you value peace and harmony at the expense of letting people actively spread misinformation, then I will admit that perhaps I don't understand how you came to such a view.
> Because this is a place where we share information and debate ideas. When someone is consistently misrepresenting or misinterpreting information, it can do harm by spreading misinformation (whether intentional or not). Misinformation should be countered and corrected, and it's not just me who's taking issue with many of Terry's claims. If you value peace and harmony at the expense of letting people actively spread misinformation, then I will admit that perhaps I don't understand how you came to such a view.
Honestly, it's too much drama... Misinformation is another can of worms: we have data and conflicting data, and perhaps both are right. It's not black or white; it truly depends on the particulars of each instance. You can find corner cases for any platform to say it is the best. It's what marketing gets paid for.
> I think Terry's creative claims should be battled.
Why? Life is too short; people have ignorant opinions all the time. You think you can change them?
> Honestly, it's too much drama...
Then why are you getting involved? You're only making this more personal, not less.
> Misinformation is another can of worms...
It's simply whenever someone posts inaccurate facts, incomplete facts that lead to a wrong interpretation, an incorrect interpretation or conclusion from the facts supplied, or controversial claims which cannot be substantiated.
> ...we have data and conflicting data, and perhaps both are right...
Data is a starting point. The next step is interpretation. Neither is infallible, and thus both are grounds for discussion. More data usually helps, because experimental variations are real and can account for some conflicting data.
> ...it's not black or white...
Facts and absolutes are real things. Reaching clear conclusions is hard enough when it doesn't seem like certain people have an agenda.
> Debating ideas is fine; it's when you make it personal or ascribe malicious intent that a line is crossed.
Again, you're the one saying that. I just pointed out that Terry's actions might be self-defeating.
> The original article is claiming Zen 5 is going to be king of integer performance, and we got statements in the weeds comparing die density and performance efficiency on currently released products...
You know how it goes. One person makes a rash generalization, another talks it down but still manages to slip in some misleading statements... and pretty soon, we're in the weeds.
> My personal opinion is Intel and AMD are neck and neck...
Performance-wise, their (P) cores are pretty close. I posted a link to that effect in post #20, in case you didn't notice.
> ...all that matters is execution [release cadence in volume].
As far as I'm concerned, all that matters is the products on the market. If we have data on unreleased products, it's unreliable and therefore not something I'm going to expend a lot of time or energy speculating about.
> Even if AMD comes out with Zen 5 and it is the best, if they didn't pre-order enough TSMC capacity...
One would hope they've learned their lesson. The few things I've heard about TSMC would suggest there should be plenty of capacity on offer.
> Why? Life is too short; people have ignorant opinions all the time. You think you can change them?
It's not about changing the mind of someone with a firm point of view. That's effectively impossible.
> How about agreeing to disagree, or admitting perhaps we don't understand how they came to such a contrarian conclusion? Keeping it civil and kind...
Because when you've worked in the technology industry at the OS or firmware level, if you're not already a hardware engineer, you'd understand that what they're saying simply doesn't make sense, period, because it's not how things actually work.
> Yes, I thought that would come up. We don't know if AMD chose to stick with the current 8-core CCD because it was optimal for Epyc or for desktops. The fact that the chiplets are shared has the potential to force some compromises on each side.
I'm pretty sure AMD chose to make such small CCDs for yield & cost reasons.
Another point would be to consider the CCD interconnect link, which you'd have to scale up for a bigger CCD. The HPC versions of Epyc actually use a pair of links per CCD, which shows it's indeed a bottleneck. Also, a bigger CCD's L3 cache slice would have more consumers, and it was only in Zen 3 that AMD unified that L3 slice for sharing by all 8 cores.
The other point I'd make is about yield, and this can be a variable thing. A low-yielding process can incentivize very small chiplets, whereas a mature, high-yielding process can make very large dies viable. Look at how big Nvidia's dies are*, on TSMC 4N, and then tell me AMD had to use such small CCDs.
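To put rough numbers on that, here's a minimal sketch using the classic Poisson die-yield model, Y = exp(-D0 * A). The defect density D0 is an assumed, illustrative figure (real N5 numbers aren't public; ~0.1 defects/cm² is a commonly cited ballpark for a mature node):

```python
import math

D0 = 0.10  # assumed defects per cm^2 (illustrative, not an official figure)

def poisson_yield(area_mm2: float, d0: float = D0) -> float:
    """Classic Poisson yield model: Y = exp(-D0 * A), with A in cm^2."""
    return math.exp(-d0 * area_mm2 / 100.0)

for name, area_mm2 in [("Zen 4 CCD", 66),
                       ("hypothetical double-size CCD", 132),
                       ("Nvidia H100", 814)]:
    print(f"{name}: {area_mm2} mm^2 -> ~{poisson_yield(area_mm2):.0%} yield")
# Zen 4 CCD: 66 mm^2 -> ~94% yield
# hypothetical double-size CCD: 132 mm^2 -> ~88% yield
# Nvidia H100: 814 mm^2 -> ~44% yield
```

Even at double the area, a CCD would still yield quite well on a mature node, which is the point: yield alone doesn't force an 8-core chiplet.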
Speaking of chiplets, Jim Keller has a pretty good take in his talk on why we should expect to see more of them, and it's really nothing to do with size or yield.
BTW, the best quote from that talk is nothing to do with chiplets:
"The '3 nm' transistor is something like 1000 x 1000 x 1000 atoms. That's a billion atoms. I know we know how to shrink that."
> It doesn't scale for things like GPUs, however. Even for CPUs, it introduces avoidable compromises, if you can do some software partitioning around a more heavily NUMA approach.
True, the hub & spoke model doesn't work well for GPUs, but as the engineers in the RTG group stated, you couldn't do chiplets the same way the Ryzen team did.
> Yes, especially in the general-purpose CPU world.
The process-node reticle limit that will halve the maximum possible die area is coming. If you make a design that is fully dependent on a large die area, then good luck when the maximum reticle limit of 858 mm² on TSMC 5 nm gets cut in half in the future, down to 429 mm².
* The H100 is reportedly 814 mm^2
> I am sure Intel has a much better design than AMD in integer or floating-point CPU design. Thus, Keller's comments have little value.
Considering that AMD processors have often led Intel on either or both ever since AMD got the K6 out, I wouldn't be so sure.
> I'm pretty sure AMD chose to make such small CCDs for yield & cost reasons. Their main goal was to make manufacturing their entire product line very affordable, ergo having good margins while having a high-performing part.
That's really just an argument about yield. We have an example of Nvidia making viable 5 nm dies at 814 mm², while the Zen 4 CCD is only 66 mm². This tells me they opted for 8 cores for reasons other than yield.
> As far as transmission speeds, it's a matter of what they think they'll need, but besides doubling up on bandwidth by using 2x GMI3 connections to the cIOD...
That sounds like an acknowledgement of my point that GMI bandwidth appears to be an issue. I'll take it.
> I can see other interesting uses for them in their CCD setups on their chips.
What you can imagine and what the chiplets actually support are two different things. Unless and until they actually discuss this option, we shouldn't assume it's possible.
> Some of the future CCD configs show the potential of having two CCDs facing each other. That would allow a very short hop and a direct GMI3-to-GMI3 connection, like 1st-gen Threadripper, while maintaining a direct connection to the cIOD over the other GMI3 link. This would give two Zen CCDs a direct short-cut to each other, instead of being forced to make a round trip through the cIOD just to communicate, which would improve inter-NUMA-node latency significantly by not having to travel that far just to visit your neighbor's CCX.
Bypassing the IOD for CCD-to-CCD communication would add a lot of complexity for probably rather minimal value to most use cases. You talk of NUMA domains as if they're the same between Zen 1 and later, but they're not. You're rattling off solutions, but I'm yet to be convinced you understand the underlying problem you're trying to solve.
> ...something I can see is that they might eventually go down the road with OMI, to solve the massive contact-pin issues on their CPUs.
It's dead, Jim. Let it go.
> I know you have a hard-on for "on-package DRAM memory" and CXL.mem.
That's not fair. I'm simply describing where the industry seems to be headed, based on actual product developments, roadmap announcements, and the overall level of activity around CXL. Please don't place my view of these technologies on the same level as your OMI fantasy.
> But I don't see a world where CXL.mem isn't just another add-on card to supplement existing DIMM channels on the mobo, especially with the CXL.mem latency penalty.
In the future, in-package memory would take the place of "main memory". Look at Nvidia's Grace to see where things are headed. They packed in 512 GB of LPDDR5X, and it can just use CXL for the rest. No need for any conventional DIMMs.
(Figure: CXL memory latencies within the latency pyramid.)
> I think an L4$ w/ lots of SRAM will be an easier solution for AMD to bolt on, instead of on-package DRAM.
@InvalidError already shot down this idea for cost reasons, in the other thread.
> The process-node reticle limit that will halve the maximum possible die area is coming. If you make a design that is fully dependent on a large die area, then good luck when the maximum reticle limit of 858 mm² on TSMC 5 nm gets cut in half in the future, down to 429 mm².
You're taking that out of context. It was meant to be a footnote, to show that Zen 4 CCDs would've been viable with a lot more than 8 cores, which blows up the yield argument for why they stuck with that size. I'm only talking about TSMC N5 here. For the sake of that argument, future nodes are irrelevant. And I'm not getting side-tracked onto a whole tangent about Nvidia.
> That's really just an argument about yield. We have an example of Nvidia making viable 5 nm dies at 814 mm², while the Zen 4 CCD is only 66 mm². This tells me they opted for 8 cores for reasons other than yield.
That's why I mentioned "cost & yields". A larger die costs more, while smaller dies are much cheaper to produce & bin.
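For what it's worth, both effects can be combined in one rough cost-per-good-die sketch. All inputs are illustrative assumptions (a 300 mm wafer, a made-up wafer price, and the same assumed defect density as above), so only the relative scaling is meaningful:

```python
import math

WAFER_DIAMETER_MM = 300
WAFER_COST_USD = 17_000   # assumed wafer price, not an official figure
D0 = 0.10                 # assumed defects per cm^2

def dies_per_wafer(area_mm2: float) -> int:
    # Standard approximation, including an edge-loss term.
    r = WAFER_DIAMETER_MM / 2
    return int(math.pi * r**2 / area_mm2
               - math.pi * WAFER_DIAMETER_MM / math.sqrt(2 * area_mm2))

def cost_per_good_die(area_mm2: float) -> float:
    good_fraction = math.exp(-D0 * area_mm2 / 100)  # Poisson yield
    return WAFER_COST_USD / (dies_per_wafer(area_mm2) * good_fraction)

for name, area in [("66 mm^2 CCD", 66), ("132 mm^2 die", 132), ("814 mm^2 die", 814)]:
    print(f"{name}: ~${cost_per_good_die(area):,.0f} per good die")
# 66 mm^2:  ~$18 per good die
# 132 mm^2: ~$41 per good die
# 814 mm^2: ~$609 per good die
```

Cost per good die grows faster than area, so both points can be true at once: small CCDs are cheaper per core, yet yield alone wouldn't have forbidden a somewhat larger one.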
> That sounds like an acknowledgement of my point that GMI bandwidth appears to be an issue. I'll take it.
It's only an issue if they can't keep the CPUs fed when they actually need to do work, or to transmit data so they can do work.
> What you can imagine and what the chiplets actually support are two different things. Unless and until they actually discuss this option, we shouldn't assume it's possible.
Oh please, don't be so boring that you're limited to only what you can see right in front of you.
> Bypassing the IOD for CCD-to-CCD communication would add a lot of complexity for probably rather minimal value to most use cases. You talk of NUMA domains as if they're the same between Zen 1 and later, but they're not. You're rattling off solutions, but I'm yet to be convinced you understand the underlying problem you're trying to solve.
Currently, if multiple CCDs within the same NUMA node want to talk to each other, they have to send signals all the way to the cIOD and back to the other CCD in their node. That's a very long distance to travel, and it adds latency.
> It's dead, Jim. Let it go.
No, NEVER! You'll have to kill me first!
> No roadmaps show anyone adopting it, and no one is talking about it. Whether or not you understand why they rejected it, you should respect the fact that the industry has reached a consensus. So, please stop wasting your time & ours talking about it.
So what? I'll be their advocate. Consensus can be changed in due time with advocacy.
> That's not fair. I'm simply describing where the industry seems to be headed, based on actual product developments, roadmap announcements, and the overall level of activity around CXL. Please don't place my view of these technologies on the same level as your OMI fantasy.
Ah, the great CXL roadmap that will change the world.
> In the future, in-package memory would take the place of "main memory". Look at Nvidia's Grace to see where things are headed. They packed in 512 GB of LPDDR5X, and it can just use CXL for the rest. No need for any conventional DIMMs.
Good for them; Nvidia's custom proprietary solution has on-package DRAM.
> @InvalidError already shot down this idea for cost reasons, in the other thread.
He has his reasons; I have my reasons to think my way.
> You're taking that out of context. It was meant to be a footnote, to show that Zen 4 CCDs would've been viable with a lot more than 8 cores, which blows up the yield argument for why they stuck with that size. I'm only talking about TSMC N5 here. For the sake of that argument, future nodes are irrelevant. And I'm not getting side-tracked onto a whole tangent about Nvidia.
And I'm telling you that "cost per die" & yields are important.