News: AMD's Zen 6-based desktop processors may feature up to 24 cores

People are forgetting that Zen 6 is getting a redesigned IF; it might be a different interconnect, but it will be much faster and greatly improve on the throughput, which IIRC is currently 72 GB/s. It would be pointless having a 24-core 10950X with only a small bump to even 7200 MT/s RAM. The 9950X is already bandwidth starved (Chips and Cheese analyzed this in detail, IIRC), which is another reason for the poor uplift over the 7950X.
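Back-of-envelope (my own numbers, not from the article): dual-channel DDR5-7200 is roughly 2 × 8 bytes × 7200 MT/s ≈ 115 GB/s, and spread across 24 cores that leaves under 5 GB/s per core, before the CCD-to-IOD link takes its cut.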
It's one of those things where again AMD has given me more questions than answers in the long Chips and Cheese interview.

Supposedly Strix Halo already doesn't use IF at all to talk to the IOD, but basically just analog wires from beachfront to beachfront with far less overhead, energy, protocol, latency etc., and even between CCDs, while the wires still go via the IOD... So it would also mean that the current latency penalty between CCDs would largely disappear, obviously something that would only work with very few CCDs, not with an EPYC chip.

But that would mean that the Strix Halo CCDs either support two modes or are different from what AMD normally pumps out to serve both EPYCs and APUs. Both variants seem crazy and conflict with the shared-parts approach, but then every CCD also carries the burden of V-Cache enablement, even if the vast majority of them never get paired with extra SRAM.
 
Exactly what do you mean by "Alder-Lake++"?
Alder Lake Atoms and any of their successors. There seem to be some under-the-table newer chips, which might just be slightly tweaked ADs.
I wonder if graphics cards haven't been doing this for a while. How else do you suppose an Nvidia workstation/server card offers ECC with the same base memory capacity as the gaming cards? If I'm not mistaken, you can switch it on/off in software.
Some at least do. I was rather surprised I could enable/disable ECC with little more than a reboot on my RTX 4090, when that feature used to be an option exclusive to Tesla cards. And Nvidia worked hard on FUD to ensure no architect or engineer would design their finite-element parts in FP64 on a card that didn't have ECC RAM: "what's that little extra money against you being sued for a bridge that collapsed?"

Well FP64 performance is still capped on "consumer" chips and far outside public attention, and I obviously wouldn't ever want to sacrifice RAM size for ECC on AI work (or gaming for that matter).

I haven't checked on my smaller RTX GPUs, but I think I would have noticed if they had that option, too.
And I believe I could not disable ECC on the V100s, but those are gone, now.
Yeah, but I also remember reading about Intel having it for several years, before products started to emerge where you could actually utilize it. Sounds to me like it had teething issues, which took a while to sort out.
Yet like you I see it as too attractive an option not to be more widely available, especially in embedded scenarios.

Remember how the lowest-end Pentium CPUs often enabled ECC at no extra charge, while most of the midrange required a hefty premium?

That's because they were extremely popular in those little 24x7 storage servers, where the need for ECC was evidently well accepted.
 
Yes, it was really more like a Hexagon DSP or Mobileye VPU. But it was also utterly useless; AFAIK only NUCs were enabled to use it to wake up for Alexa... Today I'm sure it's still inside every mobile and desktop part, but effectively just dark silicon.
Isn't GNA based on Movidius IP, just like the NPU? Did they overlap, or did the NPU simply replace it?

Since I'm either using USB headsets or the monitor's audio via DP/HDMI, all the audio hardware on mainboards, from IP blocks to analog wizardry, has been dark silicon/electronic matter on my machines for more than a decade, too.

That includes those fancy polished Japanese caps and supposedly tons of shielding magic in circuit board traces.
I have an old Sandy Bridge board where the Toslink port finally seems to have died. I guess I could've bought an HDMI audio extractor or used a USB digital audio interface, but the board was already slated for retirement.

I had an old Supermicro workstation board, from a dozen+ years ago, where I used the analog audio output for a couple of years, and the cross-talk I got from it reminded me of the 1990s! It sure makes you appreciate that "analog wizardry" some of the better boards use. But yeah, I tend to go for digital audio outputs whenever possible.

I bought the first sound card, with a digital output, that ALSA supported, back in the day. It's such a simple and uncluttered card, but even its analog out sounded good to me. Before that, I bought some other weird card that was one of the few with a digital output to be supported by OSS, but the driver was out of sync with my kernel and I lacked the skills to get it compiling.

Some simple picture upscaling/smoothing, sure. But that's no longer selling.

AI upscaling is a task way too computationally expensive to take from the GPU and it also relies on information only the GPU has: no way that I can see.
If we're talking iGPUs (I was), then the GPU doesn't really "have" that data. By the time you're doing upscaling, most of the framebuffer has probably gotten flushed out to DRAM, already. So, whether you use the iGPU or NPU doesn't make much difference, from a data perspective.

NPUs are designed to run small dense kernels, e.g. audio and image denoising, which fit mostly into their local on-chip RAM: they load them once during initialization and can then keep running while the rest of the system sits in low power with stopped clocks.
You don't need 45 TOPS just for that. They have DMA engines, so that you can stream in weights of larger networks without blocking their compute elements.

If they have to keep firing up the memory bus for their work, a) the energy benefits would largely go down the drain
That's not true. They're more efficient than GPUs by virtue of being relatively simple VLIW DSPs. Everything is nice and coherent. They don't need massive register files to support SMT, because they hide memory latency by using DMAs.
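As a toy illustration of that double-buffering idea (my own sketch in plain Python/NumPy, nothing from any real NPU SDK): while one buffer feeds the compute, a background "DMA" thread is already filling the other buffer with the next layer's weights.

import threading
import numpy as np

def slow_fetch(layer_weights):
    """Stands in for a DMA transfer from DRAM into on-chip SRAM."""
    return layer_weights.copy()

def run_network(x, layers):
    buffers = [None, None]              # two "SRAM" slots: one computing, one filling
    buffers[0] = slow_fetch(layers[0])  # prime the first slot

    for i in range(len(layers)):
        prefetch = None
        if i + 1 < len(layers):
            # kick off the "DMA" for the next layer's weights
            def worker(idx=i + 1, slot=(i + 1) % 2):
                buffers[slot] = slow_fetch(layers[idx])
            prefetch = threading.Thread(target=worker)
            prefetch.start()

        x = np.maximum(buffers[i % 2] @ x, 0.0)  # compute overlaps with the transfer

        if prefetch is not None:
            prefetch.join()             # only stall if the transfer was slower
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((64, 64), dtype=np.float32) for _ in range(4)]
    x = rng.standard_normal(64, dtype=np.float32)
    print(run_network(x, layers).shape)

The compute never waits for memory as long as each transfer finishes before the current layer does, which is exactly why the NPU doesn't need big register files or SMT to hide latency.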

I almost never recommend YouTube videos, but here's one you might find worthwhile. It's a look inside the PS5 Pro's GPU and how they modified RDNA2 to deliver 300 TOPS. It gives some clues about where AMD might be headed with their upcoming UDNA.
 
There seem to be some under-the-table newer chips, which might just be slightly tweaked ADs.
You mean Twin Lake? Someone told me it's the same exact silicon as Alder Lake-N, just up-spec'd by a couple hundred MHz.

I was rather surprised I could enable/disable ECC with little more than a reboot on my RTX 4090,
I'd be curious to know how it impacts the usable capacity.

I believe I could not disable ECC on the V100s, but those are gone, now.
I have access to a Titan V that I could check. Do you recall the details about how you enable/disable it? It's in a Linux machine.
 
You mean Twin Lake? Someone told me it's the same exact silicon as Alder Lake-N, just up-spec'd by a couple hundred MHz.
I guess so; it's been flying through my input channels without much brain uptake...
I'd be curious to know how it impacts the usable capacity.
All I remember is that it was "reasonable" or what you'd expect, perhaps 22 GB instead of 24.
I have access to a Titan V that I could check. Do you recall the details about how you enable/disable it? It's in a Linux machine.
The V100s ran on CentOS 7/8, so it was nvidia-smi (it has built-in help).
For the RTX 4090 on Windows I believe it was simply a tick box in the settings page.
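If the Titan V sits in a headless Linux box, NVML from Python works too; a rough sketch, assuming the nvidia-ml-py (pynvml) package is installed and the GPU actually exposes ECC:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# 1 = enabled, 0 = disabled; "pending" only differs until the next GPU reset/reboot
current, pending = pynvml.nvmlDeviceGetEccMode(handle)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"ECC current={current} pending={pending}, usable VRAM={mem.total / 2**30:.1f} GiB")

# Toggling needs root and only takes effect after a reset/reboot:
# pynvml.nvmlDeviceSetEccMode(handle, pynvml.NVML_FEATURE_DISABLED)

pynvml.nvmlShutdown()

The plain nvidia-smi equivalents are "nvidia-smi -q -d ECC" to query and "nvidia-smi -e 0" / "-e 1" to toggle, IIRC.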
 
Isn't GNA based on Movidius IP, just like the NPU? Did they overlap, or did the NPU simply replace it?
After looking it up, I'm pretty sure you're right: I was thinking of the Intel Compute Stick, and then their Israeli acquisition got in the way and produced some hallucinations... it happens to humans, too.
If we're talking iGPUs (I was), then the GPU doesn't really "have" that data. By the time you're doing upscaling, most of the framebuffer has probably gotten flushed out to DRAM, already. So, whether you use the iGPU or NPU doesn't make much difference, from a data perspective.
You can do upscaling much like TVs do it. And that's the sort of thing GNA and NPUs could do easily enough.

Or you can do upscaling like DLSS/FSR/XeSS and that's way beyond what they can do.
You don't need 45 TOPS just for that. They have DMA engines, so that you can stream in weights of larger networks without blocking their compute elements.
Yes, too much for some stuff, too little for other stuff, and without a niche big enough to grow the software ecosystem to the point where developers actually make money.

Even the Movidius has DMA engines for streaming data, I believe; a lot of these chips are data-flow designs, which isn't easy on developers, yet another paradigm. And outside the phone niche, where the power ceiling is unforgiving, that narrows the benefits too far, I believe, while much beefier variants would either hit the RAM bandwidth wall or wind up being as expensive in transistors and wattage as APUs.
That's not true. They're more efficient than GPUs by virtue of being relatively simple VLIW DSPs. Everything is nice and coherent. They don't need massive register files to support SMT, because they hide memory latency by using DMAs.
VLIW has always worked wonderfully when power was the primary constraint, e.g. in DVD decoding on consumer players.
But it's been much less successful when competing with more traditional ISAs, e.g. Itanium: by the time developers finally knew how to use it, the power advantage of those specific architectures had largely eroded.

I almost never recommend youtube videos, but here's one you might find worthwhile. It's a look inside the PS5 Pro's GPU and how they modified RDNA2 to deliver 300 TOPS. It gives some clues about where AMD might be headed with their upcoming UDNA.
I'll have a look, but count me sceptical, if only because after spending last weekend testing every LLM I could run on my hardware, I have a very bad case of hallucinitis: when all they produce is garbage, the speed doesn't matter.
 
I'll have a look, but count me sceptical, if only because after spending last weekend testing every LLM I could run on my hardware, I have a very bad case of hallucinitis: when all they produce is garbage, the speed doesn't matter.
Well, I'll clue you in on the fact that the 300 TOPS is for int8. They tweaked the GPU design specifically for some DLSS-grade upscaling. What seems weird to me is that they appear to have a simple model they replicate in all of the CUs (not all at once, I think, but probably in chunks), rather than taking a more traditional dataflow approach.

Anyway, it's interesting to hear him talk about their thought process and design decisions. It's a bit more revealing than I would've expected them to be.