Yes, it was really more like a Hexagon DSP or Mobileye VPU. But it was also utterly useless: AFAIK only NUCs were enabled to use it to wake up for Alexa... Today I'm sure it's still inside every mobile and desktop part, but effectively just dark silicon.
Isn't GNA based on Movidius IP, just like the NPU? Did they overlap, or did the NPU simply replace it?
Since I'm either using USB headsets or the monitor's audio via DP/HDMI, all the audio hardware on my mainboards, from IP blocks to analog wizardry, has been dark silicon/electronic matter on my machines for more than a decade, too.
That includes those fancy polished Japanese caps and supposedly tons of shielding magic in circuit board traces.
I have an old Sandy Bridge board where the Toslink port finally seems to have died. I guess I could've bought an HDMI audio extractor or used a USB digital audio interface, but the board was already slated for retirement.
I had an old Supermicro workstation board, from a dozen-plus years ago, where I used the analog audio output for a couple of years, and the cross-talk I got from it reminded me of the 1990s! It sure makes you appreciate that "analog wizardry" some of the better boards use. But yeah, I tend to go for digital audio outputs whenever possible.
Back in the day, I bought the first sound card with a digital output that ALSA supported. It's such a simple and uncluttered card, but even its analog out sounded good to me. Before that, I'd bought some other weird card that was one of the few with a digital output supported by OSS, but the driver was out of sync with my kernel and I lacked the skills to get it compiling.
Some simple picture upscaling/smoothing, sure. But that's no longer selling.
AI upscaling is way too computationally expensive a task to take away from the GPU, and it also relies on information only the GPU has: no way that I can see.
If we're talking iGPUs (I was), then the GPU doesn't really "have" that data. By the time you're doing upscaling, most of the framebuffer has probably already been flushed out to DRAM. So whether you use the iGPU or the NPU doesn't make much difference from a data perspective.
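For a rough back-of-envelope (my own assumed numbers, nothing official): a single 4K RGBA frame is about 32 MiB, which dwarfs the few MiB of local SRAM I'd expect such an NPU block to have, so the pixels stream over the memory bus no matter which unit does the upscaling.

    #include <stdio.h>

    int main(void) {
        /* Assumed numbers for illustration: a 4K RGBA8 frame and a
           hypothetical NPU with 4 MiB of local SRAM. */
        const long width = 3840, height = 2160, bytes_per_pixel = 4;
        const long frame_bytes = width * height * bytes_per_pixel;
        const long npu_sram_bytes = 4L * 1024 * 1024;

        printf("4K RGBA frame:    %.1f MiB\n", frame_bytes / (1024.0 * 1024.0));
        printf("Assumed NPU SRAM: %.1f MiB\n", npu_sram_bytes / (1024.0 * 1024.0));
        printf("Fraction of a frame that fits on-chip: %.2f\n",
               (double)npu_sram_bytes / frame_bytes);
        return 0;
    }

At that size neither the iGPU's caches nor the NPU's SRAM can hold the frame, so the traffic goes through DRAM either way.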
NPUs are designed to run small dense kernels, e.g. audio and image denoising, which fit mostly into their local on-chip RAM: they load them once during initialization and can then keep running them while the rest of the system sits in low power with stopped clocks.
You don't need 45 TOPS just for that. They have DMA engines, so that you can stream in weights of larger networks without blocking their compute elements.
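Rough numbers to illustrate, all of them my own assumptions: take a hypothetical always-on audio denoiser with about 1M int8 parameters running one inference per 10 ms frame. The weights fit in on-chip RAM with room to spare, and the sustained compute is a vanishingly small fraction of 45 TOPS.

    #include <stdio.h>

    int main(void) {
        /* Hypothetical always-on audio denoiser, purely for illustration:
           ~1M int8 parameters, one inference per 10 ms audio frame. */
        const double params             = 1e6;
        const double macs_per_inference = 1e6;   /* roughly one MAC per weight */
        const double frames_per_second  = 100.0; /* 10 ms hop size             */
        const double ops_per_second = macs_per_inference * 2.0 * frames_per_second;
        const double npu_ops = 45e12;            /* the headline 45 TOPS       */

        printf("Weight footprint:  %.1f MiB (int8)\n", params / (1024.0 * 1024.0));
        printf("Sustained compute: %.2f GOPS\n", ops_per_second / 1e9);
        printf("Share of 45 TOPS:  %.6f%%\n", 100.0 * ops_per_second / npu_ops);
        return 0;
    }

Even if you're generous by an order of magnitude, that kind of always-on workload is nowhere near the headline TOPS figure; the big number only matters for the larger networks you stream in via DMA.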
If they have to keep firing up the memory bus for their work, a) the energy benefits would largely go down the drain
That's not true. They're more efficient than GPUs by virtue of being relatively simple VLIW DSPs. Everything is nice and coherent. They don't need massive register files to support SMT, because they hide memory latency by using DMAs.
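A minimal sketch of that DMA-based latency hiding, with made-up dma_start_read/dma_wait/run_layer_tile primitives standing in for whatever the real DSP SDK exposes: compute on the current tile of weights in local SRAM overlaps with the DMA fetch of the next tile from DRAM, so the pipeline stays busy without SMT or a huge register file.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical DMA/compute primitives; stand-ins for a real DSP SDK. */
    void dma_start_read(void *dst, const void *src, size_t bytes); /* async    */
    void dma_wait(void);                     /* block until last DMA completes */
    void run_layer_tile(const int8_t *weights, size_t bytes);      /* compute  */

    #define TILE_BYTES (256 * 1024)          /* assumed size of one weight tile */

    /* Double-buffered weight streaming: compute on buf[cur] overlaps with the
       DMA prefetch of the next tile into buf[cur ^ 1]. */
    void run_network(const int8_t *weights_dram, size_t total_bytes,
                     int8_t buf[2][TILE_BYTES]) {
        size_t ntiles = total_bytes / TILE_BYTES;
        int cur = 0;

        dma_start_read(buf[cur], weights_dram, TILE_BYTES);
        for (size_t t = 0; t < ntiles; t++) {
            dma_wait();                               /* tile t is now resident */
            if (t + 1 < ntiles)                       /* prefetch tile t+1      */
                dma_start_read(buf[cur ^ 1],
                               weights_dram + (t + 1) * TILE_BYTES,
                               TILE_BYTES);
            run_layer_tile(buf[cur], TILE_BYTES);     /* overlaps with the DMA  */
            cur ^= 1;
        }
    }

The GPU's answer to the same problem is to keep thousands of threads in flight, which is exactly what needs those massive register files; the DSP just prefetches.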
I almost never recommend YouTube videos, but here's one you might find worthwhile. It's a look inside the PS5 Pro's GPU and how they modified RDNA2 to deliver 300 TOPS. It gives some clues about where AMD might be headed with their upcoming UDNA.