News AMD Puts Hopes on Packaging, Memory on Logic, Optical Comms for Decade Ahead

usertests

Distinguished
Mar 8, 2013
A further 10x performance/watt (efficiency) should be easily achievable. They want upwards of 200-1,000x within the next 15 years or so to enable zettascale supercomputers, and it might be possible to go further than that with 3D packaging.

Computing has already come so far, but adding a few more zeroes to the end could make things comical.
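To put those multipliers in perspective, a quick back-of-the-envelope (illustrative power and FLOPS figures of my own, not from the article):

```python
# Rough, illustrative numbers: current exascale systems deliver about 10^18 FLOPS
# within a power budget on the order of 20-30 MW.
exaflops = 1e18      # FLOPS, order of magnitude for today's top systems
zettaflops = 1e21    # FLOPS, the zettascale target

# At a roughly flat power budget, zettascale needs ~1000x better perf/watt.
required_gain = zettaflops / exaflops
print(f"Perf/W gain needed at flat power: {required_gain:.0f}x")

# If "a further 10x" comes easily from process and packaging, the rest has
# to come from architecture, data locality, optical I/O, etc.
print(f"Remaining after an easy 10x: {required_gain / 10:.0f}x")
```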
 

InvalidError

Titan
Moderator
The CPUs with on-package if not direct-stacked memory that I predicted about two years ago are one more step closer.

I wouldn't be surprised if DDR6 ends up being the last external memory standard we get before main memory moves on-package, with external memory expansion going over PCIe/CXL whenever you need more than whatever on-package memory your CPU/GPU has.

An IGP chiplet/tile with 4-8GB of stacked HBM-like memory and almost direct access to the system memory controller should be interesting.
 

bit_user

Polypheme
Ambassador
Weird that they wouldn't plot Genoa on here:

[slide image]


Maybe it blows up their nice trend, with that massive 1.5x core-count increase and DDR5 memory?

I also wonder what SPECint they used... did they go back and re-test old Opteron servers with SPEC2017?

AMD will also target processing in memory.
This should set the stage for an interesting battle with Samsung and SK Hynix, both of which have been very active in this space. If it's truly core to AMD's strategy, I doubt they'll be content to source PIM solutions from partners.

Can someone explain the difference between "2.5D Si INT, EFB" and "3D Chiplets", in the first slide of the second set?

[slide image]

And, on the next slide in that set, how much should we read into the subtly different wording: "DRAM layers" vs. "Memory layers"?

[slide image]

Another big target for efficiency savings, and thus potential performance boosts, is chip I/O and communications. Specifically, using optical communications
How much latency would these optical transceivers tend to add?

AMD also took some time to boast about the AI performance gains which have been delivered by its processor portfolio over the last decade.
Nvidia still owns this market. If AMD really wants to play ball, they need to do a better job of hardware support across their entire stack, making it easy for even novice users to use any AMD GPU for AI acceleration.

And then they need to design hardware that's two generations ahead of where they think Nvidia will be, because that's where Nvidia will actually be when AMD launches its product. For too long, AMD's AI performance has been a generation behind Nvidia's. That's because Nvidia understands it's truly strategic, whereas AMD treats it as a "nice-to-have". However, the hardware doesn't even matter if the software support & mindshare aren't there.
 

bit_user

Polypheme
Ambassador
This slide places too little emphasis on software, IMO.

[slide image]

In particular, programming models will need to shift. Caches are a great way to speed up software without breaking backward compatibility, but the lookups burn a lot of power. I think we will need to start reducing dependence on hardware-managed caches. Hardware prefetchers are also nice, but have their own overheads and limitations.

Finally, I keep expecting the industry to start looking beyond conventional approaches to out-of-order execution. We can't go back to strictly in-order, but I think there's a better compromise than the current conceit of a serial ISA that forces the hardware to do all the work needed to find concurrency.
 

InvalidError

Titan
Moderator
Can someone explain the difference between "2.5D Si INT, EFB" and "3D Chiplets", in the first slide of the second set?
2.5D is when you use interposers to tie different dies together a bit like current-day HBM, while 3D is when stuff gets stacked directly on top of, or tucked under, some other major functional silicon instead of dedicated interconnect silicon.

So AMD's 3D-Vcache products are basically a mix of 2.5D to connect chiplets and 3D for the extra cache directly on CPUs.

Since AMD spun off the cache and memory controllers into chiplets for its higher-end GPUs, the logical evolution would be for the cache-memory controllers to become the base die for some sort of HBM-like memory.

Going "pure 3D" may be problematic since heat produced closer to the BGA/LGA substrate has to travel through everything stacked on top to reach the IHS or heatsink. Anything besides low-power stuff where this isn't an issue will likely remain hybrid 2.5-3D for thermal management reasons.
 

bit_user

Polypheme
Ambassador
2.5D is when you use interposers to tie different dies together a bit like current-day HBM,
Well, if you look at the slide, the starting point seems to be chiplets, since they show a picture of a de-lidded EPYC. So, I assume "2.5D Si INT, EFB" means more than that.

Going "pure 3D" may be problematic since heat produced closer to the BGA/LGA substrate has to travel through everything stacked on top to reach the IHS or heatsink. Anything besides low-power stuff where this isn't an issue will likely remain hybrid 2.5-3D for thermal management reasons.
I wonder if they could integrate graphene to wick away heat from the compute layer.


"Compared with metals or semiconductors, graphene has demonstrated extremely high intrinsic thermal conductivity, in the range from 2000 W/mK to 5000 W/mK at RT. This value is among the highest of known materials. Moreover, few-layer graphene (FLG) films with the thickness of a few nanometers also maintain rather high thermal conductivity unlike semiconductor or metals. Therefore, graphene and FLG are promising materials for micro or even nanometer scale heat spreader applications"​

You might use some TSV approach to send the heat up to the top of the stack, or maybe the entire package is a vapor chamber and you'd just have to draw the heat out to the edges of the chip.
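Quick sanity check on how much heat a thin film could actually move sideways, using the conductivity range from that quote and Q = k·A·ΔT/L (the film geometry is entirely made up for illustration):

```python
# Lateral conduction through a thin spreading film: Q = k * A_cross * dT / L.
# The film geometry below is made up for illustration only.
def lateral_heat_flow_w(k, thickness_m, width_m, length_m, delta_t_k):
    cross_section = thickness_m * width_m            # m^2, the film's edge area
    return k * cross_section * delta_t_k / length_m  # watts

for thickness in (1e-6, 50e-6):         # a 1 um film vs a 50 um film
    for k in (2000, 5000):              # W/(m*K), the range quoted above
        q = lateral_heat_flow_w(k, thickness, 20e-3, 10e-3, 20)
        print(f"{thickness * 1e6:>4.0f} um thick, k = {k}: ~{q:.2f} W moved sideways")
```

So, at least in this sketch, a few-nanometer FLG layer mostly helps even out hot spots; moving tens of watts laterally would take a much thicker spreader or something like the vapor-chamber idea.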

Or, why would the compute die need to be at the base of the stack? Why couldn't it sit on top? Especially if it were restricted to computing on just the memory within the stack, then you might not need so many I/Os between it and the rest of the package.
 
  • Like
Reactions: pointa2b

InvalidError

Titan
Moderator
Or, why would the compute die need to be at the base of the stack? Why couldn't it sit on top?
I'd imagine that sending 100-250A through the bottom dies with a helluva bunch of TSVs would make burying DRAM, SRAM, NAND and other dense structures under a CPU/GPU die kind of problematic. Thermal vias through silicon would be a no-go for the same reason. Much simpler to limit TDP to what the stack can pass.

Using a graphene heat-spreading layer between 3D-stack layers might help alleviate hot spots and improve heat propagation through the stack, though I bet we are 10+ years away from economically viable ways of doing that for consumer electronics. Since graphene is an extremely good electrical conductor, the heat-spreading graphene layers would need to have thousands of holes precision-cut out of them to avoid interfering with the copper pillars between dies; I can't imagine that getting cheap any time soon.
 

JamesJones44

Reputable
Jan 22, 2021
The CPUs with on-package if not direct-stacked memory that I predicted about two years ago are one more step closer.

I wouldn't be surprised if DDR6 ends up being the last external memory standard we get before main memory moves on-package, with external memory expansion going over PCIe/CXL whenever you need more than whatever on-package memory your CPU/GPU has.

An IGP chiplet/tile with 4-8GB of stacked HBM-like memory and almost direct access to the system memory controller should be interesting.

Yep, I figured that once Apple pushed putting memory on-package in the name of efficiency and it proved successful, the general PC community would eventually follow for the same reasons.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
Yep, I figured that once Apple pushed putting memory on-package in the name of efficiency and it proved successful, the general PC community would eventually follow for the same reasons.
It's the eventual push toward more tiers of memory.

If you thought what we have now is crazy, there will be more layers of memory added in the future.

L4$/L5$/L6$/L7$ will all have their place.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
Mounting DRAM on top seems like a problem not easily solved with the extra thermal mass above the compute dies.

Mounting it next door seems more practical; Apple proved it works.

Add in HBM and modular chiplet-based memory controllers that you can mass-produce cheaply, and it becomes a more practical manufacturing problem.
AMD already separated the memory controller into its own die on the GPU side; it's only a matter of time before I see them doing it on the CPU side so that it becomes easy/cheap to mix and match memory types as needed, allowing easy creation of a variety of product SKUs from common parts, like Legos.

I can easily envision AMD making separate memory controllers for each RAM type:
1x for regular DRAM DIMMs using an OMI-type interface
1x for HBM#
1x for regular GDDR#
All using tiny dies, and maybe adding SRAM on top of the memory controller to help lower latency and improve bandwidth.
 
  • Like
Reactions: JamesJones44

TJ Hooker

Titan
Ambassador
Well, if you look at the slide, the starting point seems to be chiplets, since they show a picture of a de-lidded EPYC. So, I assume "2.5D Si INT, EFB" means more than that.
I believe they're using "2D" to refer to MCM packaging that has the chiplet interconnects going through the (organic) package substrate (what they used for their CPUs starting with Zen 2). Whereas they're using "2.5D" to refer to products like their HBM GPUs, where the chiplet interconnects go through a silicon interposer (which in turn sits on top of the substrate). You can see a picture illustrating what (I believe) the difference is in slide 11 here: https://nepp.nasa.gov/workshops/etw...ues/1500_Ramamurthy-Chiplet-Technology-v3.pdf

I don't know if the way AMD is using the terms exactly lines up with industry-standard definitions of 2D/2.5D (the presentation I linked above seems to consider both interposer- and substrate-based interconnects to be examples of "2.xD packaging"). I also can't figure out what "EFB" stands for in AMD's slide.
 
  • Like
Reactions: bit_user

LawlessQuill

Prominent
Apr 22, 2021
The CPUs with on-package if not direct-stacked memory that I predicted about two years ago are one more step closer.

I wouldn't be surprised if DDR6 ends up being the last external memory standard we get before main memory moves on-package, with external memory expansion going over PCIe/CXL whenever you need more than whatever on-package memory your CPU/GPU has.

An IGP chiplet/tile with 4-8GB of stacked HBM-like memory and almost direct access to the system memory controller should be interesting.
DDR7 already exists for vram, and is in development for ram
 

InvalidError

Titan
Moderator
DDR7 already exists for vram, and is in development for ram
GDDR7 is not DDR7. DDR6 is 4-5 years away, and DDR7 would be another 5-7 years beyond that. That is well past the point where I expect most CPUs and GPUs to have on-package, likely 3D-stacked, DRAM, which is exactly when I expect memory expansion to move to PCIe/CXL.

Once external memory goes PCIe/CXL, it won't matter what the underlying memory is, all you need is an appropriate memory controller bridge for whatever memory you want to use.
 
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
Mounting DRAM on top seems like a problem not easily solved with the extra thermal mass above the compute dies.

Mounting it next door seems more practical; Apple proved it works.
Samsung and SK Hynix both have compute-in-memory solutions. I think at least Samsung's puts the compute in the bottom die of the stack.

AMD already separated the memory controller into its own die on the GPU side; it's only a matter of time before I see them doing it on the CPU side
Actually, their CPUs had it first. If you remember, back in the Ryzen 3000 series the I/O die had the memory controller + I/O.

so that it becomes easy/cheap to mix and match memory types as needed, allowing easy creation of a variety of product SKUs from common parts, like Legos.
Yeah, like maybe they could've used a different I/O die to effectively back-port Zen 4 to AM4, so that people could use it on cheaper motherboards and with DDR4.
 

bit_user

Polypheme
Ambassador
DDR7 already exists for vram, and is in development for ram
I think GDDR is its own thing, and the numbering has no direct correspondence with regular DDR memory standards.

I would like to see a plan for phasing out discrete cpus, and having a singular integrated chip system
Did you mean discrete GPUs getting phased out? Won't happen. The high-end GPUs will remain distinct from CPUs, for the foreseeable future. The main reason is GDDR memory, which has far higher bandwidth than is available to a CPU. It also turns out to be useful to be able to upgrade your GPU without having to toss out your old CPU.

At the low and even mid-range, we could see iGPUs with in-package memory eroding the market segment of dGPUs, but Apple's M1 Ultra shows that even packing like 8 channels of LPDDR5 in-package isn't enough to compete with high-end dGPUs. Don't believe me? Check its Geekbench scores.
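For a sense of the gap, peak DRAM bandwidth is just bus width times data rate. A rough comparison, using approximate configurations of my own for illustration (not figures from the article):

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_mtps):
    """Peak theoretical bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * data_rate_mtps / 1000

# Approximate, illustrative configurations.
configs = {
    "desktop CPU, 128-bit DDR5-6000":          (128, 6000),
    "Apple M1 Ultra, 1024-bit LPDDR5-6400":    (1024, 6400),
    "high-end dGPU, 384-bit GDDR6X @ 21 Gbps": (384, 21000),
}
for name, (width, rate) in configs.items():
    print(f"{name}: ~{peak_bandwidth_gbs(width, rate):.0f} GB/s")
```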
 

InvalidError

Titan
Moderator
Did you mean discrete GPUs getting phased out? Won't happen. The high-end GPUs will remain distinct from CPUs, for the foreseeable future. The main reason is GDDR memory, which has far higher bandwidth than is available to a CPU.
And why is it that CPUs have lower memory bandwidth in the first place? The need for customizable memory size using DIMMs. Once CPUs have on-package memory as mainstream, the memory can be whatever the manufacturer wants it to be. Had Apple wanted to, it could have gone 2-4xHBM3E.
 

rluker5

Distinguished
Jun 23, 2014
I think Intel will try to keep this tech in the server market for as long as possible for profit reasons.
They could put it in consumer, but I think they will wait for AMD to catch up and do that first.

As far as optical goes, I also have concerns about latency, longevity and price.
 

InvalidError

Titan
Moderator
I think Intel will try to keep this tech in the server market for as long as possible for profit reasons.
They could put it in consumer, but I think they will wait for AMD to catch up and do that first.
There are reasons why new chip-making tricks get used on high-value, high-margin stuff first. Bonding a bunch of chips together with 3D-stacking TSVs isn't cheap and incurs a significant amount of chip design overhead. It'll be a while before the whole process gets refined, more cost-efficient, more reliable and more readily accessible.
 
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
And why is it that CPUs have lower memory bandwidth in the first place?
The main reason is that they simply don't need more, at the consumer tier. The number of memory channels starts to get a little silly with the bigger server CPUs, but the main issue for servers is capacity.

Once CPUs have on-package memory as mainstream, the memory can be whatever the manufacturer wants it to be. Had Apple wanted to, it could have gone 2-4xHBM3E.
Cost. There's a reason consumer GPUs use GDDR memory and not HBM. It's also the main reason Nvidia used LPDDR5X in Grace, rather than the HBM we'd have expected.
"Power efficiency and memory bandwidth are both critical components of data center CPUs. The NVIDIA Grace CPU Superchip uses up to 960 GB of server-class low-power DDR5X (LPDDR5X) memory with ECC. This design strikes the optimal balance of bandwidth, energy efficiency, capacity, and cost for large-scale AI and HPC workloads.​
Compared to an eight-channel DDR5 design, the NVIDIA Grace CPU LPDDR5X memory subsystem provides up to 53% more bandwidth at one-eighth the power per gigabyte per second while being similar in cost. An HBM2e memory subsystem would have provided substantial memory bandwidth and good energy efficiency but at more than 3x the cost-per-gigabyte and only one-eighth the maximum capacity available with LPDDR5X.
The lower power consumption of LPDDR5X reduces the overall system power requirements and enables more resources to be put towards CPU cores. The compact form factor enables 2x the density of a typical DIMM-based design."​

The bandwidth they get to their directly-connected LPDDR5X is only about 546 GB/s (@ 32-channel -> 512-bit ?). So, the bandwidth tradeoff vs. HBM is real, and yet for reasons of capacity and cost they went with LPDDR5X.
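A quick consistency check on that figure, assuming the 512-bit aggregate width guess is right (my own arithmetic, not from Nvidia's post):

```python
# If ~546 GB/s really comes from a 512-bit (64-byte) aggregate interface,
# the implied per-pin data rate works out to roughly LPDDR5X-8533.
bus_bytes = 512 // 8
print(f"Implied data rate: ~{546 / bus_bytes * 1000:.0f} MT/s")
```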


Getting back to the premise of replacing dGPUs, I find it a little hard to swallow that we're going to put a 500+ W, $1.5k monster GPU + a 320 W, $800 gaming CPU + probably like $500 of HBM in a single package that can only be cooled with chilled water and that you have to completely toss out if any part of it breaks or you want to upgrade your memory, CPU, or GPU. That's why I think iGPUs will be limited to laptops and low-to-mid-range desktops. Or exotic server chips like AMD's MI300.

Like the dinosaurs that ruled the earth for millions of years, dGPUs are very good at what they do. It will similarly take an industry-smashing asteroid to make them go extinct.
 