News AMD Patents Chiplet Design To Build Colossal GPUs

Aug 22, 2020
Wasn't Ada Lovelace/Hopper('RTX 4k') gonna use a multi-GPU chiplet design, too?
This will certainly become a thing used for HPC. Will it ever filter down to gamers? Maybe but unlikely.
Look at how well the chiplets have worked in consumer CPUs. Look at the big.LITTLE aspect of ARM, too.
 

Koen1982

Reputable
I wonder why Nvidia never implemented something similar: just make a chip that has an x1 or x2 lane connection on the PCIe connector and connect them together with NVLink.
 

InvalidError

Titan
Moderator
I wish companies would quit patenting the obvious. Dividing large designs into multiple chips for cost and manufacturability reasons has been around for decades - the Voodoo 2 was a multi-chip GPU with each chip handling a specific chunk of the rendering pipeline. This is little more than AMD's flavor of the same principles (partition a design along logical/practical boundaries) applied to its own GPUs.
 
They called me crazy

But I said this was going to happen over 5 years ago. The biggest hindrance was data synchronization, and that would be solved by Infinity Cache-like interfaces. Things went a lot slower than I predicted, however. I knew that data coherency was a tough nut to crack.

We'll see if my last prediction comes true:

One APU:
CPU chiplet with 8 cores/16 threads
I/O die chiplet
iGPU chiplet
4GB of HBM reserved by the drivers as a cache.
 

jkflipflop98

Distinguished
They are working on it. First they said it couldn't be done two years ago.

Just recently they [nvidia] announced there is a delay in implementation. So you know they changed their mind.

I'd like to see a link to that quote. Seeing as Nvidia owns all of 3dfx's technology, they already own multiple designs like this from decades ago.
 

spongiemaster

Admirable
They are working on it. First they said it couldn't be done two years ago.

Just recently they [nvidia] announced there is a delay in implementation. So you know they changed their mind.
Interesting that they said 2 years ago it couldn't be done, when they released a research paper 3.5 years ago that said it could be done.

https://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf

From the conclusion:

"We show that with these optimizations, a 256 SMs MCM-GPU achieves 45.5% speedup over the largest possible monolithic GPU with 128 SMs. Furthermore, it performs 26.8% better than an equally equipped discrete multi-GPU, and its performance is within 10% of that of a hypothetical monolithic GPU that cannot be built based on today’s technology roadmap."

Here's another link from September 2019 that says they have multiple design solutions they could implement if they became more cost effective:

Nvidia has “de-risked” multiple chiplet GPU designs – “now it’s a tool in the toolbox”

Note this is for compute tasks only, not gaming. That has always been the crux of the issue. Multi-GPU works great for many compute tasks, but the complete death of any form of SLI in gaming shows how much more sensitive gaming is to the latencies and coherency between multiple GPUs. Chiplets may get us closer, but there is no indication from anyone yet that it is actually the solution.
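
As a rough illustration of why compute splits so much more cleanly than a game frame, here's a toy CUDA sketch (my own example, not from the paper): each GPU works on its own independent slice of the data, so there is no cross-GPU coherency to maintain at all. The two-device count and the trivial saxpy kernel are just assumptions for illustration.

```cuda
// Toy multi-GPU compute example: each device gets an independent slice of
// the data, so nothing one GPU computes ever depends on the other GPU.
// Synchronous copies keep the sketch short; a real run would use streams
// and async copies so both GPUs actually overlap.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int N = 1 << 24;          // total elements (illustrative size)
    const int devices = 2;          // assume two GPUs for the sketch
    const int chunk = N / devices;  // each GPU owns an independent slice

    std::vector<float> hx(N, 1.0f), hy(N, 2.0f);

    for (int d = 0; d < devices; ++d) {
        cudaSetDevice(d);
        float *dx = nullptr, *dy = nullptr;
        cudaMalloc((void**)&dx, chunk * sizeof(float));
        cudaMalloc((void**)&dy, chunk * sizeof(float));
        cudaMemcpy(dx, hx.data() + (size_t)d * chunk, chunk * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy.data() + (size_t)d * chunk, chunk * sizeof(float), cudaMemcpyHostToDevice);

        // Each GPU computes only its own slice; no cross-device traffic needed.
        saxpy<<<(chunk + 255) / 256, 256>>>(chunk, 3.0f, dx, dy);

        cudaMemcpy(hy.data() + (size_t)d * chunk, dy, chunk * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dx);
        cudaFree(dy);
    }

    printf("y[0] = %f, y[last] = %f\n", hy[0], hy[N - 1]);
    return 0;
}
```

A game frame can't be carved up that cleanly, because the slices constantly need each other's results within a few milliseconds, which is exactly where the latency and coherency pain comes from.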
 
Interesting that they said 2 years ago it couldn't be done, when they released a research paper 3.5 years ago that said it could be done.


You may be right. It might have been 3.5 years ago. However, I distinctly remember them originally saying "it wasn't feasible" before that. Maybe that was fodder to mislead AMD. But AMD was already working on a solution by then.

You have to remember that the first hints of a possible MCM design showed up over 5 years ago. Their roadmap hinted at a scalable design with a heterogeneous architecture. I knew the tech was already there based on Fiji's interposer (Fury). I knew it was possible to break a chip up along its functional blocks if the chips could stay coherent on the pixel data. That was the biggest stumbling block.

BTW, jkflipflop98: 3dfx's patents are not relevant. That old tech is closer to SLI than what this solution is doing. It is NOT a rehash of CrossFire/SLI. This is transparent to the driver and acts as a unified monolithic chip.
 
Wasn't Ada Lovelace/Hopper('RTX 4k') gonna use a multi-GPU chiplet design, too?

Look at how well the chiplets have worked in consumer CPUs. Look at the big.LITTLE aspect of ARM, too.

None of that is dual GPUs. There are some larger hurdles to overcome once you're talking about more than one GPU being used for gaming. Sure, you can do it like SLI by replicating memory across them, but costs skyrocket and performance doesn't scale that well, at least not without some driver coding. If they want multiple GPU chiplets, it will take a very large architecture change. In very simple terms: something like a main chiplet plus CU chiplets that you can add more of, with all of them talking back to that one parent chiplet. For HPC, though, they don't need the same memory coherency, so it's much simpler.
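
A toy model of what I mean, purely as an illustration (the class names, the four-chiplet count, and the even work split are my own assumptions, nothing from AMD's patent): the driver only ever submits work to one parent chiplet, which owns scheduling and coherency, and the CU chiplets just execute whatever slice they're handed.

```cuda
// Toy host-side model of the "one parent chiplet + N CU chiplets" idea.
// Purely illustrative: the names, counts, and even work split are assumptions,
// not anything taken from AMD's actual patent or hardware.
#include <cstdio>
#include <vector>

struct CUChiplet {
    int id;
    // A CU chiplet only executes the range the parent hands it; it never
    // talks to the driver or to other CU chiplets directly.
    void execute(int first, int count) {
        printf("CU chiplet %d shading pixels %d..%d\n", id, first, first + count - 1);
    }
};

struct ParentChiplet {
    std::vector<CUChiplet> cus;   // scale up by adding more CU chiplets
    explicit ParentChiplet(int n) {
        for (int i = 0; i < n; ++i) cus.push_back({i});
    }
    // The driver submits one job to the parent, which owns scheduling and
    // coherency, so the whole package still looks like a single GPU.
    void submit(int pixels) {
        int per = pixels / static_cast<int>(cus.size());
        for (auto& cu : cus) cu.execute(cu.id * per, per);
    }
};

int main() {
    ParentChiplet gpu(4);      // one "GPU" as far as the driver can tell
    gpu.submit(1000000);       // one submission, fanned out internally
    return 0;
}
```

The point is just the topology: add more CU chiplets and the outside world still sees one GPU, which is exactly what SLI/CrossFire never managed.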