News AMD's CPU-to-GPU Infinity Fabric Detailed

I'm not sure how many people see this coming, but desktop APUs in 3-4 years will be like this as well. When AMD is on TSMC's 3nm, you have enough density and power savings to have an 8+ core CPU and the power of what was once a dedicated GPU all in one APU. If you have that, you might as well have unified HBM memory on the APU as well.
 

JayNor

Reputable
May 31, 2019
429
86
4,760
Does AMD's Infinity Link heterogeneous solution support the asymmetric cache coherency feature of CXL that appears to be responsible for its rapid adoption?

Conceptually, that CXL feature should remove interaction with the caches of connected processors/GPUs/FPGAs/NNPs by using a biased coherency bypass.



Codeplay is porting Intel's DPC++ to run on Nvidia's processors.

https://codeplay.com/portal/02-03-2...-to-dpcpp-brings-sycl-support-for-nvidia-gpus

Nvidia and AMD have both joined the CXL Consortium, and Papermaster recently made positive comments about it.

https://www.anandtech.com/show/1526...-mark-papermaster-theres-more-room-at-the-top

Did AMD present strong advantages for using Infinity Fabric vs. PCIe 5.0/CXL for their heterogeneous GPU interconnect?
 

Gomez Addams

Prominent
Mar 4, 2020
53
27
560
From the article: "...but Nvidia hasn't made any announcements about such wins, despite its dominating position for GPU-accelerated compute in the HPC and data center space."

Oh? Doesn't the Perlmutter system qualify? It has 112 V100s in it.
 

Gomez Addams

Prominent
Mar 4, 2020
53
27
560
I'm not sure how many people see this coming, but desktop APUs in 3-4 years will be like this as well. When AMD is on TSMC's 3nm, you have enough density and power savings to have an 8+ core CPU and the power of what was once a dedicated GPU all in one APU. If you have that, you might as well have unified HBM memory on the APU as well.

Imagine a package like the 3970 or 3990 with half of those CPU chiplets being GPU chiplets. Then throw a few chiplets with GPU-CPU-unified HBM4 memory in there. That could be an amazing machine.
 
I looked at HSA years ago and, looking at AMD's scalable architecture and new memory types, made a very good guess where it was going. AMD was a big proponent of heterogeneous architectures working together. I read the tea leaves and said this about 3 years back.

About two years ago I said, "You're going to see an APU with a chiplet for CPU, a chiplet for GPU, an IO die, and an HBM package that is part of unified memory dedicated to graphics calls."

My only miss was that I predicted Zen 2 (Ryzen 3000) would come out of the gate this way. I was correct about chiplets, but missed on the GPU chiplet on package. But I'm betting we'll see something like this soon.

While HSA is being less emphasized, one of the problems I had trouble resolving (when I drew up block diagrams) was indeed cache coherency between chiplets. I knew Infinity Fabric was the answer. But I didn't have the exact answer in terms of algorithms to keep from overloading it. That one took me a while to figure out.

It's a similar problem with GPU-to-GPU chiplets. Everyone thought I was nuts. They likened it to SLI/Crossfire. But it isn't SLI/Crossfire with a unified memory architecture. It's just a matter of resolving tiles and sharing the data differences between them. Even NVIDIA posted a paper about how it wasn't practical. But they are all looking a lot closer at it now.
 

TheEldest

Distinguished
Mar 22, 2006
7
0
18,510
From the article: "...but Nvidia hasn't made any announcements about such wins, despite its dominating position for GPU-accelerated compute in the HPC and data center space."

Oh? Doesn't the Perlmutter system qualify? It has 112 V100s in it.

112 V100s is about 14 petaflops, or 0.014 exaflops. The wins AMD has are for supercomputers in the 1-10 exaflop range (roughly 70x to 700x more powerful).

So I'd say no, the Perlmutter system doesn't qualify.
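
As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The per-GPU number is an assumption that the thread does not state: about 125 TFLOPS of peak tensor throughput per V100, which is the value that makes 112 GPUs come out to roughly 14 PF. With FP64 peak numbers the total would be far lower.

```python
# Assumption (not from the thread): ~125 TFLOPS peak tensor throughput per V100.
V100_PEAK_TFLOPS = 125.0
GPU_COUNT = 112  # GPU count quoted in the thread

TFLOPS_PER_PFLOPS = 1_000
PFLOPS_PER_EFLOPS = 1_000


def aggregate_pflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Sum of per-GPU peak throughput, expressed in petaflops."""
    return gpu_count * tflops_per_gpu / TFLOPS_PER_PFLOPS


def ratio_to_target(target_eflops: float, baseline_pflops: float) -> float:
    """How many times larger a target (in exaflops) is than the baseline (in petaflops)."""
    return target_eflops * PFLOPS_PER_EFLOPS / baseline_pflops


total_pf = aggregate_pflops(GPU_COUNT, V100_PEAK_TFLOPS)
total_ef = total_pf / PFLOPS_PER_EFLOPS
low_ratio = ratio_to_target(1.0, total_pf)    # versus a 1 EF system
high_ratio = ratio_to_target(10.0, total_pf)  # versus a 10 EF system

print(f"{total_pf:.1f} PF = {total_ef:.3f} EF")
print(f"1 EF is about {low_ratio:.0f}x, 10 EF is about {high_ratio:.0f}x")
```

Under that assumption the ratios land near 71x and 714x, in line with the rounded 70x to 700x range above.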
 

Gomez Addams

Prominent
Mar 4, 2020
53
27
560
I believe there is one exaflop-class supercomputer in progress now: El Capitan. The current fastest one, Summit, is powered by GV100s and gets 200 PF for double-precision calculations and 3 EF for AI/TensorFlow calculations.
 

d0x360

Distinguished
Dec 15, 2016
115
47
18,620
From the article: "...but Nvidia hasn't made any announcements about such wins, despite its dominating position for GPU-accelerated compute in the HPC and data center space."

Oh? Doesn't the Perlmutter system qualify? It has 112 V100s in it.

No, it doesn't even compare. AMD is selling many thousands of pieces of hardware for the world's fastest supercomputer. Having 112 V100s is nothing by comparison.

AMD is also moving into data centers with both CPUs and GPUs, and since Nvidia doesn't make a CPU, they can't compete in that market. Epyc is a better choice than anything Intel is offering. The price difference is huge and Epyc has better performance. So much so that VMware is changing how they charge based on core count... oddly, one of the tiers ends with the exact core count of Xeon... weird.
 

JayNor

Reputable
May 31, 2019
429
86
4,760
I believe there is one exaflop-class supercomputer in progress now: El Capitan. The current fastest one, Summit, is powered by GV100s and gets 200 PF for double-precision calculations and 3 EF for AI/TensorFlow calculations.

AMD Frontier

Intel Aurora
 

spongiemaster

Admirable
Dec 12, 2019
2,276
1,280
7,560
I looked at HSA years ago and, looking at AMD's scalable architecture and new memory types, made a very good guess where it was going. AMD was a big proponent of heterogeneous architectures working together. I read the tea leaves and said this about 3 years back.

About two years ago I said, "You're going to see an APU with a chiplet for CPU, a chiplet for GPU, an IO die, and an HBM package that is part of unified memory dedicated to graphics calls."

You didn't need to be a psychic 2 or 3 years ago to know where the industry was looking. All you needed to do was read the Intel press release for Kaby Lake G, which was announced in 2017 and released in January 2018. The pairing of an Intel CPU and an AMD GPU certainly added to the intrigue of Intel's EMIB.

[Image: Kaby Lake G package with AMD Radeon graphics]
 

GetSmart

Commendable
Jun 17, 2019
173
44
1,610
This new interconnect sounds similar to AMD's HyperTransport-based HTX, which never got wide adoption. It will most likely only be used in customized supercomputers (much like Nvidia's NVLink).
 
You think it will take that long? I was expecting it to happen this year, with the Ryzen 4000-series (but it doesn't).

Yes, 3-4 years. The GPUs in AMD's APUs are lagging about a year behind, so Ryzen 4000 mobile parts are using a tweaked Vega GPU. They need to get on RDNA2 or 3 and be on at least TSMC's 5nm, but realistically TSMC's 3nm. However, once we get there, we will likely see most dedicated GPUs disappear in the consumer/gaming space.
 

GetSmart

Commendable
Jun 17, 2019
173
44
1,610
AMD's APU performance is still highly dependent on the available memory bandwidth, which is shared with the CPU cores. That is why discrete GPUs (with their own dedicated high-bandwidth memory) still perform better in most cases.
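
A back-of-the-envelope sketch of that bandwidth gap, using the usual peak formula (bus width in bytes times transfer rate). Both configurations are illustrative assumptions rather than figures from this thread: dual-channel DDR4-3200 for the APU, and a 256-bit GDDR6 bus at 14 Gbps per pin for the discrete card.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec / 1_000


# Hypothetical APU: dual-channel DDR4-3200, i.e. 2 x 64-bit channels at 3200 MT/s.
# This pool is shared between the CPU cores and the integrated GPU.
apu_gbs = peak_bandwidth_gbs(bus_width_bits=128, mega_transfers_per_sec=3_200)

# Hypothetical discrete GPU: 256-bit GDDR6 at 14 Gbps per pin (14,000 MT/s),
# dedicated entirely to the GPU.
dgpu_gbs = peak_bandwidth_gbs(bus_width_bits=256, mega_transfers_per_sec=14_000)

print(f"APU (shared):      {apu_gbs:.1f} GB/s")
print(f"Discrete GPU:      {dgpu_gbs:.1f} GB/s")
print(f"Discrete / shared: {dgpu_gbs / apu_gbs:.2f}x")
```

These are theoretical peaks only. Real-world throughput is lower on both sides, and the APU figure is further split with whatever the CPU cores are doing at the time.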
 

Olle P

Distinguished
Apr 7, 2010
720
61
19,090
... The GPUs in AMD's APUs are lagging about a year behind... They need to get on RDNA2 or 3 and be on at least TSMC's 5nm ...
Why would that be required just to use the Infinity Fabric to run graphics? (I'm not discussing graphics performance, just the CPU/graphics connectivity in APUs.)
 
Why would that be required just to use the Infinity Fabric to run graphics? (I'm not discussing graphics performance, just the CPU/graphics connectivity in APUs.)

I wasn't saying that. They are already using Infinity Fabric on the current generation of APUs, just not with multiple GPUs like this. What I was hinting at is high-power APUs, not some low-power APU in a laptop or tablet. I expect AMD to have very high-power APUs in 3-4 years. What I mean is better than 2080 Ti performance in an APU, which sounds silly today, but on TSMC's 3nm the densities should make that easy. Couple that with the newer 3D stacking AMD is working on and you can have an APU with 8+ high-performance cores, a 2080 Ti class GPU (likely RDNA3), and 32+ GB of HBM memory (likely 3D stacked on the IO die) all on one chip. Things are going to get fun in about 4 years.