How Nvidia's NVLink Boosts GPU Performance

  • Thread starter: Guest
This is really impressive and will be great for HPC application developers ... as long as it's not locked down for CUDA only. There have been enough devs (especially in academia) already moving away from CUDA to OpenCL due to vendor lock-in. I'd hate to see a great hardware innovation largely ignored because of this, especially with NVidia's great GPUs.
 
This doesn't make any sense. PCIe has been giving us plenty of bandwidth; it has always stayed ahead of what cards actually need. It's also well established that 16 lanes are unnecessary for today's graphics cards: the performance difference between x16 and x4 is only marginal, and the difference between x8 and x16 is pretty much zero. In fact, some cards have tested higher at x8, though within the margin of error. This is why hooking up your graphics card over Thunderbolt is a viable idea.
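
For what it's worth, here's a rough CUDA sketch anyone can use to time what their own slot actually delivers, rather than taking benchmark articles on faith. The 256 MB buffer is an arbitrary choice and error checking is omitted for brevity:

    // Time one pinned host-to-device copy and report effective bandwidth.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 256 << 20;      // 256 MB test buffer
        void *host, *dev;
        cudaMallocHost(&host, bytes);        // pinned memory, needed for full-speed DMA
        cudaMalloc(&dev, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("Host-to-device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));
        return 0;
    }

A PCIe 3.0 x16 slot tops out around 15.8 GB/s per direction, so if the printed number is far below that, the slot or lane count is the place to look.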
 
This doesn't make any sense. PCIe has been giving us plenty of bandwidth; it has always stayed ahead of what cards actually need. It's also well established that 16 lanes are unnecessary for today's graphics cards: the performance difference between x16 and x4 is only marginal, and the difference between x8 and x16 is pretty much zero. In fact, some cards have tested higher at x8, though within the margin of error. This is why hooking up your graphics card over Thunderbolt is a viable idea.

You are assuming the only use GPUs have is gaming. That's not true. The article CLEARLY states some of the applications that do benefit from this technology. GPUs are very good at computations, much, much faster than CPUs.
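
For anyone wondering what "very good at computations" looks like in practice, here is a minimal CUDA sketch (sizes and names purely illustrative): the loop a CPU would run serially becomes one thread per element.

    #include <cuda_runtime.h>

    // y = a*x + y, one thread per element instead of a serial loop
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;                  // about a million elements
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        // ... fill x and y with real data here ...
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // ~4096 blocks in flight
        cudaDeviceSynchronize();
        cudaFree(x); cudaFree(y);
        return 0;
    }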
 
Haswell and Broadwell only have 16 PCIe 3.0 lanes off the CPU. That's good for two 980s in SLI, with each card running at x8 instead of x16.

When you add a third card, you're out of lanes. Same if you use any sort of NVMe PCIe SSD like M.2 (which takes four PCIe 3.0 lanes).

That's why the big rigs use Haswell-E. Those 6- and 8-core Intel CPUs have 40 PCIe 3.0 lanes.

That's what separates the big boys from the little boys: the number of PCIe lanes.
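
Quick arithmetic on that lane budget, using the figures from this thread (forum numbers, not a spec sheet):

    #include <cstdio>

    int main() {
        const int mainstream = 16;  // CPU lanes on Haswell/Broadwell desktop parts
        const int hedt       = 40;  // CPU lanes on Haswell-E
        const int gpu_x8     = 8;   // per GPU when running x8/x8 SLI
        const int nvme       = 4;   // typical M.2 NVMe drive

        // 2 GPUs + NVMe on mainstream: 20 lanes wanted, 16 available
        printf("mainstream: need %d, have %d\n", 2 * gpu_x8 + nvme, mainstream);
        // 3 GPUs + NVMe on Haswell-E: 28 lanes wanted, 40 available
        printf("Haswell-E:  need %d, have %d\n", 3 * gpu_x8 + nvme, hedt);
        return 0;
    }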
 
This doesn't make any sense. PCIe has been giving us plenty of bandwidth; it has always stayed ahead of what cards actually need. It's also well established that 16 lanes are unnecessary for today's graphics cards: the performance difference between x16 and x4 is only marginal, and the difference between x8 and x16 is pretty much zero. In fact, some cards have tested higher at x8, though within the margin of error. This is why hooking up your graphics card over Thunderbolt is a viable idea.

Forget about GTX gaming cards. Workstation cards (Quadros) can benefit from this technology.
 
This is more or less something for professionals. I have to say, though, that Nvidia's showing in Fluent is pretty bad. GPU support only recently got implemented, and just for radiation models. Most of us use Fluent for fluid dynamics on top of heat transfer, so... yeah. I would like to see actual GPGPU use in Fluent soon. (To my credit, I did my thesis on wing design using a K5000.)

Also, what is NVLink physically? Is it a bridge, a wire, a chip? I don't want a G-Sync-style premium on gaming boards if this comes down to that level...
 


Isn't the premium paid for G-Sync at the monitor level? But anyway, you bring up a good point. I guarantee you there will be a premium for this. It seems like it will be at the motherboard level.
 
According to the AnandTech article about the announcement, it will be a mezzanine connector, so you will have to lay the GPU down flat on the motherboard, at least to use it for GPU-CPU communication. If they go that way, they might as well go for GPU sockets and drop the external card format entirely... As has been said before: this will not come to the consumer market; at best a new standard will emerge from it at some point in the future.
 
This doesn't make any sense. PCIe has been giving us plenty of bandwidth; it has always stayed ahead of what cards actually need. It's also well established that 16 lanes are unnecessary for today's graphics cards: the performance difference between x16 and x4 is only marginal, and the difference between x8 and x16 is pretty much zero. In fact, some cards have tested higher at x8, though within the margin of error. This is why hooking up your graphics card over Thunderbolt is a viable idea.

You are assuming the only use GPUs have is gaming. That's not true. The article CLEARLY states some of the applications that do benefit from this technology. GPUs are very good at computations, much, much faster than CPUs.

You are correct, but so is GameBrigada... for the majority of PC users it's a useless tech, and proprietary, which makes it even worse.

 
Blimey A_J_S_B, what happened to your quoting? 😀

Re the article: it blows my mind how many people here read this stuff yet seem able to view this sort of tech only from a gamer perspective. Kinda bizarre; gaming isn't remotely the cutting edge of GPU tech, and never has been. It can feel like it to those who buy costly GTXs, but in reality the real cutting edge is in HPC, defense imaging, etc. SGI's Group Station was doing GPU work 20 years ago that is still not present in the consumer market (unless, that is, anyone here knows of a PC that can load and display a 67 GByte 2D image in less than 2 seconds).

Ian.

 
Blimey A_J_S_B, what happened to your quoting? 😀

Re the article: it blows my mind how many people here read this stuff yet seem able to view this sort of tech only from a gamer perspective. Kinda bizarre; gaming isn't remotely the cutting edge of GPU tech, and never has been. It can feel like it to those who buy costly GTXs, but in reality the real cutting edge is in HPC, defense imaging, etc. SGI's Group Station was doing GPU work 20 years ago that is still not present in the consumer market (unless, that is, anyone here knows of a PC that can load and display a 67 GByte 2D image in less than 2 seconds).

Ian.
Sure, but the reason for that is that there is no practical use for it for a gamer. Gaming is computationally quite simple, and 3D image generation has long been "settled" in a way. More quality has long been a question of faster, not more advanced (I am still waiting on full-blown ray-tracing graphics, which will probably never really arrive). The most "advanced" use of computation has been the inclusion of physics calculations in consumer cards, and even those are used for extremely simple problems.

My hope is that at some point we can unify graphics memory with system memory or with other graphics cards, and for that one would need more bandwidth. Still, it is probably not going to be an option on consumer boards for a while...
 
This will not affect gaming. It has already been proven that a PCIe 3.0 x4 or x8 slot imposes no or minimal bottleneck on any GPU. I actually would like to see some real-world benchmarks of other apps that could possibly use this...
 
It would seem a lot of people misunderstood what this article is talking about. It's not about replacing PCI Express; it's about creating a faster link between cards in multi-GPU setups. This is about GPUs going forward and making sure they don't run into PCIe's limits any time soon.

Sorry typed with phone
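
Edit, from a real keyboard: for the curious, this is roughly what today's GPU-to-GPU path looks like over PCIe with CUDA peer-to-peer, i.e. the traffic a faster link would speed up. A bare sketch with device IDs assumed and error checks omitted:

    #include <cuda_runtime.h>

    int main() {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1 directly?

        const size_t bytes = 64 << 20;
        float *buf0, *buf1;
        cudaSetDevice(0); cudaMalloc(&buf0, bytes);
        cudaSetDevice(1); cudaMalloc(&buf1, bytes);

        if (canAccess) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);       // direct path, no bounce through host
        }
        // GPU 1 -> GPU 0; without P2P this stages through system memory
        cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);
        return 0;
    }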
 
I hope it will come to the consumer platform (i.e. GeForce) so I can put my GPU anywhere I want instead of being restricted by my PCIe slots. But time will tell.
 
I hope it will come to the consumer platform (i.e. GeForce) so I can put my GPU anywhere I want instead of being restricted by my PCIe slots. But time will tell.

That's not at all what this article is about. I think what you're looking for is a riser cable and super glue, both of which have been around for years.
 
It would seem a lot of people misunderstood what this article is talking about. It's not about replacing PCI Express; it's about creating a faster link between cards in multi-GPU setups. This is about GPUs going forward and making sure they don't run into PCIe's limits any time soon.

Sorry typed with phone

Actually, NVLink is about replacing PCIe for CPU-GPU communication as well as GPU-GPU communication. IBM POWER CPUs have NVLink. The next US Department of Energy supercomputers, codenamed Summit and Sierra, will be based on NVIDIA GPUs and IBM POWER CPUs connected with NVLink.
 
A little more info on NVLink: it looks like it won't be available on x86 CPUs in the foreseeable future, for non-technical reasons (read: Nvidia doesn't want to license NVLink to Intel, or Intel doesn't want to license it).

"NVlink is NVIDIA’s proprietary interface for CPU to GPU and GPU to GPU point-to-point communications. The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link. Multiple lanes can be tied together for higher bandwidth or connect individually to run many GPUs in a single system. Special CPUs with proprietary silicon on-chip interfaces will be able to communicate via NVlink to entirely bypass the PCI bus. Currently, NVlink products are targeted at HPC and enterprise customers. ARM and IBM CPU interfaces will become available while various non-technical issues need to be addressed before an x86 NVlink-capable CPU can be built."
 
To those who haven't yet figured it out: this is the implementation of Nvidia's oft-promised unified memory architecture (read: HSA) that is supposed to debut with Pascal/Volta. The increased bandwidth is not required to meet current graphics workloads; it is there to allow the CPU, GPU(s), and system memory to be shared coherently. The new mezzanine connector is being introduced to reduce latency. Read about DirectX 12 and HSA and I think you will have an idea of what is coming...
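
If it helps, the programming model being described is already visible in CUDA today via managed memory (around since CUDA 6); here is a sketch of one allocation touched by both sides, with NVLink's job being to make this sharing fast rather than to invent it:

    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *data;
        cudaMallocManaged(&data, n * sizeof(float)); // one pointer for CPU and GPU
        for (int i = 0; i < n; i++) data[i] = 1.0f;  // CPU writes it...
        scale<<<(n + 255) / 256, 256>>>(data, n);    // ...GPU updates it in place...
        cudaDeviceSynchronize();
        float first = data[0];                       // ...CPU reads the result back
        cudaFree(data);
        return first == 2.0f ? 0 : 1;
    }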
 
From the article it looks like it's another attempt to replace QPI, HyperTransport, and the many other interconnects already available (basically all NUMA interconnects), along with SLI and PCIe. We won't see these things in the consumer market soon (PCIe is the established standard for GPUs...). A faster interconnect between GPU and CPU would be nice. A GPU-GPU interconnect won't make much sense unless algorithms are optimized to reduce CPU cycles and rely more on GPUs to do all kinds of processing once the data has been sent over.

This looks promising for high-performance computing tasks: compare PCIe 3.0 x32, which provides 31.5 GB/s, with 80 GB/s over a 4x NVLink. But it won't be successful if it remains a proprietary standard when it enters the consumer market.
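
Back-of-the-envelope check on those numbers (assuming roughly 985 MB/s per PCIe 3.0 lane after 128b/130b encoding, and the announced ~20 GB/s per NVLink link):

    #include <cstdio>

    int main() {
        const double pcie3_lane = 0.985;  // GB/s per PCIe 3.0 lane, per direction
        const double nvlink     = 20.0;   // GB/s per NVLink link (announced figure)
        printf("PCIe 3.0 x16: %.1f GB/s\n", 16 * pcie3_lane);  // ~15.8
        printf("PCIe 3.0 x32: %.1f GB/s\n", 32 * pcie3_lane);  // ~31.5
        printf("NVLink  x4:   %.1f GB/s\n",  4 * nvlink);      // 80
        return 0;
    }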
 
I don't think there would be much of a point in x86 environments given bandwidth restrictions...

This is clearly for HPC environments. In the last November TOP500 list, about 90% of the supercomputers were x86-based. The number of supercomputers that use accelerators keeps increasing, and most of them have Nvidia cards as the accelerator. To me it makes sense to support NVLink on x86. Unfortunately, both Intel (which wants to sell Xeon Phi) and AMD (which has its own accelerators) have good non-technical reasons not to add NVLink support to x86. However, in doing so they may be handing the HPC market back to IBM on a silver platter (POWER9 has NVLink support). Time will tell. Right now, the future top US supercomputers, set to be installed in 2017, are being built on POWER9 + NVIDIA Volta.
 