Nvidia Lowers DGX Station Pricing By 25%

Status
Not open for further replies.

The 500 TFLOPS number in the article only applies to specific applications that can make use of the tensor cores, e.g. deep learning. For general purpose single precision compute, the V100 is around 50% faster than a 1080 Ti (~15 vs ~10 TFLOPS).
 

bit_user

Splendid
Herald

Somewhere, I ran across a paper on how to refactor some classical HPC problem to exploit the tensor cores. Of course, there were some efficiency losses from mitigating the lower precision, but the net effect was better performance than simply using its double-precision units.
 

What about it? Are you asking what the cryptocurrency hashrate of this workstation would be? If so, it would depend on what you're mining; if you google "tesla v100 hashrate" you can find some numbers.
 

ddferrari

Distinguished
Apr 29, 2010
378
0
18,860
US



Yeah, that didn't get old about 10 years ago.
 

kyotokid

Distinguished
Jan 26, 2010
244
0
18,680
US

...it only has Tesla Compute cards, so you'd need at least one Quadro for the graphics output which, looking at the interior, means swapping out one of the Teslas for a Quadro GV100, as the board only accommodates 4 cards. The rub with that is the forthcoming GV100 still uses the PCIe interface, so it is incompatible with the NVLink expansion slots in the workstation.

Also, for rendering you are still limited to the VRAM on one card (as memory for rendering does not pool even with NVLink), so if you plan on rendering super epic level works or highly detailed animations, there's a chance those 20,000+ cores could be useless, as once the process exceeds 16 GB it either crashes or dumps to the CPU/physical memory (depending on the render engine).

...and yes, it is possible to exceed 16 GB as large format ultra high quality rendering eats memory for brekkie (this is one of the reasons film studios like Pixar still use CPU rendering, slower, but has access to all the memory on the MB which can be significantly more than even a Quadro P6000).
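
To put the single-card limit above in concrete terms, here is a minimal Python sketch of the check a GPU renderer effectively makes. The 16 GB per-card figure is the one quoted in this thread; the function names and the example scene sizes are hypothetical, and what actually happens on overflow (crash or CPU fallback) depends on the render engine.

```python
# Hypothetical sketch: a renderer that does not pool VRAM needs the whole
# scene resident on EACH card, so the limit is one card's memory, not the sum.

def fits_on_gpu(scene_gb: float, vram_per_card_gb: float = 16.0) -> bool:
    """Return True if the scene fits in a single card's memory."""
    return scene_gb <= vram_per_card_gb


def usable_render_memory_gb(num_cards: int, vram_per_card_gb: float = 16.0) -> float:
    """Memory available to a non-pooling renderer, regardless of card count."""
    if num_cards < 1:
        raise ValueError("need at least one card")
    # Extra cards add cores, not addressable scene memory.
    return vram_per_card_gb


if __name__ == "__main__":
    for scene_gb in (12.0, 20.0):  # made-up scene sizes for illustration
        print(scene_gb, "GB scene fits on one card:", fits_on_gpu(scene_gb))
    print("usable memory with 4 cards:", usable_render_memory_gb(4), "GB")
```

Under these assumptions, four cards still give a renderer only 16 GB to work with, which is the whole problem for large scenes.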

So if you are doing scientific modelling or engineering research, a great system for a decent price considering the level of performance it offers. However, rendering in Blender or any other software we can afford, nada, unless you only use it as a networked render box (and have a Quadro in your primary system which would throttle render performance as that is still PCIe) making it a pretty expensive rendering appliance.
 

bit_user


No, it does seem to use one of the V100's for video output. The exposed connectors appear to be on one of the cards and the manual says:
High-resolution displays consume a large quantity of GPU memory. If you have connected three 4K displays to the DGX Station, they may consume most of the GPU memory on the NVIDIA Tesla V100 GPU card to which they are connected, especially if you are running graphics-intensive applications.
Source: http://docs.nvidia.com/dgx/dgx-station-user-guide/index.html#ixzz597QjKJgi

What's got me is that it's only single-CPU. That means only x48 lanes of PCIe -> x8 lanes per card. For that kind of money, I'd have hoped they'd spring for a dual-CPU setup.


Maybe they've upgraded to Vega? It can use system memory as an extension of onboard RAM (for more info search "Vega HBCC").
 

kyotokid



...well, not sure if NVLink follows the same configuration as PCIe. The board apparently has four native NVLink expansion slots, which are a completely different pipeline architecture with a base 50% boost in transfer speeds over PCIe. I do find it kind of curious that it only supports a single Xeon CPU as well. One would think going for that kind of power it would have at least two.

This is the first application of NVLink I have seen outside of clusters being made for large supercomputers like Summit. Each cluster in the Summit supercomputer has two IBM Power 9 CPUs (each with 48 lanes of what is called "Bluelink" connectivity) and 6 Tesla V100s all on NVLink with 8 ports. What this means is direct connectivity between each of the GPUs and CPUs as well as full interconnectivity between all six GPUs with no need for switching. That means this thing will be really, really fast.

I have never heard of Tesla cards being used for graphics output; rather, they are computational accelerators that can crunch numbers quickly, and all four can pool memory for such tasks over NVLink, so in that respect you have 64 GB of GPU memory to use, just not for graphics production.

True, Vega can make use of physical memory, but that will be at a cost in speed due to having to share processing among more channels as well as having fewer stream processors/cores than a dedicated GPU card. The Tesla V100 and Quadro GV100 have 5120 cores while the Vega/Ryzen 2400G has only 704. Also, Vega only supports OpenCL GPU rendering.
 

bit_user


I'm pretty sure these use PCIe to communicate with the host CPU (and, by extension, its memory). The NVLink communication is probably over-the-top, as you can see from the pictures.


Each V100 has 6x NVLink2 lanes. So, I don't know what you mean by "with 8 ports", but you could have each V100 directly connected to the other 5 + 1 CPU. Whether this is optimal depends on your needs. If most communication is GPU <-> GPU (as in deep learning), then yes. But if the GPUs are mostly talking to the CPUs, then having a link to only 1 CPU would probably create a bottleneck between the two CPUs as GPUs try to fetch data from memory attached to the other CPU.
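
The link counting above can be sketched in a few lines of Python. This is only the arithmetic for a full mesh, assuming the 6 links per V100 mentioned in this post; the function names are hypothetical, and it says nothing about how Nvidia or IBM actually wire any given system.

```python
# Hypothetical sketch: in a full mesh, every GPU needs one direct link to
# each of the other GPUs; whatever is left over can go to a CPU.

def links_needed_for_full_mesh(num_gpus: int) -> int:
    """Links each GPU needs to reach every other GPU directly."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return num_gpus - 1


def spare_links(num_gpus: int, links_per_gpu: int = 6) -> int:
    """Links left per GPU after a full GPU mesh.

    A negative result means a full mesh is not possible and traffic
    would have to hop through intermediate GPUs.
    """
    return links_per_gpu - links_needed_for_full_mesh(num_gpus)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(f"{n} GPUs: {spare_links(n)} spare link(s) per GPU")
```

With 6 GPUs there is exactly 1 link left per GPU, which matches the "other 5 + 1 CPU" layout; with 4 GPUs there are 3 left over, and with 8 a full mesh no longer fits.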

NVLink has routing capabilities. So, for larger configs, they just reduce the connectivity and traffic can hop through one or more intermediate nodes. I've not heard of a centralized crossbar.


I think it's clear these aren't Tesla cards. They never said these were - you're just assuming that. The PCIe version of Tesla V100 appears not to have NVLink2 and doesn't have a graphics port. Also, they're passively cooled, whereas these clearly aren't.

http://images.nvidia.com/content/tesla/pdf/Tesla-V100-PCIe-Product-Brief.pdf

I think the DGX Station probably uses some variant of the Titan V, but with fully-enabled GV100's and maybe augmented over-the-top connectivity.


I don't understand why you're comparing big, dedicated Nvidia GPUs to AMD APUs. I was talking about Vega 64, which has 4096 stream processors and up to 16 GB of HBM2. Again, look up "Vega HBCC".
 

kyotokid


..but that's a GPU card, not an APU, which would require hybrid GPU/CPU rendering, which does not reduce render times as significantly as expected (part of the reason Lux Development dropped it and just went with pure CPU and pure GPU based rendering). Iray is the same. It has a GPU/CPU mode, but that isn't much faster than pure CPU rendering. Octane handles this differently in that it holds the geometry in VRAM and shunts excess texture load to the CPU/physical memory, which is a bit more efficient than dumping the entire scene load to the CPU.

From what I just read, HBCC's best advantage is improvement in frame rate and paging, which is important for gaming but would have little impact on CG production rendering. Effectively it treats VRAM as a last-level cache and "some" system memory as VRAM (how much isn't mentioned, but I would expect up to the card's maximum VRAM limit). There is also the matter of increased latency, as data would need to be fetched from system memory through the PCIe bus. Also, VRAM and system memory are two different animals. VRAM on the card (whether GDDR or HBM) is significantly faster than the memory sticks on the board, as it is directly linked to the GPU processor instead of having to come across through the CPU and PCIe slot (this is likely why hybrid GPU/CPU rendering never really produced the speed advantage we were hoping for).
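
For a rough sense of the gap between on-card memory and system memory reached over PCIe, here is a back-of-the-envelope Python sketch. The bandwidth constants are not from this thread: they are approximate published peak figures (about 900 GB/s for the V100's HBM2, about 15.75 GB/s per direction for PCIe 3.0 x16) and should be treated as assumptions. Real-world throughput is lower, and latency is ignored entirely.

```python
# Assumed peak bandwidths in GB/s (approximate, not measured):
HBM2_V100_GBPS = 900.0    # on-card HBM2 on a Tesla V100
PCIE3_X16_GBPS = 15.75    # PCIe 3.0 x16, one direction, theoretical


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized time to move size_gb at a given sustained bandwidth."""
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gbps


if __name__ == "__main__":
    size_gb = 16.0  # one card's worth of data, as discussed in this thread
    on_card = transfer_seconds(size_gb, HBM2_V100_GBPS)
    over_pcie = transfer_seconds(size_gb, PCIE3_X16_GBPS)
    print(f"on-card:   {on_card:.3f} s")
    print(f"over PCIe: {over_pcie:.3f} s")
    print(f"ratio:     {over_pcie / on_card:.0f}x")
```

Even with these idealized numbers, streaming a full card's worth of data over PCIe takes on the order of a second per pass, versus a few hundredths of a second from on-card memory. How much that hurts depends on how often the renderer has to touch data that is not already cached on the card.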

As to what's inside, here is a link to the Nvidia product description; click on the datasheet PDF for the full specs. Definitely not Titan Vs. The cards are linked to each other and the CPU via NVLink, and even 6 lanes would allow for full connectivity between all four GPUs as well as each GPU to the CPU. This is not a render box; this is designed for deep learning research.

https://www.nvidia.com/en-us/data-center/dgx-station/
 

bit_user


The point of HBCC is to enable the GPU to efficiently access a lot more memory than will fit on the card, so I'm sure it can cache more than 16 GB. Whether it helps you is dependent on how the renderer accesses data, but it's a good bet that it would. The only way to know for sure is to actually look at performance data of a renderer that uses it.


If you actually looked at the links I sent you, then why do you think I need to see this? If I'm already in the Tesla V100 datasheet and DGX Station User Manual, I think it's safe to say I'm familiar with the info on the DGX landing page.

And I didn't say it had standard Titan V cards, I said:
I think the DGX Station probably uses some variant of the Titan V, but with fully-enabled GV100's and maybe augmented over-the-top connectivity.
According to this, the Titan V PCB seems to have some support for NVLink, suggesting they might have used it as the basis for whatever is in this box:
The NVLink fingers on the TITAN V card are rudiments of the functional NVLink interface found on the Tesla V100 PCIe, being developed by NVIDIA, as the TITAN V, Tesla V100, and a future Quadro GV100 share a common PCB. The NVLink fingers on the TITAN V are concealed by the base-plate of the cooler on one side, and the card's back-plate on the other; so the female connectors of NVLink bridge cables can't be plugged in.
Source: https://www.techpowerup.com/239519/nvidia-titan-v-lacks-sli-or-nvlink-support


What you're describing is magic. Please cite some reason for thinking they use NVLink to communicate with the Intel Xeon CPU, or else please stop making stuff up.


You're just arguing with yourself. I never said otherwise, but I think (and they suggest) there are other applications for it in scientific computing.
 