How much is that going to help, really? I'm just not seeing a huge upside here. If you pair a handful of Chinese GPUs with an Nvidia GPU that's 10x as fast, the total benefit to training time won't add up to much (rough numbers below). Also, anyone building AI systems is dealing in aggregates, and I'll bet they build systems with either all-Nvidia or all-AMD GPUs. It's probably much more the exception that they're down to just a couple boards of either kind, and if they are, they just build another all-AMD system (for instance).
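As a quick sanity check, here's a minimal sketch of the best-case speedup; the GPU count and the 10x speed ratio are my illustrative assumptions, not figures from the article:

    # Back-of-envelope: best-case gain from adding slow GPUs to one fast one,
    # assuming perfect data-parallel scaling with work split by relative speed.
    # (Hypothetical numbers; real heterogeneous setups will do worse.)
    fast_gpu_throughput = 1.0   # normalize the Nvidia card to 1.0
    slow_gpu_throughput = 0.1   # a card ~10x slower
    num_slow_gpus = 4           # "a handful"

    total = fast_gpu_throughput + num_slow_gpus * slow_gpu_throughput
    speedup = total / fast_gpu_throughput
    print(f"Ideal speedup: {speedup:.2f}x")  # 1.40x -- before any interconnect
                                             # or synchronization overhead

Even under those generous assumptions, four extra boards buy you well under 2x.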
Furthermore, any time you don't have a high-speed fabric and have to rely on PCIe for interconnectivity, you're going to be at a significant disadvantage. The software overhead of abstracting over each GPU's API adds a little more on top. So overall I see it as neither a huge win nor a major impediment.
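For a sense of scale, a hedged estimate of gradient-sync cost; the bandwidth figures, GPU count, and model size here are my assumptions, not measurements:

    # Rough ring all-reduce cost per training step, PCIe vs. a fast fabric.
    # Nominal per-direction bandwidths (assumed): PCIe 4.0 x16 ~32 GB/s,
    # NVLink-class fabric ~300 GB/s.
    def allreduce_seconds(grads_gb: float, n_gpus: int, link_gb_per_s: float) -> float:
        # A ring all-reduce moves 2*(n-1)/n of the gradient buffer across each link.
        traffic_gb = 2 * (n_gpus - 1) / n_gpus * grads_gb
        return traffic_gb / link_gb_per_s

    grads_gb = 14.0  # assumption: fp16 gradients for a ~7B-parameter model
    for name, bw in [("PCIe 4.0 x16", 32.0), ("NVLink-class fabric", 300.0)]:
        t = allreduce_seconds(grads_gb, n_gpus=5, link_gb_per_s=bw)
        print(f"{name}: ~{t * 1000:.0f} ms per gradient sync")

On those assumed numbers, PCIe is the better part of a second per sync versus tens of milliseconds on a proper fabric, which swamps whatever the slow boards contribute.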
On a purely technical level, I think it's less impressive than prior techniques for mixing & matching different GPUs in the same machine.
While searching for those prior techniques, I found a project enabling disparate multi-GPU configurations for CFD:
Sorta shows it's not quite the genius breakthrough the article claims.