News: China makes AI breakthrough, reportedly trains generative AI model across multiple data centers and GPU architectures

Is it a breakthrough though?
AFAIK using multiple data centers is not a new thing; it's just that no one likes dealing with the latency of waiting for data to arrive over the internet.

Which is why the preferred method is to put all eggs in one basket.
 
Is it a breakthrough though?
AFAIK using multiple data centers is not a new thing; it's just that no one likes dealing with the latency of waiting for data to arrive over the internet.

Which is why the preferred method is to put all eggs in one basket.
If their technique allows the training to be broken up into latency-insensitive chunks, like Folding@home, then it should probably be considered a breakthrough. I don't know if that's possible, just throwing it out there.
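
Just to make the idea concrete (and this is only a toy sketch, not whatever technique they actually used): the latency-insensitive version would be some flavor of "train locally, sync rarely," where each site runs many ordinary optimizer steps on its own replica and the replicas are only averaged occasionally, so the slow cross-datacenter link is touched infrequently. Everything below (the tiny model, the two simulated sites, the sync interval) is made up for illustration.

```python
# Toy sketch of "train locally, sync rarely" (local SGD / periodic averaging).
# Two simulated "data centers" each hold a replica of the same tiny model and
# only exchange weights every SYNC_EVERY steps, so the slow WAN link is used
# rarely instead of on every gradient step.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
SYNC_EVERY = 50                       # local steps between cross-site syncs (made up)
model = nn.Linear(16, 1)              # stand-in for a real network
replicas = [copy.deepcopy(model) for _ in range(2)]
opts = [torch.optim.SGD(r.parameters(), lr=0.01) for r in replicas]

for step in range(200):
    for rep, opt in zip(replicas, opts):
        x = torch.randn(32, 16)                 # each site sees its own data shard
        y = x.sum(dim=1, keepdim=True)          # synthetic target
        loss = nn.functional.mse_loss(rep(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                              # purely local update, no network traffic

    if (step + 1) % SYNC_EVERY == 0:
        # The only cross-datacenter communication: average the replicas' weights.
        with torch.no_grad():
            for params in zip(*(r.parameters() for r in replicas)):
                mean = torch.stack([p.detach() for p in params]).mean(dim=0)
                for p in params:
                    p.copy_(mean)
```

The open question is whether that kind of infrequent syncing still converges well at frontier scale; if it does, that would be the actual breakthrough.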
 
Is it a breakthrough though?
AFAIK using multiple data centers is not a new thing; it's just that no one likes dealing with the latency of waiting for data to arrive over the internet.

Which is why the preferred method is to put all eggs in one basket.

Well, I remember reading news from MS and FB, both mentioning that they hit a bottleneck in training AI within a single data center.
That bottleneck is electricity.
The power requirement is so huge that a single data center can consume gigawatts, and that one workload alone can destabilize the power grid.
I would say that US developers will eventually have no choice but to train AI across multiple data centers as well.
 
Well, I remember reading news from MS and FB, both mentioning that they hit a bottleneck in training AI within a single data center.
That bottleneck is electricity.
The power requirement is so huge that a single data center can consume gigawatts, and that one workload alone can destabilize the power grid.
I would say that US developers will eventually have no choice but to train AI across multiple data centers as well.
Or build dedicated data center power plants, like Oracle's newly planned nuclear-powered data center or Microsoft's repurposing of Three Mile Island. The real breakthrough will be cold fusion … Interesting thought experiment: what would be the impact of a limitless power supply on our economic and technical endeavors as a society?
 
Or build dedicated data center power plants, like Oracle's newly planned nuclear-powered data center or Microsoft's repurposing of Three Mile Island. The real breakthrough will be cold fusion … Interesting thought experiment: what would be the impact of a limitless power supply on our economic and technical endeavors as a society?
Limitless? We would turn Earth into magma.

There are still costs associated with it. But if we end up with $0.01/kWh, it will allow us to do some truly wasteful stuff.
 
Not sure that overcoming a state-specific deficiency should be considered an industry-wide breakthrough unless it becomes a must-adopt practice across said industry. Time will tell, I suppose.
 
Love these types of articles based on X posts from a guy who overheard someone say it happened, but it was in a meeting he can’t talk about due to an NDA, which means he overheard it being talked about during unrelated banter in the NDA meeting, which means there is zero way to confirm the authenticity of this breakthrough other than hearsay…
Unfortunately, this feels like 50% of all articles nowadays, and not just for tech-related stuff.
 
Is it a breakthrough though?
AFAIK using multiple data centers is not a new thing; it's just that no one likes dealing with the latency of waiting for data to arrive over the internet.

Which is why the preferred method is to put all eggs in one basket.
Surely this is trivial; all you need is an interchange format.
It's done in big chunks anyway, so just read in the chunks and decode for your GPU.
Should be very little need to interchange between servers/racks in real time, any more than there is now.
Why wasn't this done long ago?
Oh, maybe it adds 0.001% to the processing load.
And maybe a given vendor would rather keep it proprietary.
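
To be fair to the "trivial" part, the interchange itself really is the easy bit. Here's a hedged sketch using PyTorch's ordinary state_dict machinery as a stand-in (the model and the file name are arbitrary): weights saved as plain CPU tensors don't care which vendor's accelerator produced them, and the receiving site just loads them onto whatever device it has.

```python
# Sketch: vendor-neutral checkpoint hand-off between heterogeneous accelerators.
# Weights are moved to CPU before saving, so the file is just plain tensors with
# no device- or vendor-specific state baked in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

# Site A: serialize the weights as CPU tensors (an interchange-friendly snapshot).
cpu_state = {k: v.detach().cpu() for k, v in model.state_dict().items()}
torch.save(cpu_state, "checkpoint.pt")

# Site B: possibly a completely different GPU architecture or backend.
device = "cuda" if torch.cuda.is_available() else "cpu"   # whatever this site has
restored = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
restored.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
restored.to(device)                                        # "decode for your GPU"
```

Formats like safetensors and ONNX already exist for exactly this kind of hand-off; the hard part is keeping the training run synchronized fast enough, not the chunk format.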
 
While I believe that China's AI and semiconductor technology is still behind the West, it is catching up fast. I don't think we can be complacent and assume that China will stay behind the West.
 
Why do some sites continue to post these Chinese claims of breakthroughs verbatim when 90% of the time they turn out to be completely false or, at the very least, exaggerated? In this case, it's 100% exaggerated. Western tech has been using distributed data centers for years now.
 
Why do some sites continue to post these Chinese claims of breakthroughs verbatim when 90% of the time they turn out to be completely false or, at the very least, exaggerated? In this case, it's 100% exaggerated. Western tech has been using distributed data centers for years now.
Yes, distributed data centers, but apparently not for training AI models... it would be too expensive.
 
Not clear it's less "efficient"... IF you are willing to wait 10x as long, it's often possible to do things more efficiently with older hardware.
 
Not clear it's less "efficient"... IF you are willing to wait 10x as long, it's often possible to do things more efficiently with older hardware... and if you have a crazy large amount of older hardware and good integration, you could do model training only when there's nothing better to do and there's spare electricity, running at the most power-saving low voltages and frequencies.
 
Or build dedicated data center power plants, like Oracle's newly planned nuclear-powered data center or Microsoft's repurposing of Three Mile Island. The real breakthrough will be cold fusion … Interesting thought experiment: what would be the impact of a limitless power supply on our economic and technical endeavors as a society?
Cold fusion is quite literally pseudoscience
 
Is it a breakthrough though?
AFAIK using multiple data centers is not a new thing; it's just that no one likes dealing with the latency of waiting for data to arrive over the internet.

Which is why the preferred method is to put all eggs in one basket.

The only way to tell if there was a breakthrough would be if they published comparison benchmarks. And the only way to verify (trust) their claims is if they published enough details to run an experiment and reproduce their results (don't hold your breath).

These days PyTorch/TensorFlow make it trivial to train a network on a multi-GPU computer. Coordinating this kind of multi-GPU, single-computer training requires serializing, segmenting, and synchronizing in a way that extends naturally to multi-server training: you just swap in TCP/IP read()/write() in place of gpu_to_host_memcpy()/host_to_gpu_memcpy() [or some optimization thereof, like NVLink]. The trouble is that training slows down by a painful amount due to the extra network latency (mostly introduced by network bandwidth constraints, but also network overhead). So any breakthrough here would be in the realm of training-speed increases, which the article fails to discuss.
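
For anyone curious what that swap looks like, here's a minimal sketch (generic data parallelism, not whatever the lab actually did) using torch.distributed with the gloo backend, which runs over plain TCP/IP, standing in for the NCCL/NVLink path you'd use inside a single box. The launch command, model, and data below are placeholders.

```python
# Minimal data-parallel sketch over plain TCP/IP: gradients are synchronized
# with all_reduce on the "gloo" backend, i.e. the network takes the role that
# gpu<->host memcpy / NVLink plays inside a single machine.
# Example launch (2 workers on one host; the file name is arbitrary):
#   torchrun --nproc_per_node=2 train_dp.py
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    dist.init_process_group(backend="gloo")          # TCP-based collectives
    rank, world = dist.get_rank(), dist.get_world_size()

    model = nn.Linear(16, 1)
    for p in model.parameters():                     # start every worker from identical weights
        dist.broadcast(p.data, src=0)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    torch.manual_seed(rank)                          # each rank sees its own data shard
    for step in range(100):
        x = torch.randn(32, 16)
        y = x.sum(dim=1, keepdim=True)               # synthetic task
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()

        # The painful part across data centers: every step, every gradient has to
        # cross the network before the optimizer is allowed to move.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Any real breakthrough would show up as that per-step synchronization getting cheaper or rarer, which is exactly the comparison benchmark we don't get.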