News Fujitsu uses Fugaku supercomputer to train LLM: 13 billion parameters

The training of Fugaku-LLM naturally took advantage of distributed parallel learning techniques optimized for the supercomputer's architecture and the Tofu interconnect D.
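To give a concrete feel for what "distributed parallel learning" means here, below is a minimal data-parallel sketch in PyTorch (my own toy example, not Fujitsu's code): every process holds a copy of a small model, computes gradients on its own slice of the batch, and the gradients are averaged across processes. The "gloo" backend and localhost rendezvous are placeholders; on Fugaku that exchange would run over the Tofu interconnect D through an MPI-based stack.

```python
# Toy data-parallel training step: each process computes gradients on its own
# shard of the batch, then the gradients are averaged across all processes.
# The "gloo" backend and localhost rendezvous are placeholders for illustration.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def run(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = nn.Linear(16, 4)        # stand-in for a much larger language model
    data = torch.randn(8, 16)       # this rank's shard of the global batch
    target = torch.randn(8, 4)

    loss = nn.functional.mse_loss(model(data), target)
    loss.backward()

    # Average gradients across ranks: the step that exercises the interconnect.
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size)
```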

To put it more clearly, the 'Megatron-DeepSpeed' deep learning framework was ported to Fugaku, and its dense matrix multiplication library was accelerated for the 'Transformer' architecture in order to maximize distributed training performance.
 
Let's make that more clear... 😉

Megatron (an NVIDIA-developed framework that excels at multi-GPU AI acceleration) was used, coupled with DeepSpeed, a Microsoft-written optimization library.

The framework ensures many GPUs can be used at once to train the model by splitting the network across them (tensor and pipeline parallelism).
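To make the "many GPUs at once" part concrete, here is a toy sketch of the column-parallel linear layer trick that Megatron-style tensor parallelism uses. Plain CPU tensors stand in for the per-GPU shards; this is my own illustration, not Fugaku-LLM code.

```python
# Column-parallel linear layer: the weight matrix is split column-wise across
# devices, each device multiplies its own shard, and the results are concatenated.
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)      # activations: (batch, hidden)
w = torch.randn(8, 16)     # full weight:  (hidden, out)

# Split the weight into two shards, one per (hypothetical) device.
w0, w1 = w.chunk(2, dim=1)

y0 = x @ w0                               # computed on "device 0"
y1 = x @ w1                               # computed on "device 1"
y_parallel = torch.cat([y0, y1], dim=1)   # gather step over the interconnect

y_reference = x @ w                       # what a single device would compute
assert torch.allclose(y_parallel, y_reference, atol=1e-6)
```

Each device only ever stores and multiplies its own shard of the weight matrix, which is how a 13-billion-parameter model can be spread across many nodes.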

The library itself helps by accelerating the model during training and inference, introducing cool things like mixed-precision loss scaling, gradient accumulation, and ZeRO partitioning of optimizer state, which is where a lot of the memory efficiency comes from.
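Here is roughly what using DeepSpeed looks like in code. This is a minimal, hypothetical sketch (the tiny model, batch sizes, and learning rate are placeholders, and it assumes a CUDA-capable setup started with the deepspeed launcher), but the config keys for dynamic loss scaling, gradient accumulation, and ZeRO optimizer-state partitioning are the real knobs the library exposes.

```python
import deepspeed
import torch
import torch.nn as nn

ds_config = {
    "train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "fp16": {"enabled": True, "loss_scale": 0},   # loss_scale 0 = dynamic loss scaling
    "zero_optimization": {"stage": 1},            # shard optimizer state across ranks
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))

# deepspeed.initialize wraps the model in an engine that owns the optimizer,
# the loss scaler, and the gradient accumulation bookkeeping.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

dtype = next(engine.parameters()).dtype           # fp16 when mixed precision is on
x = torch.randn(4, 32, device=engine.device, dtype=dtype)

loss = engine(x).pow(2).mean()
engine.backward(loss)   # scales the loss before backprop
engine.step()           # optimizer step plus dynamic loss-scale update
```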

When you add a self-attention architecture like the Transformer on top of that, you have a model that can pick out or weight the more important parts of the input.
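And here is a bare-bones sketch of the self-attention computation itself, single head, no masking (my own illustration): every token is scored against every other token, and the softmax of those scores decides how much each position contributes to the output.

```python
# Minimal scaled dot-product self-attention, the mechanism that lets the model
# weight ("attend to") the most relevant positions of its input.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, wq, wk, wv) -> torch.Tensor:
    q, k, v = x @ wq, x @ wk, x @ wv                        # project input to Q, K, V
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # token-to-token similarity
    weights = F.softmax(scores, dim=-1)                     # attention weights per token
    return weights @ v                                      # weighted mix of the values

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)                  # 5 tokens, 16-dim embeddings
wq, wk, wv = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)   # torch.Size([5, 16])
```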

This stack of acceleration (all of it put together) is very good at natural language processing.


I think what I've said is true but I'm still learning.
 
If they really want to perfect AI, they should be starting small and figuring out how to get the program to 'understand' what it knows... millions of smaller AI projects will absolutely move things forward far faster than a few massive ones in the long run.