Llama 3.1 405B runs at nearly a thousand tokens a second on Cerebras Inference, and took a quarter of a second to get the first token.
Cerebras video shows AI writing code 75x faster than world's fastest AI GPU cloud — world's largest chip beats AWS's fastest in head-to-head compar... : Read more
Cerebras video shows AI writing code 75x faster than world's fastest AI GPU cloud — world's largest chip beats AWS's fastest in head-to-head compar... : Read more