News: New software lets you run a private AI cluster at home with networked smartphones, tablets, and computers — Exo software runs Llama and other AI mo...

"The developer showed off a demo of the software running Llama-3-70B at home using an iPhone 15 Pro Max, an iPad Pro M4, a Galaxy S24 Ultra, an M2 MacBook Pro, an M3 MacBook Pro, and two MSI Nvidia GTX 4090 graphics cards."

You could buy a really nice proper AI GPU for the price of all that hardware.
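For anyone wanting to poke at it: exo advertises a ChatGPT-compatible API once the networked devices have discovered each other. Here's a minimal sketch in Python; the host, port, endpoint path, and "llama-3-70b" model id are assumptions taken from the project's README and may differ on your install.

# Minimal sketch: querying an exo cluster through its ChatGPT-compatible API.
# ASSUMPTIONS: localhost:8000 and the "llama-3-70b" model id are taken from
# exo's README at the time; check your install for the actual values.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "llama-3-70b",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "temperature": 0.7,
    },
    timeout=600,  # a 70B model sharded across phones and laptops won't be quick
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])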
 
"The developer showed off a demo of the software running Llama-3-70B at home using an iPhone 15 Pro Max, an iPad Pro M4, a Galaxy S24 Ultra, an M2 MacBook Pro, an M3 MacBook Pro, and two MSI Nvidia GTX 4090 graphics cards."

You could buy a really nice proper AI gpu for the price of all that hardware.
Amazing achievement, especially since the llama3-70b model runs perfectly on just a 24 GB GPU + 64 GB of CPU RAM 😉
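(For rough scale, and these are back-of-envelope numbers rather than anything from the article: a 70B model's weights alone are about 140 GB at FP16, ~70 GB at 8-bit, and ~35 GB at 4-bit, before KV cache and runtime overhead, so most of the model ends up offloaded to system RAM on a 24 GB card.)

# Back-of-envelope weight sizes for a 70B-parameter model (weights only;
# KV cache and runtime overhead push real footprints somewhat higher).
PARAMS = 70e9
VRAM_GB = 24

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if gb <= VRAM_GB else "spills to CPU RAM"
    print(f"{name}: ~{gb:.0f} GB of weights -> {verdict} on a {VRAM_GB} GB GPU")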
 
I'd like to know what speeds they are getting. I get around 2 tokens per second with llama3-70b Q8, using a 4090 for prompt ingestion and an EPYC 9654 with 768 GB of RAM for inference. If they got, say, 5 tok/s, I would be impressed; running an un-quantized model would also be impressive.
 
For comparison, here's what I get on my setup:
ollama run llama3:70b --verbose
>>> why is the sky blue
...
total duration: 2m2.554287s
load duration: 12.1295ms
prompt eval count: 15 token(s)
prompt eval duration: 2.179578s
prompt eval rate: 6.88 tokens/s
eval count: 401 token(s)
eval duration: 2m0.361818s
eval rate: 3.33 tokens/s

7900 XTX + 7950X3D @ 64 GB
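The reported rates follow directly from the raw counts above, for anyone who wants to sanity-check the math:

# Recompute ollama's reported rates from the raw counts/durations above.
prompt_tokens, prompt_secs = 15, 2.179578
eval_tokens, eval_secs = 401, 120.361818  # 2m0.361818s

print(f"prompt eval rate: {prompt_tokens / prompt_secs:.2f} tokens/s")  # ~6.88
print(f"eval rate: {eval_tokens / eval_secs:.2f} tokens/s")             # ~3.33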