News: New software lets you run a private AI cluster at home with networked smartphones, tablets, and computers — Exo software runs Llama and other AI mo...

"The developer showed off a demo of the software running Llama-3-70B at home using an iPhone 15 Pro Max, an iPad Pro M4, a Galaxy S24 Ultra, an M2 MacBook Pro, an M3 MacBook Pro, and two MSI Nvidia GTX 4090 graphics cards."

You could buy a really nice proper AI GPU for the price of all that hardware.
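For anyone wanting to poke at it: exo advertises a ChatGPT-compatible API once the networked devices have discovered each other. Here's a minimal sketch in Python; the host, port, endpoint path, and "llama-3-70b" model id are assumptions taken from the project's README and may differ on your install.

# Minimal sketch: querying an exo cluster through its ChatGPT-compatible API.
# ASSUMPTIONS: localhost:8000 and the "llama-3-70b" model id are taken from
# exo's README at the time; check your install for the actual values.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "llama-3-70b",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "temperature": 0.7,
    },
    timeout=600,  # a 70B model sharded across phones and laptops won't be quick
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])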
 
"The developer showed off a demo of the software running Llama-3-70B at home using an iPhone 15 Pro Max, an iPad Pro M4, a Galaxy S24 Ultra, an M2 MacBook Pro, an M3 MacBook Pro, and two MSI Nvidia GTX 4090 graphics cards."

You could buy a really nice proper AI gpu for the price of all that hardware.
Amazing achievement, especially since the llama3-70b model runs perfectly on just a 24 GB GPU + 64 GB of CPU RAM 😉
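(For rough scale, and these are back-of-envelope numbers rather than anything from the article: a 70B model's weights alone are about 140 GB at FP16, ~70 GB at 8-bit, and ~35 GB at 4-bit, before KV cache and runtime overhead, so most of the model ends up offloaded to system RAM on a 24 GB card.)

# Back-of-envelope weight sizes for a 70B-parameter model (weights only;
# KV cache and runtime overhead push real footprints somewhat higher).
PARAMS = 70e9
VRAM_GB = 24

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if gb <= VRAM_GB else "spills to CPU RAM"
    print(f"{name}: ~{gb:.0f} GB of weights -> {verdict} on a {VRAM_GB} GB GPU")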
 
I'd like to know what speeds they are getting. I get around 2 tokens per second with llama3-70b Q8, using a 4090 for prompt ingestion and an EPYC 9654 with 768 GB of RAM for inference. If they got, say, 5 tok/s, I would be impressed; running an un-quantized model would also be impressive.
 
For comparison, here's what I get on my setup:
ollama run llama3:70b --verbose
>>> why is the sky blue
...
total duration: 2m2.554287s
load duration: 12.1295ms
prompt eval count: 15 token(s)
prompt eval duration: 2.179578s
prompt eval rate: 6.88 tokens/s
eval count: 401 token(s)
eval duration: 2m0.361818s
eval rate: 3.33 tokens/s

7900 XTX + 7950X3D @ 64 GB
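The reported rates follow directly from the raw counts above, for anyone who wants to sanity-check the math:

# Recompute ollama's reported rates from the raw counts/durations above.
prompt_tokens, prompt_secs = 15, 2.179578
eval_tokens, eval_secs = 401, 120.361818  # 2m0.361818s

print(f"prompt eval rate: {prompt_tokens / prompt_secs:.2f} tokens/s")  # ~6.88
print(f"eval rate: {eval_tokens / eval_secs:.2f} tokens/s")             # ~3.33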