News: OpenAI intros two lightweight open-weight language models that can run on consumer GPUs, with the smaller one optimized to run on devices with just 16 GB of memory.

Both models are Transformers built on a mixture-of-experts (MoE) design, an approach popularized by models such as DeepSeek-R1. Despite the consumer-GPU focus, both gpt-oss-120b and gpt-oss-20b support context lengths of up to 131,072 tokens, among the longest available for local inference. gpt-oss-120b activates 5.1 billion parameters per token, while gpt-oss-20b activates 3.6 billion. Both models use alternating dense and locally banded sparse attention patterns, along with grouped multi-query attention with a group size of 8.
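To make the active-parameter figures concrete, here is a minimal sketch of top-k mixture-of-experts routing in PyTorch. The layer sizes, expert count, and top-k value are illustrative assumptions, not gpt-oss's actual configuration; the point is simply that the router picks a few experts per token, so only a fraction of the model's total parameters runs on each forward pass.

```python
# Minimal top-k MoE routing sketch (toy sizes, NOT gpt-oss's real config).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward block; only `top_k`
        # of them run for any given token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # choose top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out


layer = MoELayer(d_model=64, d_ff=256, n_experts=8, top_k=2)  # illustrative sizes
tokens = torch.randn(4, 64)
print(layer(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per token
```

The attention details mentioned above (alternating dense and locally banded sparse layers, grouped multi-query attention with a group size of 8) are not shown here; this sketch only illustrates the routing step that keeps the per-token active parameter count well below the total parameter count.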