News Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs

The race to the bottom is finished... time to turn around and start heading back up. Though I think we'll need a paradigm shift to improve efficiency at higher precision. I still believe that eventually we'll get there.
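For context, "1-bit" here really means ternary: the weights take only the values -1, 0, or +1, which works out to log2(3) ≈ 1.58 bits each. Here's a rough sketch of how that quantization works, as I understand the BitNet b1.58 papers (the function and names are my own illustration, not Microsoft's code):

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale.

    Rough sketch of BitNet-b1.58-style "absmean" quantization (my reading
    of the papers, not Microsoft's actual code): scale by the mean absolute
    value, then round and clip to the three allowed levels.
    """
    scale = np.mean(np.abs(w)) + eps
    w_q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_q, scale

w = np.random.randn(4, 4).astype(np.float32)
w_q, scale = ternary_quantize(w)
print(w_q)          # entries are only -1, 0, or +1
print(np.log2(3))   # ~1.58 bits of information per weight
```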
 
If it works, who cares? But I say that more about the precision than the parameter count. Optimizing parameter counts downward has been an important development, but at some point you have to increase them again, and that mostly means more memory, which desktop CPUs/APUs should be able to accommodate easily.

We can make LLMs run on 8 GB phones, but it would be better to have 24-32 GB of memory. I believe we can reach a point where budget smartphones have 32-64 GB of RAM, if Samsung/Micron/SK Hynix successfully develop 3D DRAM (analogous to 3D NAND) in the early-to-mid 2030s and that drives cost-per-bit down by an order of magnitude.
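To put rough numbers on that, here's a back-of-the-envelope footprint for the weights of a 2B-parameter model at different precisions (my own figures; this ignores activations, KV cache, and runtime overhead):

```python
# Back-of-the-envelope weight-memory footprint for a 2B-parameter model
# at various precisions (weights only; ignores activations, KV cache, runtime).
params = 2e9
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4), ("ternary (1.58-bit)", 1.58)]:
    gib = params * bits / 8 / 2**30
    print(f"{name:>20}: {gib:.2f} GiB")
# FP16 ~3.7 GiB, INT8 ~1.9 GiB, INT4 ~0.9 GiB, 1.58-bit ~0.37 GiB --
# which is why even an 8 GB phone can hold the weights of a model this size.
```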

As far as improving efficiency at higher precision goes, I think AMD's adoption of "Block FP16" in the XDNA2 NPU could qualify. They promoted it as having the accuracy of 16-bit with the speed of INT8. There are only so many mathematical tricks you can pull to double performance, though, and maybe we're already stuck with a choice of BFP16, INT8, INT4, FP4, INT2, INT1.58, etc.
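For anyone unfamiliar with the idea, block floating point basically shares one exponent across a block of values while each value keeps a small integer mantissa. Here's a toy illustration of the general concept; it is my own sketch, not AMD's actual XDNA2 format:

```python
import numpy as np

def block_fp_quantize(x: np.ndarray, block_size: int = 8, mantissa_bits: int = 8):
    """Toy block floating point: each block of values shares one exponent,
    and individual values keep only a small signed integer mantissa.
    Illustrates the general BFP idea, not AMD's actual Block FP16 format.
    """
    x = x.reshape(-1, block_size)
    # One exponent per block, chosen from the largest magnitude in the block.
    max_abs = np.max(np.abs(x), axis=1, keepdims=True) + 1e-30
    exponents = np.ceil(np.log2(max_abs))
    scale = 2.0 ** exponents
    # Integer mantissas, computed against the shared per-block scale.
    qmax = 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(x / scale * qmax), -qmax, qmax).astype(np.int16)
    return mantissas, exponents

def block_fp_dequantize(mantissas, exponents, mantissa_bits: int = 8):
    qmax = 2 ** (mantissa_bits - 1) - 1
    return mantissas / qmax * (2.0 ** exponents)

x = np.random.randn(32).astype(np.float32)
m, e = block_fp_quantize(x)
x_hat = block_fp_dequantize(m, e).reshape(-1)
print(np.max(np.abs(x - x_hat)))   # small reconstruction error
```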
 
I don't think GPUs or even NPUs will scale well enough to get beyond small incremental improvements, and as long as that's the paradigm we'll keep working at the bottom: lower precision, fewer parameters, better distillation techniques.

I'm thinking something entirely new will be needed to make something significantly more useful than what we have today. I have no idea what that will look like. A new architecture? New math? Cheap quantum endpoints? Who knows.
 
We're on an "S-curve", desperately searching for the next S-curve. This research could be applicable if low-precision math can be used for, e.g., a spiking neuron model. On the hardware side, maybe we'll see 3D neuromorphic chips developed to accelerate it. I'm predicting that an ever-closer "brain imitation" will be the next big thing, and that planar chips are insufficient for it. Current NPUs could become a thing of the past, or be repurposed for non-AI matrix operations.
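To make that concrete, here's a toy leaky integrate-and-fire layer driven by binary spikes and ternary weights, so every synaptic update is just an integer add or subtract. Purely my own sketch; the constants and names are made up:

```python
import numpy as np

def lif_step(v, spikes_in, w_ternary, leak=0.9, threshold=1.0):
    """One step of a toy leaky integrate-and-fire neuron layer.

    Inputs are binary spikes and weights are ternary {-1, 0, +1}, so the
    synaptic update is pure integer add/subtract -- the kind of math a
    1.58-bit model (or a neuromorphic chip) could do without multipliers.
    Entirely my own sketch; constants and names are made up.
    """
    current = w_ternary @ spikes_in          # integer accumulate only
    v = leak * v + current                   # leaky membrane potential
    spikes_out = (v >= threshold).astype(np.int8)
    v = np.where(spikes_out == 1, 0.0, v)    # reset neurons that fired
    return v, spikes_out

rng = np.random.default_rng(0)
w = rng.integers(-1, 2, size=(16, 64)).astype(np.int8)   # ternary weights
v = np.zeros(16)
for _ in range(10):
    spikes_in = (rng.random(64) < 0.1).astype(np.int8)   # sparse input spikes
    v, spikes_out = lif_step(v, spikes_in, w)
print(spikes_out)
```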

I don't think we've seen much adoption of "in-memory computing" yet.
 