If it works, who cares? But I say that more about the precision than the parameter count. Driving parameter counts down has been an important development, but at some point you have to scale them back up, and what you mostly need then is more memory, which desktop CPUs/APUs should be able to accommodate easily.
We can make LLMs run on 8 GB phones, but it would be better to have 24-32 GB of memory. I believe we can reach a point where budget smartphones can have 32-64 GB of RAM, if Samsung/Micron/SK Hynix successfully develop 3D DRAM (like 3D NAND) in the early-mid 2030s, and that drives cost-per-bit down by an order of magnitude.
As far as improving efficiency at higher precision goes, I think AMD's adoption of "Block FP16" in the XDNA2 NPU could qualify. They promoted it as having the accuracy of 16-bit with the speed of INT8, which works by having a block of values share a single exponent while each keeps its own small mantissa. There are only so many mathematical tricks you can pull to double performance though, and maybe we're already stuck with a menu of BFP16, INT8, INT4, FP4, INT2, INT1.58 (ternary), etc.
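To make the shared-exponent idea concrete, here's a minimal sketch of generic block floating point quantization in Python. This is not AMD's actual Block FP16 implementation (their block size, mantissa width, and rounding are internal details); it just illustrates why the format gets near-FP16 accuracy out of integer-width storage: each block of values shares one power-of-two exponent, and every element stores only a small integer mantissa.

```python
import numpy as np

def bfp_quantize(values, block_size=8, mantissa_bits=8):
    """Round-trip a 1-D array through a toy block floating point format:
    each block shares one exponent; elements keep signed integer mantissas."""
    out = np.empty_like(values, dtype=np.float64)
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = np.max(np.abs(block))
        if max_abs == 0:
            out[start:start + block_size] = 0.0
            continue
        # One shared exponent per block, chosen so the largest magnitude
        # fits in the signed mantissa range.
        exp = np.floor(np.log2(max_abs)) + 1
        scale = 2.0 ** (exp - (mantissa_bits - 1))
        mantissas = np.clip(np.round(block / scale),
                            -(2 ** (mantissa_bits - 1)),
                            2 ** (mantissa_bits - 1) - 1)
        out[start:start + block_size] = mantissas * scale
    return out

v = np.array([2.0, 1.0, 0.5, 0.25, 3.0, -1.5, 0.125, 0.0])
q = bfp_quantize(v, block_size=4)
```

The hardware win is that within a block the multiply-accumulates run on the integer mantissas (INT8-like throughput), while the shared exponent preserves dynamic range across blocks, which plain INT8 quantization lacks. The catch, visible in the code, is that one outlier in a block forces a large exponent and crushes the precision of its neighbors.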