Could you benchmark some GPUs without Tensor Cores too? It would be interesting to find out how much Tensor Cores matter. You could run LLaMA 7B on a GTX 1660 Ti and a GTX 1080 (neither has Tensor Cores) alongside an RTX 2060 (which does), measuring generation speed on each. The comparison could show whether GPTQ is benefiting from Tensor Cores.