How to Run a ChatGPT Alternative on Your Local PC

@jarred, thanks for the ongoing forays into generative-AI uses and the HW requirements. Some questions:

1. What's the qualitative difference between 4-bit and 8-bit answers?

2. How does the tokens/sec perf number translate to speed of response (output)? I asked ChatGPT about this and it only gives me speed of processing input (e.g., input length / tokens/sec).

I'm building a box specifically to play with these AI-in-a-box setups, as you're doing, so it's helpful to have trailblazers in front. I'll likely go with a baseline GPU, i.e., a 3060 w/ 12GB VRAM, as I'm not after performance, just learning.

Looking forward to seeing an open-source ChatGPT alternative. IIRC, the Stability AI CEO has intimated that one is in the works.
 
  • Like
Reactions: bit_user
@jarred, thanks for the ongoing forays into generative-AI uses and the HW requirements. Some questions:

1. What's the qualitative difference between 4-bit and 8-bit answers?

2. How does the tokens/sec perf number translate to speed of response (output)? I asked ChatGPT about this and it only gives me speed of processing input (e.g., input length / tokens/sec).

I'm building a box specifically to play with these AI-in-a-box setups, as you're doing, so it's helpful to have trailblazers in front. I'll likely go with a baseline GPU, i.e., a 3060 w/ 12GB VRAM, as I'm not after performance, just learning.

Looking forward to seeing an open-source ChatGPT alternative. IIRC, the Stability AI CEO has intimated that one is in the works.
The 8-bit and 4-bit are supposed to be virtually the same quality, according to what I've read. I have tried both and didn't see a massive change. Basically, the weights either trend toward a larger number or toward zero, so 4-bit is enough, or something like that. A "token" is just a word, more or less (things like parts of a URL also qualify as a "token," I think, which is why it's not strictly a one-to-one equivalence).
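On the second question, as a rough sketch (hypothetical numbers; output streams out one token at a time, so response time is roughly output length divided by the generation rate):

Code:
# Rough sketch of question 2 (simplistic assumptions): output streams one
# token at a time, so response time ~= output length / generation rate.
response_tokens = 200          # hypothetical ~150-word reply
tokens_per_sec = 25            # generation rate from benchmarking
print(f"~{response_tokens / tokens_per_sec:.0f} seconds to finish the reply")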

For the GPUs, a 3060 is a good baseline, since it has 12GB and can thus run up to a 13b model. I suspect long-term, a lot of stuff will want at least 24GB to get better results.
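For a rough sense of why 12GB covers a 13b model at 4-bit, here's a back-of-the-envelope sketch (ignores activations, KV cache, and framework overhead):

Code:
# Back-of-the-envelope VRAM estimate for 4-bit weights (simplistic;
# ignores activations, KV cache, and framework overhead).
params = 13e9                  # llama-13b
bytes_per_param = 0.5          # 4 bits per weight
print(f"~{params * bytes_per_param / 1e9:.1f} GB of weights")  # ~6.5 GB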
 
  • Like
Reactions: baboma
I dream of a future when I could host an AI on a computer at home and connect it to the smart home systems. I would call her "EVA" as a tribute to the AI from Command & Conquer.

"Hey EVA, did anything out of the ordinary happen since I left home?"
Or when I'm away from home: "Hey EVA, has my son returned from school yet? If so, open a voice chat with him."
And of course the basic stuff like lights and temperature control.

It's just the coolest name for an AI.
 
Last edited:
UPDATE: I've managed to test Turing GPUs now, and I retested everything else just to be sure the new build didn't screw with the numbers. Try as I might, at least under Windows I can't get performance to scale beyond about 25 tokens/s on the responses with llama-13b-4bit. What's really weird is that the Titan RTX and RTX 2080 Ti come very close to that number, but all of the Ampere GPUs are about 20% slower.

Is the code somehow better optimized for Turing? Maybe, or maybe it's something else. I created a new conda environment and went through all the steps again, running an RTX 3090 Ti, and that's what was used for the Ampere GPUs. Using the same environment as the Turing or Ada GPUs (yes, I have three separate environments now) didn't change the results more than margin of error (~3%).

So, obviously there's room for optimizations and improvements to extract more throughput. At least, that's my assumption based on the RTX 2080 Ti humming along at a respectable 24.6 tokens/s. Meanwhile, the RTX 3090 Ti couldn't get above 22 tokens/s. Go figure.

Again, these are all preliminary results, and the article text should make that very clear. Linux might run faster, or perhaps there's just some specific code optimizations that would boost performance on the faster GPUs. Given a 9900K was noticeably slower than the 12900K, it seems to be pretty CPU limited, with a high dependence on single-threaded performance.
 
  • Like
Reactions: Firestone
I suspect long-term, a lot of stuff will want at least 24GB to get better results.

Given Nvidia's current stranglehold on the GPU market as well as AI accelerators, I have no illusion that 24GB cards will be affordable to the average user any time soon. I'm wondering if offloading to system RAM is a possibility, not for this particular software, but for future models.

it seems to be pretty CPU limited, with a high dependence on single-threaded performance.

So the CPU would need to be benchmarked as well? Does the CPU make a difference for Stable Diffusion? Would an X3D's larger L3 cache matter?

Because of the Microsoft/Google competition, we'll have access to free high-quality general-purpose chatbots. (BTW, no more waitlist for Bing Chat.) I'm hoping to see more niche bots limited to specific knowledge fields (e.g., programming, health questions, etc.) that can have lighter HW requirements, and thus be more viable running on consumer-grade PCs. That, and the control/customization aspect of having your own AI.

Looking around, I see there are several open-source projects in the offing. I'll be following their progress:

OpenChatKit
https://together.xyz/blog/openchatkit

LAION AI / OpenAssistant
https://docs.google.com/document/d/1V3Td6btwSMkZIV22-bVKsa3Ct4odHgHjnK-BrcNJBWY
 
I'm building a box specifically to play with these AI-in-a-box setups, as you're doing, so it's helpful to have trailblazers in front. I'll likely go with a baseline GPU, i.e., a 3060 w/ 12GB VRAM, as I'm not after performance, just learning.
If you're intending to work specifically with large models, you'll be extremely limited on a single-GPU consumer desktop. You might instead set aside the $ for renting time on Nvidia's A100 or H100 cloud instances. Or possibly Amazon's or Google's; I'm not sure how well they scale to such large models. I haven't actually run the numbers on this; it's just something to consider.

If you're really serious about the DIY route, Tim Dettmers has been one of the leading authorities on the subject for many years:

https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
At the end of that article, you can see from the version history that it originated all the way back in 2014. However, the latest update was only 1.5 months ago and it now includes both the RTX 4000 series and H100.
 
Last edited:
What's really weird is that the Titan RTX and RTX 2080 Ti come very close to that number, but all of the Ampere GPUs are about 20% slower.

Is the code somehow better optimized for Turing? Maybe, or maybe it's something else.
I'd start reading up on tips to optimize PyTorch performance in Windows. It seems like others should've already spent a lot of time on this subject.

Also, when I've compiled deep learning frameworks in the past, you had to tell them which CUDA compute capabilities to target. Maybe specifying a common baseline fails to utilize capabilities present only on the newer hardware. That said, I don't know to what extent this applies to PyTorch. I'm pretty sure there's some precompiled code, but then a hallmark of Torch is that it compiles your model for the specific hardware at runtime.
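One hypothetical experiment, assuming the kernel build goes through PyTorch's cpp_extension machinery (the tracebacks in this thread suggest it does): pin TORCH_CUDA_ARCH_LIST before building, so the extension is compiled for specific architectures rather than a lowest-common baseline. The arch values below are just examples:

Code:
# Hypothetical experiment: pin the CUDA compute capabilities the extension
# is built for (e.g., near the top of setup_cuda.py, before setup() runs).
import os
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.5;8.6;8.9"  # Turing; Ampere; Ada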
 
I'm wondering if offloading to system RAM is a possibility, not for this particular software, but for future models.
Not really. Inferencing is massively bandwidth-intensive. If we make a simplistic assumption that the entire network needs to be applied for each token, and your model is too big to fit in GPU memory (e.g., trying to run a 24 GB model on a 12 GB GPU), then you might be left in a situation of trying to pull in the remaining 12 GB per iteration. Considering PCIe 4.0 x16 has a theoretical limit of 32 GB/s, you'd only be able to read in the other half of the model about 2.5 times per second. So, your throughput would drop by at least an order of magnitude.
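Putting numbers on that, under the same simplistic assumptions:

Code:
# Upper bound on tokens/s if half the model must cross PCIe every token
# (simplistic assumptions from above).
offloaded_bytes = 12e9         # half of a 24 GB model that doesn't fit in VRAM
pcie4_x16_bw = 32e9            # theoretical PCIe 4.0 x16 limit, bytes/sec
print(f"~{pcie4_x16_bw / offloaded_bytes:.1f} tokens/s ceiling")  # ~2.7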

Those are indeed simplistic assumptions, but I think they're not too far off the mark. A better way to scale would be multi-GPU, where each card contains a part of the model. As data passes from the early layers of the model to the latter portion, it's handed off to the second GPU. This is known as a dataflow architecture, and it's becoming a very popular way to scale AI processing.
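As a minimal sketch of that hand-off (toy layers and sizes, not any real LLM):

Code:
# Toy two-GPU split: early layers on cuda:0, later layers on cuda:1,
# with activations handed off between them.
import torch
import torch.nn as nn

early = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
late = nn.Linear(4096, 4096).to("cuda:1")

x = torch.randn(1, 4096, device="cuda:0")
y = late(early(x).to("cuda:1"))   # .to() moves activations to the second GPU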

I'm hoping to see more niche bots limited to specific knowledge fields (e.g., programming, health questions, etc.) that can have lighter HW requirements, and thus be more viable running on consumer-grade PCs.
Don't count on it. Though the tech is advancing so fast that maybe someone will figure out a way to squeeze these models down enough that you can do it.
 
  • Like
Reactions: baboma
The 8-bit and 4-bit are supposed to be virtually the same quality, according to what I've read.
If today's models still work on the same general principles as what I saw in an AI class I took a long time ago, signals usually pass through sigmoid functions to help them converge toward 0/1 (or whatever numerical range the model layer operates on). More resolution would only matter in cases where rounding at higher precision causes enough nodes to snap the other way and change the output layer's outcome. When you have hundreds of inputs, most of the rounding noise should cancel itself out and not make much of a difference.

A bit weird by traditional math standards but it works.
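Here's a toy numpy check of that intuition (illustrative only; nothing like how GPTQ actually quantizes):

Code:
# Quantize weights to a 4-bit grid and compare dot products; per-weight
# rounding errors largely cancel across many inputs.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
x = rng.normal(size=4096).astype(np.float32)

scale = np.abs(w).max() / 7                      # signed 4-bit range: -8..7
w_q = np.clip(np.round(w / scale), -8, 7) * scale

print(np.dot(w, x), np.dot(w_q, x))              # outputs land close together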
 
  • Like
Reactions: JarredWaltonGPU
You can't run ChatGPT on a single GPU, but you can run some far less complex text generation large language models on your own PC. We tested oobabooga's text generation webui on several cards to see how fast it is and what sort of results you can expect.

How to Run a ChatGPT Alternative on Your Local PC : Read more
This process worked until I got to the setup_cuda.py step, although I had to run vcvars64.bat from C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\ rather than the provided directory.
The error I get when I attempt to set up CUDA is as follows:
(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py:388: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
Emitting ninja build file E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
FAILED: E:/llmRunner/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.win-amd64-cpython-310/Release/quant_cuda_kernel.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here

C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here

C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/core/TensorImpl.h(77): here

C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\ATen/core/qualified_name.h(73): here

2 errors detected in the compilation of "e:/llmrunner/text-generation-webui/repositories/gptq-for-llama/quant_cuda_kernel.cu".
quant_cuda_kernel.cu
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\setup_cuda.py", line 4, in <module>
setup(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\init.py", line 108, in setup
return distutils.core.setup(**attrs)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
return run_commands(dist)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
dist.run_commands()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
self.run_command(cmd)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py", line 74, in run
self.do_egg_install()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
self.build()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 111, in build
self.run_command('build_ext')
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
_build_ext.run(self)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 345, in run
self.build_extensions()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 815, in win_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>
 
This process worked until I got to the setup_cuda.py step, although I had to run vcvars64.bat from C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\ rather than the provided directory.
Interesting read
 
  • Like
Reactions: domih
[1/1] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc
The article says it uses CUDA 11.7.

And one of the linked guides also says to use CUDA 11.7:

https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_install_llama_8bit_and_4bit/

This:

C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (721): here
...is a C++ error that implies there's probably a version mismatch between two packages. That supports the idea that your CUDA version is too low, which we can also see by the fact that it happens while compiling a CUDA kernel:
2 errors detected in the compilation of "e:/llmrunner/text-generation-webui/repositories/gptq-for-llama/quant_cuda_kernel.cu".
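One quick way to confirm the mismatch from inside the same conda environment (a hypothetical check, not from the article):

Code:
# Compare nvcc's CUDA toolkit version with the one PyTorch was built against.
import subprocess
import torch

print("PyTorch built with CUDA:", torch.version.cuda)          # e.g. 11.7
print(subprocess.run(["nvcc", "--version"],
                     capture_output=True, text=True).stdout)   # e.g. 11.4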
 
If you're intending to work specifically with large models, you'll be extremely limited on a single-GPU consumer desktop. You might instead set aside the $ for renting time on Nvidia's A100 or H100 cloud instances.

Thanks for the advice, but I'm afraid the real bottleneck in this case is the human operator, aka yours truly. I'm still in the wading-pool phase, far, far from the point of needing timeshares on cloud services. It'll be a while.

Until then, hopefully the concept of edge computing will come to apply to AI, and AI-in-a-box will become more mainstream. Also, I'm confident that Nvidia won't be able to hog the whole AI accelerator market to itself for too long.

Given the competition between the tech giants for market share, there'll be several high-quality general-purpose chatbots with free access. No need to reinvent the wheel there, so my hope is for DIY bots to be aimed at niche (area-specific) uses.
 
I'm confident that Nvidia won't be able to hog the whole AI accelerator market to itself for too long.
Intel GPUs have pretty decent AI performance, at least for what they cost. Much more competitive on that front than gaming.

I also expect AMD to redouble their efforts to try and capture some meaningful AI market share. It's a big, rapidly growing market that they've barely tapped and can't afford to ignore. And it's one of the few markets where they haven't gained meaningful traction in the past few years.
 
This process worked until I got to the setup_cuda.py step, although I had to run vcvars64.bat from C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\ rather than the provided directory.
The error I get when I attempt to set up CUDA is as follows:
(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
If Visual Studio is in Program Files (x86), that probably means you installed the 32-bit version. You might consider uninstalling that and getting the 64-bit version, as I never tested whether this stuff works in a 32-bit environment. CUDA, meanwhile, indicates a version mismatch, which was one of the problems I encountered early on as well. If you run "conda list" it should show all the installed packages for your current environment (i.e., after running "conda activate [whatever you called the environment]"). Here's what I show on a working install, using the Reddit instructions (which specify CUDA 11.3):

Code:
(textgen) C:\Users\jwalt>conda list
# packages in environment at C:\Users\jwalt\miniconda3\envs\textgen:
#
# Name                    Version                   Build  Channel
accelerate                0.17.1                   pypi_0    pypi
aiofiles                  23.1.0                   pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
altair                    4.2.2                    pypi_0    pypi
anyio                     3.6.2                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
attrs                     22.2.0                   pypi_0    pypi
bitsandbytes              0.37.1                   pypi_0    pypi
bzip2                     1.0.8                he774522_0
ca-certificates           2023.01.10           haa95532_0
certifi                   2022.12.7       py310haa95532_0
charset-normalizer        3.1.0                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
contourpy                 1.0.7                    pypi_0    pypi
cuda                      11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-command-line-tools   11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-compiler             11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-cudart               11.3.58              h24ea3a4_0    nvidia/label/cuda-11.3.0
cuda-cuobjdump            11.3.58              h9c7f84a_0    nvidia/label/cuda-11.3.0
cuda-cupti                11.3.58              h0481b1b_0    nvidia/label/cuda-11.3.0
cuda-cuxxfilt             11.3.58              hb382750_0    nvidia/label/cuda-11.3.0
cuda-libraries            11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-libraries-dev        11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-memcheck             11.3.58              h0838ec0_0    nvidia/label/cuda-11.3.0
cuda-nvcc                 11.3.58              hb8d16a4_0    nvidia/label/cuda-11.3.0
cuda-nvdisasm             11.3.58              h028471b_0    nvidia/label/cuda-11.3.0
cuda-nvml-dev             11.3.58              hbc9c638_0    nvidia/label/cuda-11.3.1
cuda-nvprof               11.3.58              h45e7c35_0    nvidia/label/cuda-11.3.0
cuda-nvprune              11.3.58              h42e8f5f_0    nvidia/label/cuda-11.3.0
cuda-nvrtc                11.3.58              h5d15f37_0    nvidia/label/cuda-11.3.0
cuda-nvtx                 11.3.58              h607cf41_0    nvidia/label/cuda-11.3.0
cuda-runtime              11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-samples              11.3.58              h9a5194a_0    nvidia/label/cuda-11.3.1
cuda-sanitizer-api        11.3.58              h5192ad9_0    nvidia/label/cuda-11.3.0
cuda-thrust               11.3.58              hc445dc0_0    nvidia/label/cuda-11.3.0
cuda-toolkit              11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cuda-tools                11.3.0               hd997d6f_0    nvidia/label/cuda-11.3.0
cycler                    0.11.0                   pypi_0    pypi
entrypoints               0.4                      pypi_0    pypi
fastapi                   0.94.1                   pypi_0    pypi
ffmpy                     0.3.0                    pypi_0    pypi
filelock                  3.10.0                   pypi_0    pypi
flexgen                   0.1.7                    pypi_0    pypi
fonttools                 4.39.2                   pypi_0    pypi
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.3.0                 pypi_0    pypi
git                       2.34.1               haa95532_0
gradio                    3.18.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
httpcore                  0.16.3                   pypi_0    pypi
httpx                     0.23.3                   pypi_0    pypi
huggingface-hub           0.13.2                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
jsonschema                4.17.3                   pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
libcublas                 11.4.2.10064         hdce621a_0    nvidia/label/cuda-11.3.0
libcufft                  10.4.2.58            ha8d0324_0    nvidia/label/cuda-11.3.0
libcurand                 10.2.4.58            h205e5ba_0    nvidia/label/cuda-11.3.0
libcusolver               11.1.1.58            h8fce944_0    nvidia/label/cuda-11.3.0
libcusparse               11.5.0.58            h26ccba6_0    nvidia/label/cuda-11.3.0
libffi                    3.4.2                hd77b12b_6
libnpp                    11.3.3.44            h8a18219_0    nvidia/label/cuda-11.3.0
libnvjpeg                 11.4.1.58            h1234a80_0    nvidia/label/cuda-11.3.0
linkify-it-py             2.0.0                    pypi_0    pypi
markdown                  3.4.1                    pypi_0    pypi
markdown-it-py            2.2.0                    pypi_0    pypi
markupsafe                2.1.2                    pypi_0    pypi
matplotlib                3.7.1                    pypi_0    pypi
mdit-py-plugins           0.3.5                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
networkx                  3.0                      pypi_0    pypi
ninja                     1.11.1                   pypi_0    pypi
numpy                     1.24.2                   pypi_0    pypi
openssl                   1.1.1t               h2bbff1b_0
orjson                    3.8.7                    pypi_0    pypi
packaging                 23.0                     pypi_0    pypi
pandas                    1.5.3                    pypi_0    pypi
peft                      0.2.0                    pypi_0    pypi
pillow                    9.4.0                    pypi_0    pypi
pip                       23.0.1          py310haa95532_0
psutil                    5.9.4                    pypi_0    pypi
pulp                      2.7.0                    pypi_0    pypi
pycryptodome              3.17                     pypi_0    pypi
pydantic                  1.10.6                   pypi_0    pypi
pydub                     0.25.1                   pypi_0    pypi
pyparsing                 3.0.9                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
python                    3.10.9               h966fe2a_2
python-dateutil           2.8.2                    pypi_0    pypi
python-multipart          0.0.6                    pypi_0    pypi
pytz                      2022.7.1                 pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
quant-cuda                0.0.0                    pypi_0    pypi
regex                     2022.10.31               pypi_0    pypi
requests                  2.28.2                   pypi_0    pypi
rfc3986                   1.5.0                    pypi_0    pypi
safetensors               0.3.0                    pypi_0    pypi
sentencepiece             0.1.97                   pypi_0    pypi
setuptools                65.6.3          py310haa95532_0
six                       1.16.0                   pypi_0    pypi
sniffio                   1.3.0                    pypi_0    pypi
sqlite                    3.41.1               h2bbff1b_0
starlette                 0.26.1                   pypi_0    pypi
sympy                     1.11.1                   pypi_0    pypi
tk                        8.6.12               h2bbff1b_0
tokenizers                0.13.2                   pypi_0    pypi
toolz                     0.12.0                   pypi_0    pypi
torch                     1.12.0+cu113             pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
transformers              4.28.0.dev0              pypi_0    pypi
typing-extensions         4.5.0                    pypi_0    pypi
tzdata                    2022g                h04d1e81_0
uc-micro-py               1.0.1                    pypi_0    pypi
urllib3                   1.26.15                  pypi_0    pypi
uvicorn                   0.21.1                   pypi_0    pypi
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
websockets                10.4                     pypi_0    pypi
wheel                     0.38.4          py310haa95532_0
wincertstore              0.2             py310haa95532_2
xz                        5.2.10               h8cc25b3_1
yarl                      1.8.2                    pypi_0    pypi
zlib                      1.2.13               h8cc25b3_0

Here's the same list, only for an environment using my version of the instructions:

Code:
(llama4bit) C:\Users\jwalt>conda list
# packages in environment at C:\Users\jwalt\miniconda3\envs\llama4bit:
#
# Name                    Version                   Build  Channel
accelerate                0.17.1                   pypi_0    pypi
aiofiles                  23.1.0                   pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
altair                    4.2.2                    pypi_0    pypi
anyio                     3.6.2                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
attrs                     22.2.0                   pypi_0    pypi
bitsandbytes              0.37.1                   pypi_0    pypi
blas                      1.0                         mkl
brotlipy                  0.7.0           py310h2bbff1b_1002
bzip2                     1.0.8                he774522_0
ca-certificates           2023.01.10           haa95532_0
cchardet                  2.1.7                    pypi_0    pypi
certifi                   2022.12.7       py310haa95532_0
cffi                      1.15.1          py310h2bbff1b_3
chardet                   4.0.0           py310haa95532_1003
charset-normalizer        3.1.0                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
contourpy                 1.0.7                    pypi_0    pypi
cryptography              39.0.1          py310h21b164f_0
cuda                      11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-cccl                 12.1.55                       0    nvidia
cuda-command-line-tools   11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-compiler             11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-cudart               11.7.99                       0    nvidia
cuda-cudart-dev           11.7.99                       0    nvidia
cuda-cuobjdump            11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-cupti                11.7.101                      0    nvidia
cuda-cuxxfilt             11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-demo-suite           11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-documentation        11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-libraries            11.7.1                        0    nvidia
cuda-libraries-dev        11.7.1                        0    nvidia
cuda-memcheck             11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nsight-compute       11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-nvcc                 11.7.64                       0    nvidia/label/cuda-11.7.0
cuda-nvdisasm             11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nvml-dev             11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nvprof               11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nvprune              11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-nvrtc                11.7.99                       0    nvidia
cuda-nvrtc-dev            11.7.99                       0    nvidia
cuda-nvtx                 11.7.91                       0    nvidia
cuda-nvvp                 11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-runtime              11.7.1                        0    nvidia
cuda-sanitizer-api        11.7.50                       0    nvidia/label/cuda-11.7.0
cuda-toolkit              11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-tools                11.7.0                        0    nvidia/label/cuda-11.7.0
cuda-visual-tools         11.7.0                        0    nvidia/label/cuda-11.7.0
cycler                    0.11.0                   pypi_0    pypi
entrypoints               0.4                      pypi_0    pypi
fastapi                   0.94.1                   pypi_0    pypi
ffmpy                     0.3.0                    pypi_0    pypi
filelock                  3.9.1                    pypi_0    pypi
flexgen                   0.1.7                    pypi_0    pypi
flit-core                 3.6.0              pyhd3eb1b0_0
fonttools                 4.39.0                   pypi_0    pypi
freetype                  2.12.1               ha860e81_0
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.3.0                 pypi_0    pypi
giflib                    5.2.1                h8cc25b3_3
git                       2.34.1               haa95532_0
gradio                    3.18.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
httpcore                  0.16.3                   pypi_0    pypi
httpx                     0.23.3                   pypi_0    pypi
huggingface-hub           0.13.2                   pypi_0    pypi
idna                      3.4             py310haa95532_0
intel-openmp              2021.4.0          haa95532_3556
jinja2                    3.1.2           py310haa95532_0
jpeg                      9e                   h2bbff1b_1
jsonschema                4.17.3                   pypi_0    pypi
kiwisolver                1.4.4                    pypi_0    pypi
lerc                      3.0                  hd77b12b_0
libcublas                 11.10.3.66                    0    nvidia
libcublas-dev             11.10.3.66                    0    nvidia
libcufft                  10.7.2.124                    0    nvidia
libcufft-dev              10.7.2.124                    0    nvidia
libcurand                 10.3.2.56                     0    nvidia
libcurand-dev             10.3.2.56                     0    nvidia
libcusolver               11.4.0.1                      0    nvidia
libcusolver-dev           11.4.0.1                      0    nvidia
libcusparse               11.7.4.91                     0    nvidia
libcusparse-dev           11.7.4.91                     0    nvidia
libdeflate                1.17                 h2bbff1b_0
libffi                    3.4.2                hd77b12b_6
libnpp                    11.7.4.75                     0    nvidia
libnpp-dev                11.7.4.75                     0    nvidia
libnvjpeg                 11.8.0.2                      0    nvidia
libnvjpeg-dev             11.8.0.2                      0    nvidia
libpng                    1.6.39               h8cc25b3_0
libtiff                   4.5.0                h6c2663c_2
libuv                     1.44.2               h2bbff1b_0
libwebp                   1.2.4                hbc33d0d_1
libwebp-base              1.2.4                h2bbff1b_1
linkify-it-py             2.0.0                    pypi_0    pypi
lz4-c                     1.9.4                h2bbff1b_0
markdown                  3.4.1                    pypi_0    pypi
markdown-it-py            2.2.0                    pypi_0    pypi
markupsafe                2.1.2                    pypi_0    pypi
matplotlib                3.7.1                    pypi_0    pypi
mdit-py-plugins           0.3.5                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mkl                       2021.4.0           haa95532_640
mkl-service               2.4.0           py310h2bbff1b_0
mkl_fft                   1.3.1           py310ha0764ea_0
mkl_random                1.2.2           py310h4ed8f06_0
mpmath                    1.2.1           py310haa95532_0
multidict                 6.0.4                    pypi_0    pypi
networkx                  2.8.4           py310haa95532_0
ninja                     1.10.2               haa95532_5
ninja-base                1.10.2               h6d14046_5
nsight-compute            2022.2.0.13                   0    nvidia/label/cuda-11.7.0
numpy                     1.24.2                   pypi_0    pypi
numpy-base                1.23.5          py310h04254f7_0
openssl                   1.1.1t               h2bbff1b_0
orjson                    3.8.7                    pypi_0    pypi
packaging                 23.0                     pypi_0    pypi
pandas                    1.5.3                    pypi_0    pypi
pillow                    9.4.0                    pypi_0    pypi
pip                       23.0.1          py310haa95532_0
psutil                    5.9.4                    pypi_0    pypi
pulp                      2.7.0                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0
pycryptodome              3.17                     pypi_0    pypi
pydantic                  1.10.6                   pypi_0    pypi
pydub                     0.25.1                   pypi_0    pypi
pyopenssl                 23.0.0          py310haa95532_0
pyparsing                 3.0.9                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
pysocks                   1.7.1           py310haa95532_0
python                    3.10.9               h966fe2a_2
python-dateutil           2.8.2                    pypi_0    pypi
python-multipart          0.0.6                    pypi_0    pypi
pytorch                   2.0.0           py3.10_cuda11.7_cudnn8_0    pytorch
pytorch-cuda              11.7                 h16d0643_3    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pytz                      2022.7.1                 pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
quant-cuda                0.0.0                    pypi_0    pypi
regex                     2022.10.31               pypi_0    pypi
requests                  2.28.2                   pypi_0    pypi
rfc3986                   1.5.0                    pypi_0    pypi
rwkv                      0.4.2                    pypi_0    pypi
safetensors               0.3.0                    pypi_0    pypi
sentencepiece             0.1.97                   pypi_0    pypi
setuptools                65.6.3          py310haa95532_0
six                       1.16.0             pyhd3eb1b0_1
sniffio                   1.3.0                    pypi_0    pypi
sqlite                    3.41.1               h2bbff1b_0
starlette                 0.26.1                   pypi_0    pypi
sympy                     1.11.1          py310haa95532_0
tk                        8.6.12               h2bbff1b_0
tokenizers                0.13.2                   pypi_0    pypi
toolz                     0.12.0                   pypi_0    pypi
torch                     1.13.1                   pypi_0    pypi
torchaudio                2.0.0                    pypi_0    pypi
torchvision               0.15.0                   pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
transformers              4.27.0.dev0              pypi_0    pypi
typing-extensions         4.5.0                    pypi_0    pypi
typing_extensions         4.4.0           py310haa95532_0
tzdata                    2022g                h04d1e81_0
uc-micro-py               1.0.1                    pypi_0    pypi
urllib3                   1.26.15                  pypi_0    pypi
uvicorn                   0.21.0                   pypi_0    pypi
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
websockets                10.4                     pypi_0    pypi
wheel                     0.38.4          py310haa95532_0
win_inet_pton             1.1.0           py310haa95532_0
wincertstore              0.2             py310haa95532_2
xz                        5.2.10               h8cc25b3_1
yarl                      1.8.2                    pypi_0    pypi
zlib                      1.2.13               h8cc25b3_0
zstd                      1.5.2                h19a0ad4_0

If you see the wrong CUDA versions in that list, you can try "conda remove [library]" on the offending packages and then rerun the "conda install cuda [etc.]" step from the instructions to see if that helps.
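A minimal sketch of that cycle (cuda-nvcc is just an example; substitute whichever package in your "conda list" output shows the wrong version):

conda remove cuda-nvcc
conda install cuda -c nvidia/label/cuda-11.7.0
conda list cuda   # verify the listed versions are now 11.7.x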
 
  • Like
Reactions: bit_user
Hello,

Does anyone have an idea what caused this error? I tried reinstalling everything, but I always get to the end and then hit this error.

(llama4bit) PS C:\AIStuff\text-generation-webui> python server.py --gptq-bits 4 --model llama-7b
Loading llama-7b...
Traceback (most recent call last):
  File "C:\AIStuff\text-generation-webui\server.py", line 241, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\AIStuff\text-generation-webui\modules\models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "C:\AIStuff\text-generation-webui\modules\GPTQ_loader.py", line 56, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'
(llama4bit) PS C:\AIStuff\text-generation-webui>

If you have an idea how I can fix this, please let me know.
 
It looks like either the download was corrupted or the configuration file has an error. Have you tried the Reddit steps? If not, create a new Conda environment and see if those work. (You can copy over the llama-7b files, or just redownload them to see if the error persists.)
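If it comes to a fresh environment, a minimal sketch (the environment name is just an example; Python 3.10 matches the package list earlier in the thread):

conda create -n llama4bit-clean python=3.10
conda activate llama4bit-clean
# then rerun the install steps and point the webui at the copied or re-downloaded llama-7b files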
 
  • Like
Reactions: Paulii6661

Hello,

Today I downloaded it at least 5 times, so it's probably not download corruption. I tried different folders, I reinstalled Conda, and I tried both the 7b and 13b files, but every time I try to run the 4-bit version it just gives me this error. In another location on my disk I have the non-4-bit version and it works. I can't find anything on the internet, so I'm writing here: following that tutorial, everything worked with zero errors except when trying to start it.
 
No, it's more like a version mismatch. GPTQ_loader.py clearly added a groupsize parameter at some point, and models.py simply isn't setting it.

You might do better by going back a couple of revisions. Unfortunately, I don't know enough about conda to provide any further guidance.
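For what it's worth, a hedged sketch of that rollback, assuming the GPTQ code lives under the webui's repositories folder as in the article's setup (the commit hash is a placeholder; pick one from before the groupsize change was added):

cd repositories\GPTQ-for-LLaMa
git log --oneline -10   # scan recent commits for the groupsize change
git checkout <older-commit-hash>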

Hello sir,

So basically you are suggesting getting an older GPTQ version? I will try that and update my post.
 
That could work. What GPU are you trying to run this on? Anyway, you might try this command (from the folder where you ran "git clone https://github.com/oobabooga/text-generation-webui.git"):

git checkout 'master@{2023-03-18 18:30:00}'

In theory, that will get you the versions of the files from last Saturday.
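One caveat: the master@{date} form resolves against the local reflog, so on a fresh clone git may warn that the log doesn't go back that far. A sketch of an alternative that walks the actual commit history instead (the hash placeholder is whatever the first command prints):

git rev-list -n 1 --before="2023-03-18 18:30:00" master
git checkout <commit-hash-from-above>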
 
I tried it several times, following the exact instructions, but when I run 'python setup_cuda.py install' I always get:
"running build_ext
error: [WinError 2]..."

A shame, I was hoping to finally be able to get 4-bit running. :'(

I have the 2019 Build Tools version installed, and miniconda, etc. CUDA says True when queried from Python, so everything seems OK, but still the error. T_T
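For what it's worth, [WinError 2] ("the system cannot find the file specified") during build_ext usually means a build tool, typically the MSVC compiler cl.exe, isn't on the PATH of the shell doing the compile, even when CUDA itself checks out. A quick check, assuming the 2019 Build Tools install (run from an "x64 Native Tools Command Prompt for VS 2019"):

where cl   # should print a path under ...\VC\Tools\MSVC\...
python setup_cuda.py install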
 
That could work. What GPU are you trying to run this on? Anyway, you might try this command (from the folder where you ran "git clone https://github.com/oobabooga/text-generation-webui.git"):

git checkout 'master@{2023-03-18 18:30:00}'

In theory, that will get you the versions of the files from last Saturday.

Hello, thank you for your reply.
I tried your command from all the main folders, but it says "fatal: not a git repository (or any of the parent directories): .git". Could you please provide more information?

By the way, I'm trying to run it on an RTX 2070.
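That "not a git repository" error suggests the checkout was run outside the clone itself. A minimal sketch, assuming the C:\AIStuff path shown in the PowerShell prompts earlier in the thread:

cd C:\AIStuff\text-generation-webui
git checkout 'master@{2023-03-18 18:30:00}'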