You can't run ChatGPT on a single GPU, but you can run some far less complex large language models for text generation on your own PC. We tested oobabooga's text-generation-webui on several cards to see how fast it is and what sort of results you can expect.
How to Run a ChatGPT Alternative on Your Local PC
This process worked until I got to the setup_cuda.py step, although I had to run vcvars64.bat from C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\ rather than the directory given in the article.
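For what it's worth, the exact sequence I ran was roughly the following (paths reflect my machine; the VS edition and version folder may differ on other setups):

call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"
conda activate llama4bit
cd E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa
python setup_cuda.py install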
The error I get when I attempt to build the CUDA extension is as follows:
(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py:388: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
Emitting ninja build file E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
FAILED: E:/llmRunner/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.win-amd64-cpython-310/Release/quant_cuda_kernel.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>:
😱perator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>:
😱perator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/core/TensorImpl.h(77): here
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>:
😱perator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>:
😱perator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\ATen/core/qualified_name.h(73): here
2 errors detected in the compilation of "e:/llmrunner/text-generation-webui/repositories/gptq-for-llama/quant_cuda_kernel.cu".
quant_cuda_kernel.cu
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\setup_cuda.py", line 4, in <module>
setup(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\
init.py", line 108, in setup
return distutils.core.setup(**attrs)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
return run_commands(dist)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
dist.run_commands()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
self.run_command(cmd)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py", line 74, in run
self.do_egg_install()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
self.build()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 111, in build
self.run_command('build_ext')
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
_build_ext.run(self)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 345, in run
self.build_extensions()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 815, in win_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>
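In case it helps with diagnosis: the warning near the top of the log says the detected CUDA toolkit is 11.4 while my PyTorch was compiled against 11.7. Here's how I compared the two (both commands are standard; the first reports the nvcc toolkit version, the second the CUDA version PyTorch was built with):

nvcc --version
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"

I'm only guessing, but the pybind11 "too few arguments for template template parameter" errors make me suspect the older 11.4 nvcc is choking on the newer headers, so installing CUDA 11.7 to match PyTorch is what I plan to try next. I'd appreciate confirmation before reinstalling the toolkit.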