This process worked until I got to setup_cuda.py, although I had to run vcvars64.bat from C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\ rather than the directory given in the instructions.
The error I get when I try to run setup_cuda.py is as follows:
(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py:388: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
Emitting ninja build file E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ 
-D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
FAILED: E:/llmRunner/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.win-amd64-cpython-310/Release/quant_cuda_kernel.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ 
-D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(721): here
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/core/TensorImpl.h(77): here
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=true, <unnamed>=0]"
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\ATen/core/qualified_name.h(73): here
2 errors detected in the compilation of "e:/llmrunner/text-generation-webui/repositories/gptq-for-llama/quant_cuda_kernel.cu".
quant_cuda_kernel.cu
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\setup_cuda.py", line 4, in <module>
setup(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\__init__.py", line 108, in setup
return distutils.core.setup(**attrs)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
return run_commands(dist)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
dist.run_commands()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
self.run_command(cmd)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py", line 74, in run
self.do_egg_install()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
self.build()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 111, in build
self.run_command('build_ext')
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\dist.py", line 1221, in run_command
super().run_command(command)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
cmd_obj.run()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
_build_ext.run(self)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 345, in run
self.build_extensions()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 815, in win_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
(llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>
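The one thing in the log I can point at is the CUDA version warning (toolkit 11.4 vs. the 11.7 that PyTorch was built with). I don't know whether that is what actually breaks the pybind11 headers, so treat this as a guess. As a rough sketch, the two versions can be pulled out of a saved build log like this; the helper names `parse_cuda_mismatch` and `same_major` are made up for illustration, and the regex assumes the warning is worded exactly as in the output above:

```python
import re

# Assumption: the warning text matches the wording seen in the log above.
_MISMATCH_RE = re.compile(
    r"detected CUDA version \((?P<local>\d+\.\d+)\).*?"
    r"compile PyTorch \((?P<torch>\d+\.\d+)\)"
)


def parse_cuda_mismatch(log_text):
    """Return (local_toolkit_version, pytorch_cuda_version), or None if no warning is found."""
    match = _MISMATCH_RE.search(log_text)
    if match is None:
        return None
    return match.group("local"), match.group("torch")


def same_major(local, torch_cuda):
    """True when both version strings share the same major number."""
    return local.split(".")[0] == torch_cuda.split(".")[0]


# The warning line copied from the build output above.
warning_line = (
    "UserWarning: The detected CUDA version (11.4) has a minor version mismatch "
    "with the version that was used to compile PyTorch (11.7). "
    "Most likely this shouldn't be a problem."
)

versions = parse_cuda_mismatch(warning_line)
if versions is not None:
    local, torch_cuda = versions
    print(f"local toolkit: {local}, PyTorch built with: {torch_cuda}, "
          f"same major: {same_major(local, torch_cuda)}")
```

If the two versions differ, installing a toolkit that matches the PyTorch build would be one way to rule the mismatch out before looking further.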