Bug Report: Issues Building DeepSpeed on Windows #5679

Moemu · 2024-06-18T07:33:06Z

Description:

I ran into problems while building DeepSpeed on Windows. The build failed with an error indicating that the destination folder already exists.

Environment:

OS: Windows 11
Python Version: 3.11
Conda Environment: Yes
DeepSpeed Version: latest
CUDA Version: 12.3
PyTorch Version: 2.3.1+cu121

Steps to Reproduce:

Clone the DeepSpeed repository.
Navigate to the DeepSpeed directory.
Run the build script: build_win.bat

Error Log:

 (Neuro) C:\Muice-Vtuber\Neuro-master\DeepSpeed>build_win.bat
DS_BUILD_OPS=1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
The system cannot find the file specified.
 [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
The system cannot find the file specified.
 [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
The system cannot find the file specified.
 [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
The system cannot find the file specified.
 [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
The system cannot find the file specified.
 [WARNING]  cpu_lion requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
The system cannot find the file specified.
 [WARNING]  cpu_lion requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Install Ops={'async_io': False, 'fused_adam': 1, 'cpu_adam': 1, 'cpu_adagrad': 1, 'cpu_lion': 1, 'evoformer_attn': False, 'fp_quantizer': False, 'fused_lamb': 1, 'fused_lion': 1, 'inference_core_ops': False, 'cutlass_ops': False, 'transformer_inference': False, 'quantizer': 1, 'ragged_device_ops': False, 'ragged_ops': 1, 'random_ltd': 1, 'sparse_attn': False, 'spatial_inference': 1, 'transformer': 1, 'stochastic_transformer': 1}
Traceback (most recent call last):
  File "C:\Muice-Vtuber\Neuro-master\DeepSpeed\setup.py", line 212, in <module>
    shutil.copytree('.\\csrc', '.\\deepspeed\\ops')
  File "C:\Users\Moemu\.conda\envs\Neuro\Lib\shutil.py", line 560, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Moemu\.conda\envs\Neuro\Lib\shutil.py", line 459, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [WinError 183] Cannot create a file when that file already exists: '.\\deepspeed\\ops'


ycsgg · 2024-06-18T12:49:42Z

Try replacing

shutil.copytree('.\\csrc', '.\\deepspeed\\ops') 
shutil.copytree('.\\op_builder', '.\\deepspeed\\ops')

with

shutil.copytree('.\\csrc', '.\\deepspeed\\ops\\csrc') 
shutil.copytree('.\\op_builder', '.\\deepspeed\\ops\\op_builder')

But I'm not sure whether this will work well.
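
A minimal sketch of the suggested change, assuming the intent is for each source tree to land in its own subfolder of deepspeed\ops. The helper name `copy_into_ops` and the use of `dirs_exist_ok=True` (available in Python 3.8 and later) are illustrative assumptions, not the code in DeepSpeed's setup.py:

```python
import shutil
from pathlib import Path


def copy_into_ops(src, ops_dir):
    """Copy the tree `src` into `ops_dir/<name of src>` (hypothetical helper).

    dirs_exist_ok=True lets copytree reuse a destination directory that
    already exists instead of raising FileExistsError.
    """
    src = Path(src)
    dst = Path(ops_dir) / src.name
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Example call, run from the repository root (paths taken from the snippet above):
#   copy_into_ops("csrc", "deepspeed/ops")
#   copy_into_ops("op_builder", "deepspeed/ops")
```

With `dirs_exist_ok=True`, files left over from an earlier build stay in place unless a file of the same name is copied over them, so a stale destination may still need to be cleaned by hand.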

Moemu · 2024-06-18T16:04:21Z

Thank you. I also deleted the existing copies (.\\deepspeed\\accelerator, .\\deepspeed\\ops\\csrc and .\\deepspeed\\ops\\op_builder), and after that it worked.

But I ran into a new error :(

creating build\lib.win-amd64-cpython-311\deepspeed\inference\v2\ragged\csrc
copying deepspeed\inference\v2\ragged\csrc\fast_host_buffer.cu -> build\lib.win-amd64-cpython-311\deepspeed\inference\v2\ragged\csrc
copying deepspeed\inference\v2\ragged\csrc\ragged_ops.cpp -> build\lib.win-amd64-cpython-311\deepspeed\inference\v2\ragged\csrc
copying deepspeed\ops\sparse_attention\trsrc\matmul.tr -> build\lib.win-amd64-cpython-311\deepspeed\ops\sparse_attention\trsrc
copying deepspeed\ops\sparse_attention\trsrc\softmax_bwd.tr -> build\lib.win-amd64-cpython-311\deepspeed\ops\sparse_attention\trsrc
copying deepspeed\ops\sparse_attention\trsrc\softmax_fwd.tr -> build\lib.win-amd64-cpython-311\deepspeed\ops\sparse_attention\trsrc
running build_ext
C:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\utils\cpp_extension.py:418: UserWarning: The detected CUDA version (12.3) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'deepspeed.ops.adam.fused_adam_op' extension
creating build\temp.win-amd64-cpython-311
creating build\temp.win-amd64-cpython-311\Release
creating build\temp.win-amd64-cpython-311\Release\csrc
creating build\temp.win-amd64-cpython-311\Release\csrc\adam
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\Hostx64\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Muice-Vtuber\Neuro-master\DeepSpeed-master\csrc\includes -IC:\Muice-Vtuber\Neuro-master\DeepSpeed-master\csrc\adam -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\TH -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -IC:\Users\Moemu\.conda\envs\Neuro\include -IC:\Users\Moemu\.conda\envs\Neuro\Include /EHsc /Tpcsrc/adam/fused_adam_frontend.cpp /Fobuild\temp.win-amd64-cpython-311\Release\csrc/adam/fused_adam_frontend.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -O2 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++17
fused_adam_frontend.cpp
C:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\c10/core/DeviceType.h(10): fatal error C1083: Cannot open include file: 'cstddef': No such file or directory
error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.40.33807\\bin\\Hostx64\\x64\\cl.exe' failed with exit code 2

costin-eseanu · 2024-06-18T18:29:10Z

@Moemu, it looks like MSVC can't find cstddef, which is a standard C++ include file. Please make sure to run build_win.bat from a "Developer Command Prompt for VS 2022" which sets the correct environment variables for the compiler. In addition, you can build the costineseanu/windows_inference_build branch which has more fixes for the Windows build (including the one about not being able to copy files).
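
One way to check this before starting a build is to look for the header on the compiler's search path. The sketch below assumes that the Developer Command Prompt exposes the MSVC include directories through the INCLUDE environment variable; the helper name `find_header` is hypothetical:

```python
import os


def find_header(name, include_value=None):
    """Return the first path to `name` in an INCLUDE-style directory list, or None."""
    if include_value is None:
        # Assumption: the MSVC environment scripts populate INCLUDE.
        include_value = os.environ.get("INCLUDE", "")
    for directory in include_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    hit = find_header("cstddef")
    print(hit or "cstddef not found; INCLUDE may not be set for MSVC in this shell")
```

If the script reports that the header is missing in the shell used for build_win.bat, that points to the compiler environment not being set up there, which would be consistent with the C1083 error above.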

ChangxingJiang · 2024-06-19T00:22:19Z

Thank you. Changing the code in setup.py and deleting those three paths works. I found that this change was introduced by the commit in #5596; by default, shutil.copytree fails when the destination already exists.
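
The behaviour described here can be reproduced in isolation. The sketch below uses temporary directories rather than the DeepSpeed tree, and the helper name `clean_copy` is hypothetical; it mirrors the workaround of deleting the old copy before copying again:

```python
import shutil
import tempfile
from pathlib import Path


def clean_copy(src, dst):
    """Delete any previous copy at `dst`, then copy `src` there (hypothetical helper)."""
    dst = Path(dst)
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return dst


def demo():
    """Return True if plain copytree refuses an existing destination and clean_copy succeeds."""
    with tempfile.TemporaryDirectory() as root:
        src = Path(root) / "csrc"
        dst = Path(root) / "ops"
        src.mkdir()
        (src / "kernel.cpp").write_text("// placeholder\n")
        dst.mkdir()  # the destination already exists, as in the report
        try:
            shutil.copytree(src, dst)
            refused = False
        except FileExistsError:
            refused = True
        clean_copy(src, dst)  # works once the old directory is removed
        return refused and (dst / "kernel.cpp").is_file()
```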

captainsuperman79 · 2024-09-25T13:02:41Z

(This comment repeats the original report above verbatim: the same description, environment, steps to reproduce, and error log.)

loadams assigned costin-eseanu Jun 18, 2024

Moemu closed this as completed Jun 19, 2024
