Bug Report: Issues Building DeepSpeed on Windows #5679

Closed
Moemu opened this issue Jun 18, 2024 · 5 comments

@Moemu

Moemu commented Jun 18, 2024

Description:

I ran into a problem while building DeepSpeed on Windows. The build failed with an error indicating that the destination folder already exists.

Environment:

OS: Windows 11
Python Version: 3.11
Conda Environment: Yes
DeepSpeed Version: latest
CUDA Version: 12.3
PyTorch Version: 2.3.1+cu121

Steps to Reproduce:

Clone the DeepSpeed repository.
Navigate to the DeepSpeed directory.
Run the build script: build_win.bat

Error Log:

 (Neuro) C:\Muice-Vtuber\Neuro-master\DeepSpeed>build_win.bat
DS_BUILD_OPS=1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
The system cannot find the file specified.
 [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
The system cannot find the file specified.
 [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
The system cannot find the file specified.
 [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
The system cannot find the file specified.
 [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
The system cannot find the file specified.
 [WARNING]  cpu_lion requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
The system cannot find the file specified.
 [WARNING]  cpu_lion requires the 'lscpu' command, but it does not exist!
 [WARNING]  cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Installed CUDA version 12.3 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
Install Ops={'async_io': False, 'fused_adam': 1, 'cpu_adam': 1, 'cpu_adagrad': 1, 'cpu_lion': 1, 'evoformer_attn': False, 'fp_quantizer': False, 'fused_lamb': 1, 'fused_lion': 1, 'inference_core_ops': False, 'cutlass_ops': False, 'transformer_inference': False, 'quantizer': 1, 'ragged_device_ops': False, 'ragged_ops': 1, 'random_ltd': 1, 'sparse_attn': False, 'spatial_inference': 1, 'transformer': 1, 'stochastic_transformer': 1}
Traceback (most recent call last):
  File "C:\Muice-Vtuber\Neuro-master\DeepSpeed\setup.py", line 212, in <module>
    shutil.copytree('.\\csrc', '.\\deepspeed\\ops')
  File "C:\Users\Moemu\.conda\envs\Neuro\Lib\shutil.py", line 560, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Moemu\.conda\envs\Neuro\Lib\shutil.py", line 459, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [WinError 183] Cannot create a file when that file already exists: '.\\deepspeed\\ops'
@ycsgg

ycsgg commented Jun 18, 2024

Try replacing

shutil.copytree('.\\csrc', '.\\deepspeed\\ops') 
shutil.copytree('.\\op_builder', '.\\deepspeed\\ops')

with

shutil.copytree('.\\csrc', '.\\deepspeed\\ops\\csrc') 
shutil.copytree('.\\op_builder', '.\\deepspeed\\ops\\op_builder')

But I'm not sure whether this will work.
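To illustrate why the original call fails and why a per-folder destination avoids it, here is a minimal sketch. It runs against a throwaway directory tree rather than the real repository; the helper name `copy_into_subdirs` and the placeholder files are made up for the example, and `dirs_exist_ok=True` (available since Python 3.8) is an extra that the suggestion above does not include.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_subdirs(root: Path) -> list[str]:
    """Copy csrc and op_builder into their own sub-directories of deepspeed/ops."""
    ops = root / "deepspeed" / "ops"
    for name in ("csrc", "op_builder"):
        # dirs_exist_ok=True lets a repeated build reuse an existing folder.
        shutil.copytree(root / name, ops / name, dirs_exist_ok=True)
    return sorted(p.name for p in ops.iterdir())


# Throwaway tree that mimics the repository layout (hypothetical files).
root = Path(tempfile.mkdtemp())
(root / "csrc").mkdir()
(root / "csrc" / "adam.cpp").write_text("// placeholder\n")
(root / "op_builder").mkdir()
(root / "op_builder" / "builder.py").write_text("# placeholder\n")
(root / "deepspeed" / "ops").mkdir(parents=True)

# The original call fails because deepspeed/ops already exists.
try:
    shutil.copytree(root / "csrc", root / "deepspeed" / "ops")
    original_call_failed = False
except FileExistsError:
    original_call_failed = True

first = copy_into_subdirs(root)
second = copy_into_subdirs(root)  # a second run must not raise
print(original_call_failed, first, second)
```

Note that `dirs_exist_ok=True` merges into the existing folder, so files left over from an earlier build stay in place.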

@Moemu
Author

Moemu commented Jun 18, 2024

Thank you. I also deleted the existing directories (.\\deepspeed\\accelerator, .\\deepspeed\\ops\\csrc and .\\deepspeed\\ops\\op_builder), and after that it worked.

But I ran into a new error :(

creating build\lib.win-amd64-cpython-311\deepspeed\inference\v2\ragged\csrc
copying deepspeed\inference\v2\ragged\csrc\fast_host_buffer.cu -> build\lib.win-amd64-cpython-311\deepspeed\inference\v2\ragged\csrc
copying deepspeed\inference\v2\ragged\csrc\ragged_ops.cpp -> build\lib.win-amd64-cpython-311\deepspeed\inference\v2\ragged\csrc
copying deepspeed\ops\sparse_attention\trsrc\matmul.tr -> build\lib.win-amd64-cpython-311\deepspeed\ops\sparse_attention\trsrc
copying deepspeed\ops\sparse_attention\trsrc\softmax_bwd.tr -> build\lib.win-amd64-cpython-311\deepspeed\ops\sparse_attention\trsrc
copying deepspeed\ops\sparse_attention\trsrc\softmax_fwd.tr -> build\lib.win-amd64-cpython-311\deepspeed\ops\sparse_attention\trsrc
running build_ext
C:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\utils\cpp_extension.py:418: UserWarning: The detected CUDA version (12.3) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'deepspeed.ops.adam.fused_adam_op' extension
creating build\temp.win-amd64-cpython-311
creating build\temp.win-amd64-cpython-311\Release
creating build\temp.win-amd64-cpython-311\Release\csrc
creating build\temp.win-amd64-cpython-311\Release\csrc\adam
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.40.33807\bin\Hostx64\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Muice-Vtuber\Neuro-master\DeepSpeed-master\csrc\includes -IC:\Muice-Vtuber\Neuro-master\DeepSpeed-master\csrc\adam -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\TH -IC:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -IC:\Users\Moemu\.conda\envs\Neuro\include -IC:\Users\Moemu\.conda\envs\Neuro\Include /EHsc /Tpcsrc/adam/fused_adam_frontend.cpp /Fobuild\temp.win-amd64-cpython-311\Release\csrc/adam/fused_adam_frontend.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc -O2 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++17
fused_adam_frontend.cpp
C:\Users\Moemu\.conda\envs\Neuro\Lib\site-packages\torch\include\c10/core/DeviceType.h(10): fatal error C1083: Cannot open include file: 'cstddef': No such file or directory
error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\MSVC\\14.40.33807\\bin\\Hostx64\\x64\\cl.exe' failed with exit code 2

@costin-eseanu
Contributor

@Moemu, it looks like MSVC can't find cstddef, which is a standard C++ include file. Please make sure to run build_win.bat from a "Developer Command Prompt for VS 2022" which sets the correct environment variables for the compiler. In addition, you can build the costineseanu/windows_inference_build branch which has more fixes for the Windows build (including the one about not being able to copy files).
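One way to check whether the current shell has the compiler environment set up is to look for the header on the `INCLUDE` search path, which MSVC reads and which the Developer Command Prompt populates. The sketch below is only a diagnostic aid, not part of DeepSpeed; the helper name `find_header` is hypothetical, and the demonstration uses temporary directories in place of a real Visual Studio install.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_header(header: str, include_value: str) -> Optional[Path]:
    """Return the first match for header in an INCLUDE-style path list, or None."""
    for entry in include_value.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        candidate = Path(entry) / header
        if candidate.is_file():
            return candidate
    return None


# On a real machine one would call:
#   find_header("cstddef", os.environ.get("INCLUDE", ""))
# Here a temporary directory stands in for the MSVC include folder.
fake_msvc = Path(tempfile.mkdtemp())
(fake_msvc / "cstddef").write_text("// placeholder header\n")
empty_dir = tempfile.mkdtemp()

found = find_header("cstddef", os.pathsep.join([empty_dir, str(fake_msvc)]))
missing = find_header("cstddef", "")
print(found, missing)
```

If the real call returns `None`, the shell was most likely not started from the Developer Command Prompt, which would match the C1083 error above.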

@ChangxingJiang

Thank you. Changing the code in setup.py and deleting the three directories worked. I found that this change was introduced in #5596; shutil.copytree cannot overwrite a destination that already exists.
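The manual clean-up step can also be scripted so that a rebuild always starts from a fresh copy. This is a rough sketch under the assumption that the destination folders hold only generated copies and are safe to delete; `refresh_copy` is a made-up helper, not something setup.py provides, and the example runs against a temporary tree.

```python
import shutil
import tempfile
from pathlib import Path


def refresh_copy(src: Path, dst: Path) -> None:
    """Replace dst with a fresh copy of src, removing any earlier copy first."""
    if dst.exists():
        shutil.rmtree(dst)  # drop leftovers from a previous build
    shutil.copytree(src, dst)


# Temporary tree with one source file and one stale file in the destination.
root = Path(tempfile.mkdtemp())
src = root / "csrc"
src.mkdir()
(src / "new.cpp").write_text("// placeholder\n")

dst = root / "deepspeed" / "ops" / "csrc"
dst.mkdir(parents=True)
(dst / "stale.cpp").write_text("// left over\n")

refresh_copy(src, dst)
remaining = sorted(p.name for p in dst.iterdir())
print(remaining)
```

Deleting first, rather than passing `dirs_exist_ok=True`, means stale files from an earlier build do not survive into the new one.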

Moemu closed this as completed Jun 19, 2024
