[BUG] unit test failures on Deepspeed upstream #56

Open
bmedishe opened this issue Mar 18, 2022 · 0 comments
Error log:
=========================== short test summary info ============================
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe[4]
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe_and_zero[4-True]
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe_and_zero[2-True]
FAILED tests/unit/test_configurable_parallel.py::TestConfigurableMP::test_gpt2_basic
====== 4 failed, 581 passed, 58 skipped, 1 warning in 3850.22s (1:04:10) =======
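Because the full run takes over an hour, it may help to re-run only the failing tests while debugging. The following is a minimal sketch, not part of the original report: the helper names `failed_node_ids` and `rerun_command` are hypothetical, and it assumes each summary line has the form `FAILED <node id>` with no spaces inside the node ID.

```python
import re
import shlex

# Short test summary lines copied from the log above.
SUMMARY = """\
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe[4]
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe_and_zero[4-True]
FAILED tests/unit/test_checkpointing.py::test_checkpoint_moe_and_zero[2-True]
FAILED tests/unit/test_configurable_parallel.py::TestConfigurableMP::test_gpt2_basic
"""


def failed_node_ids(log_text):
    """Return the pytest node IDs found on 'FAILED <node id>' lines."""
    ids = []
    for line in log_text.splitlines():
        match = re.match(r"FAILED\s+(\S+)", line)
        if match:
            ids.append(match.group(1))
    return ids


def rerun_command(node_ids):
    """Build a shell command that re-runs only the given tests.

    Node IDs are shell-quoted because parametrized IDs contain
    square brackets, which a shell could treat as glob characters.
    """
    args = ["pytest", "--forked", *node_ids]
    return "DEEPSPEED_TEST_WITH_ROCM=1 " + " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(rerun_command(failed_node_ids(SUMMARY)))
```

The printed command reuses the environment variable and the `--forked` flag from the reproduction steps, so the targeted run should behave like the full one apart from test selection.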
Steps to reproduce:
Follow the steps in this PR to install PyTorch with hipify_torch as a submodule.
After building and installing PyTorch from source, clone DeepSpeed from upstream, do a JIT build, and run the unit tests:

  1. git clone https://github.com/microsoft/DeepSpeed.git
  2. Remove #include<THC/THCGeneral.h> from csrc/lamb/fused_lamb_cuda_kernel.cu before building
  3. ./install.sh (JIT build)
  4. DEEPSPEED_TEST_WITH_ROCM=1 pytest --forked tests/unit/test_* 2>&1 | tee deepspeed_unit_test
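Step 2 above is a manual edit, which is easy to forget when re-cloning. A minimal sketch of scripting it follows; the function names `strip_include` and `patch_file` are hypothetical, and the sketch only removes the include line named in the steps without checking whether anything else in the file depends on that header.

```python
import re
from pathlib import Path


def strip_include(source: str, header: str) -> str:
    """Return `source` with every '#include <header>' line removed.

    Accepts optional whitespace around '#' and 'include', and either
    angle-bracket or double-quote delimiters.
    """
    pattern = re.compile(
        r'^[ \t]*#[ \t]*include[ \t]*[<"]' + re.escape(header) + r'[>"][^\n]*\n?',
        re.MULTILINE,
    )
    return pattern.sub("", source)


def patch_file(path: Path, header: str) -> bool:
    """Rewrite `path` without the include; return True if it changed."""
    original = path.read_text()
    patched = strip_include(original, header)
    if patched != original:
        path.write_text(patched)
        return True
    return False


# Example usage from the root of a DeepSpeed checkout (path taken from step 2):
# patch_file(Path("csrc/lamb/fused_lamb_cuda_kernel.cu"), "THC/THCGeneral.h")
```

Running `patch_file` a second time is a no-op, so it can be placed before `./install.sh` in a setup script without tracking whether the file was already edited.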
bmedishe added the bug (Something isn't working) label on Mar 18, 2022