add patches to fix PyTorch 1.13.1 w/ foss/2022a on POWER + fix flaky test_jit_legacy test #18500

Merged

Flamefire
Contributor

@Flamefire Flamefire commented Aug 8, 2023

(created using eb --new-pr)

Adds the updated patch from #18489 and another one similar to #18490

@boegel boegel changed the title fix PyTorch-1.13.1-foss-2022a on POWER add patches to fix PyTorch-1.13.1-foss-2022a on POWER + fix flaky test_jit_legacy test Aug 8, 2023
@boegel boegel added this to the next release (4.8.1?) milestone Aug 8, 2023
@boegel boegel added the bug fix label Aug 8, 2023
@boegel
Member

boegel commented Aug 8, 2023

@boegelbot please test @ generoso
CORE_CNT=16

@boegel boegel changed the title add patches to fix PyTorch-1.13.1-foss-2022a on POWER + fix flaky test_jit_legacy test add patches to fix PyTorch 1.13.1 w/ foss/2022a on POWER + fix flaky test_jit_legacy test Aug 8, 2023
Member

@boegel boegel left a comment

lgtm

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=18500 EB_ARGS= EB_CONTAINER= /opt/software/slurm/bin/sbatch --job-name test_PR_18500 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 11402

Test results coming soon (I hope)...

- notification for comment with ID 1669763131 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Aug 8, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=18500 EB_ARGS= /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_18500 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3102

Test results coming soon (I hope)...

- notification for comment with ID 1669879185 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/f4a6220b1a0027746b7241a9d79e7ce2 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/14afd4b3cc4c786d60815eba5c99292e for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml24 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/653574b428b9ab6ae556aef59c2c2d16 for a full test report.

@boegel
Member

boegel commented Aug 9, 2023

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3115.skitty.os - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/06c529b35468c5489f6b6d98b313ed14 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8031 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/2ae197940ca00755731d0950d6464c9d for a full test report.

@boegel
Member

boegel commented Aug 9, 2023

@Flamefire The last test report failed because a lock was found. I won't let that block this PR, since there are various other tests on Intel/AMD.

@boegel
Member

boegel commented Aug 9, 2023

Going in, thanks @Flamefire!

@boegel boegel merged commit b193d32 into easybuilders:develop Aug 9, 2023
@Flamefire Flamefire deleted the 20230808155858_new_pr_PyTorch1131 branch August 9, 2023 07:12