Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{bio}[foss/2021a] OpenFold v1.0.0, colossalai v0.1.8, einops v0.4.1, OpenMM 7.5.1 (incl. AlphaFold patch) w/ Python 3.9.5 + CUDA 11.3.1 #15971

Conversation

migueldiascosta
Copy link
Member

@migueldiascosta migueldiascosta commented Aug 5, 2022

(created using eb --new-pr)

requires:

…s-0.4.1-GCCcore-10.3.0.eb, OpenFold-1.0.0-foss-2021a-CUDA-11.3.1.eb, OpenMM-7.5.1-foss-2021a_AlphaFold.eb and patches: OpenMM-7.5.1_AlphaFold.patch
@migueldiascosta

This comment was marked as outdated.

@boegel boegel added this to the 4.x milestone Aug 5, 2022
@boegel boegel changed the title {bio}[foss/2021a] colossalai v0.1.8, einops v0.4.1, OpenFold v1.0.0, ... w/ Python 3.9.5 {bio}[foss/2021a] colossalai v0.1.8, einops v0.4.1, OpenFold v1.0.0, ... w/ Python 3.9.5 + CUDA 11.3.1 Aug 5, 2022
@boegel boegel changed the title {bio}[foss/2021a] colossalai v0.1.8, einops v0.4.1, OpenFold v1.0.0, ... w/ Python 3.9.5 + CUDA 11.3.1 {bio}[foss/2021a] OpenFold v1.0.0, colossalai v0.1.8, einops v0.4.1, OpenMM 7.5.1 (incl. AlphaFold patch) w/ Python 3.9.5 + CUDA 11.3.1 Aug 5, 2022
@migueldiascosta

This comment was marked as outdated.

@boegelbot

This comment was marked as outdated.

@boegelbot

This comment was marked as outdated.

@boegel

This comment was marked as outdated.

@boegel boegel modified the milestones: 4.x, next release (4.6.1?) Aug 6, 2022
@migueldiascosta
Copy link
Member Author

(weird, couldn't remove the patch with old name with --update-pr, ERROR: Path doesn't exist or file to delete isn't found in target branch: OpenMM-7.5.1_AlphaFold.patch, had to use git directly...)

Micket
Micket previously requested changes Aug 9, 2022
Comment on lines +47 to +49
('ninja', '1.10.2.3', {
'checksums': ['e1b86ad50d4e681a7dbdff05fc23bb52cb773edb90bc428efba33fa027738408'],
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it need that specific version? We have Ninja-1.10.1-GCCcore-10.2.0.eb which is this thing (to be used as a builddep only i assume?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that provides only the ninja binary, and deepspeed requires the ninja python package...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could enhance the (recent) Ninja easyconfigs to also install the Python bindings (I'm looking into that, PR coming soon...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #16025

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the ninja Python package, provided by https://github.com/scikit-build/ninja-python-distributions, is actually a shim package, a very light-weight wrapper around the ninja binary so you can declare a dependency on it in setup.py & co...
With that in mind, I think it should be OK to just strip out the requirement for ninja as long as we provide the traditional Ninja as a (build) depemdency.

I'll look into this (and then close #16025 since that PR doesn't make much sense then)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just stripping out the requirement for the ninja Python package won't be a good idea, since deepspeed really does require it (if only to check whether Ninja is available), see https://github.com/microsoft/DeepSpeed/blob/316c4a43e0802a979951ee17f735daf77ea9780f/deepspeed/env_report.py#L54-L59

So unless we can somehow make the ninja Python package point to an existing Ninja installation rather than having it install it's own ninja binary, this may be a necessary evil that's hard to avoid... :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I closed #16025, since that's clearly not the correct way forward.

Since the ninja Python package is also a runtime dependency for deepspeed, I don't see a better way out than the current approach being used here: install ninja as an extension in OpenFold (as opposed to trying to use the classic Ninja installation as a dependency somehow).

Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@boegel boegel dismissed Micket’s stale review August 25, 2022 19:15

The ninja Python package is a necessary evil, can't seem to make it sit on top of an existing Ninja tool installation

@boegel
Copy link
Member

boegel commented Aug 26, 2022

@boegelbot please test @ jsc-zen2

@boegelbot
Copy link
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=15971 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_15971 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 1582

Test results coming soon (I hope)...

- notification for comment with ID 1228155272 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/1d708e7beb5796c6bb85fea81b8917f9 for a full test report.

@boegel
Copy link
Member

boegel commented Aug 26, 2022

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3900.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.73.08, Python 3.6.8
See https://gist.github.com/a68659b2c90a6bc02728d0f3b7e01dfe for a full test report.

@boegel
Copy link
Member

boegel commented Aug 26, 2022

Going in, thanks @migueldiascosta!

@boegel boegel merged commit 808836f into easybuilders:develop Aug 26, 2022
@boegel
Copy link
Member

boegel commented Aug 26, 2022

@boegelbot please test @ generoso

@boegelbot
Copy link
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=15971 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_15971 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9063

Test results coming soon (I hope)...

- notification for comment with ID 1228240829 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/ac6796a4b86ea9ac390abea84bbb6414 for a full test report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants