-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFdiffusion-1.1.0-foss-2022a-CUDA-11.7.0.eb with models and missing dependencies #18359
base: develop
Are you sure you want to change the base?
RFdiffusion-1.1.0-foss-2022a-CUDA-11.7.0.eb with models and missing dependencies #18359
Conversation
…n/1.5.0-foss-2022a
Test report by @lexming |
|
easybuild/easyconfigs/r/RFdiffusion/RFdiffusion-1.1.0-foss-2022a.eb
Outdated
Show resolved
Hide resolved
se3-transformer/model/basis.py imports torch.cuda.nvtx
model checksum exit
comment models urls and checksums
{'dgl-1.1.1.tar.gz': '076026a7818f2396056252269d7960c561840bb0fe89494e11f8d6c524891c49'}, | ||
{'metis-5.1.1-DistDGL-v0.5.tar.gz': 'cedf0b32d32a8496bac7eb078b2b8260fb00ddb8d50c27e4082968a01bc33331'}, | ||
{'GKlib-METIS-v5.1.1-DistDGL-0.5.tar.gz': '52aa0d383d42360f4faa0ae9537ba2ca348eeab4db5f2dfd6343192d0ff4b833'}, | ||
{'tensorpipe-20230206.tar.gz': '97dcc6ab648d38f62ddc9d2a33de2a789c967e43e819481f817341881912a524'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checksums for git checkouts are not stable, so should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smoors I was not aware of the unstable checksums. Is this the case for any commit or only for recursive checkouts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the case whenever you are not downloading a tarball directly, but creating the tarball yourself, so any git clone
remove unstable checksums for tensorpipe, pcg, and libxsmm
remove unstable checksums for tensorpipe, pcg, and libxsmm
update cub comment
easybuild/easyconfigs/r/RFdiffusion/RFdiffusion-1.1.0-foss-2022a-CUDA-11.7.0.eb
Outdated
Show resolved
Hide resolved
# uncomment for installing models and schedules as components. | ||
# Then comment out _get_schedules and _get_models in postinstallcmds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between the components and _get_schedules/_get_models?
i would prefer to use the best method and remove the other one to keep things clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smoors Using components would archive the models and would allow to reinstall RFdiffusion even if the models were not available for download any more. I'd prefer this option since RFdiffusion is not usable without the models.
However, I understand @lexming's concerns about to store the models data (3.9GB) at least twice, in the module directory and the sources directory.
Therefore I kept the components approach commented out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which solution do you prefer and which should I delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, i agree with you that having to re-download on reinstall is suboptimal, but i also agree with @lexming that storing the models twice is suboptimal.
a possible solution is to create a custom easyblock that checks the installdir and does not delete/re-download the models if they are already present and if their checksums are correct.
to make this configurable, we could define an extra parameter in the easyblock, e.g. keep_models_if_installed
.
this is similar to easyblock Tarball
, which allows merging an installation into an existing installation instead of wiping it first.
what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For long term it is probably better to place the data into a separate module RFdiffusion_data, which can be reused for minor updates of RFdiffusion, if the data does not change.
It might be good to have a generic easyblock for such data modules, because we can expect more of them with all the upcoming AI packages. (see also #19157 Model-Angelo_data, #19240 relion-classranker_data + relion-blush_data).
There was also a brief discussion at https://github.com/easybuilders/easybuild/wiki/Conference-call-notes-20230719
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's indeed a better solution long-term.
i'll try to give a stab at a generic data easyblock, but feel free to do this yourself if you're up for it.
in the meantime, i suggest to already go ahead with creating a separate easyconfig for RFdiffusion_data. you should be able to use the Bundle
easyblock (which does not install anything), and download the models in a postinstallcmds
, as you have done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a look on RFdiffusion_data
in the next few days. RFdiffusion-1.1.0_data_paths.patch
requires an update allowing to read the models and schedules from elsewhere defined in a environment variable.
Hi! I juts start to work on vscentrum/vsc-software-stack#283 pipeline of RFdiffusion - ProteinMPNN -AlphaFold. I'd like to use RFd with data models either. Do you working on that or you stop? |
@pavelToman I am currently working on AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1. see also #19841 |
|
||
sanity_check_commands = [ | ||
"run_inference.py --help", | ||
"run_inference.py 'contigmap.contigs=[150-150]' " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command will create two output directories in %(installdir)s, is there a way to avoid this?
], | ||
}), | ||
] | ||
|
||
sanity_pip_check = True | ||
postinstallcmds = [ | ||
'cp -rpP config examples %s/%%(namelower)s' % _pysp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'cp -rpP config examples %s/%%(namelower)s' % _pysp, | |
'cp -rpP config examples helper_scripts %s/%%(namelower)s' % _pysp, | |
'cp -pP %%(namelower)s/inference/sym_rots.npz %s/%%(namelower)s/inference/' % _pysp, |
RFdiffusion needs sym_rots.npz to perform symmetry operations which is needed to use different symmetry settings.
It seems the source |
(created using
eb --new-pr
)