Updated Python and PyMC, removed TensorFlow, and added PyTorch in conda environment. #8561

samuelklee · 2023-10-23T18:30:59Z

Copying over some discussion from Slack, with some slight modifications:

I took a quick stab at updating the environment for gCNV. Even taking out TensorFlow (assuming that the CNN will not be supported by this environment), it's a difficult task:

The goal is to update Python from 3.6 to 3.10+, since Terra now requires the latter for officially supported images.

However, gCNV relies on the PyMC3 package. PyMC3 3.1 is currently used in GATK master. 3.1 was released in 2017, not long before our release of gCNV in 2018, but it's very old now.

The latest version of Python that is supported by PyMC3 3.1 in conda is Python 3.6.

@asmirnov239 has a draft PR (Add pytorch to the conda environment #8094) that updates PyMC3 to 3.5 and Python to 3.7, which clearly still falls short of Python 3.10+. This PR also updated some gCNV code to make it compatible with PyMC3 3.5. (It also removed TensorFlow and added PyTorch.)

@asmirnov239 also merged a PR (Added gCNV integration test to detect numerical differences in the outputs. #7889) that added tests for numerical reproducibility of GermlineCNVCaller in cohort mode.

The earliest version of PyMC that supports Python 3.10+ is PyMC 4, released in 2022.

However, PyMC 4 introduces API changes, which will also require additional gCNV code changes and numerical testing.

These API changes arise because the underlying computational backend for PyMC was switched from Theano (think of this as an old alternative to TensorFlow) to Aesara.

Since then, PyMC 5.9 has been released and the underlying backend has been updated again, from Aesara to PyTensor.

So if we are going to update the environment to support Python 3.10+, it probably makes sense to go all the way to PyMC 5.9.

I've made some strides in this PR; as of 6b08f3a, I've made enough updates to accommodate API changes so that cohort-mode inference for both GermlineCNVCaller and DetermineGermlineContigPloidy runs successfully under Python 3.10 and PyMC 5.9.0---although note that 5.9.1 has been released in the interim!

However, our work has just begun. Results now produced in the numerical tests mentioned above are quite far off from the original expected results. It remains to be seen whether this is due to the randomness of inference, some slight changes to the model prior that were necessitated by the API changes, or some bugs introduced in other code updates. (Also note that I believe Andrey's draft PR #8094 already broke these tests, although the numerical differences were much smaller and more reasonable; perhaps he can confirm. I also think determinism is still broken as of this commit: PyTensor/PyMC seeding has changed, so our previous theano/PyMC3 hack no longer applies.)
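As a library-agnostic illustration of the determinism issue mentioned above, here is a minimal sketch of checking that a stochastic routine returns identical results when it is handed a freshly seeded generator on every run. It uses only Python's standard `random` module. `noisy_fit` and `is_deterministic` are hypothetical names, and nothing here reflects how PyMC/PyTensor or gcnvkernel actually handle seeding.

```python
import random


def noisy_fit(data, rng):
    """Toy stand-in for stochastic inference: mean of a random resample.

    All randomness comes from the `rng` argument, never from global state.
    """
    sample = [rng.choice(data) for _ in range(100)]
    return sum(sample) / len(sample)


def is_deterministic(run, seed, repeats=3):
    """Call `run` several times, each with a new generator built from `seed`.

    Returns True only if every repeat produced exactly the same result.
    """
    results = [run(random.Random(seed)) for _ in range(repeats)]
    return all(result == results[0] for result in results)
```

For example, `is_deterministic(lambda rng: noisy_fit([1.0, 2.0, 3.0, 4.0], rng), seed=1234)` should hold because the generator is threaded through explicitly. A routine that silently draws from a global or library-internal generator would generally fail the same check, which is the kind of breakage described above.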

So I think the next step is to just go to scientific-level testing and see what the fallout is. Ideally, we'd still get good performance (or perhaps better, at least on the runtime side, hopefully...) and we can just update the numerical tests. But if performance tanks, then we might need to see whether I've introduced any bugs. @mwalker174 @asmirnov239 perhaps you can comment on what might be the appropriate test suite here. 1kGP?

I'll also highlight again that this PR will remove TensorFlow and might require that the corresponding CNN implementations be supported by an alternate strategy, at least until the PyTorch implementation goes in.

mwalker174 · 2023-11-09T21:17:07Z

Thanks for your work on this @samuelklee! Testing on both WES and WGS would be ideal. For WGS we can use the gatk-sv reference panel, which is our standard (I can help with this once a Docker image is ready). For WES, 1kGP would work, although it's definitely showing its age. Are the integration test differences large?

samuelklee · 2023-12-08T17:16:20Z

OK, I think things are looking good! Updated a bunch of things, including the following:

conda 23.1.0 -> 23.10.0; in the base Docker, also disabled conda auto-updating and set the solver to the much faster libmamba (NOTE: before this PR went in, this change was actually made in Update the GATK base image to a newer LTS ubuntu release #8610)
python 3.6.10 -> 3.10.13
pymc 3.1 -> 5.10.0
theano 1.0.4 -> pytensor 2.18.1
added pytorch 2.1.0
removed tensorflow 1.15.0 and other CNN dependencies
added libblas-dev to the base Docker; I think MKL versions of all packages are being used, but we should verify!

These and other packages (numpy, scipy, etc.) are all pretty much at the latest available versions for python 3.10. I've also bumped version numbers for our internal python packages.
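One way to sanity-check that a built environment matches the pins listed above is sketched below. This is a rough illustration, not part of the PR: the distribution names are assumptions (PyTorch is normally published as `torch`), the helper names are hypothetical, and exact string matching may be too strict if a build reports a local version suffix.

```python
import sys
from importlib import metadata

# Versions quoted from the list above; distribution names are assumptions.
EXPECTED = {
    "pymc": "5.10.0",
    "pytensor": "2.18.1",
    "torch": "2.1.0",
}


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= tuple(minimum)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Map each package to (expected, found) wherever the two differ."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches
```

Running `check_environment()` inside the conda environment should return an empty dict when everything matches. The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.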

I also made all of the changes to the gCNV code to accommodate any changes introduced by PyMC/Pytensor. For the most part, these were minor renamings of theano/tt/etc. to pytensor/pt/etc.

However, there were some more nontrivial changes, including to 1) model priors (since some of the distributions previously used were removed or are now supported differently), 2) the implementation of posterior sampling, 3) some shape/dimshuffle operations, and other things along these lines.
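To give a sense of the "minor renamings" category, here is a small sketch of a regex-based rename from the Theano names to their PyTensor equivalents. It is only an illustration under stated assumptions: the rename table and `rename_theano` are hypothetical, the changes in this PR were not necessarily produced this way, and a mechanical rename like this cannot handle the nontrivial changes (priors, posterior sampling, shape/dimshuffle operations) described above.

```python
import re

# Hypothetical rename table. Order matters: the aliased import must be
# rewritten before the more general patterns get a chance to touch it.
RENAMES = [
    (re.compile(r"\bimport theano\.tensor as tt\b"), "import pytensor.tensor as pt"),
    (re.compile(r"\bimport theano\b"), "import pytensor"),
    (re.compile(r"\btheano\."), "pytensor."),
    (re.compile(r"\btt\."), "pt."),
]


def rename_theano(source):
    """Apply each rename in order to a string of Python source code."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source
```

For instance, `rename_theano("x = tt.exp(theano.shared(1.0))")` gives `"x = pt.exp(pytensor.shared(1.0))"`. Anything a sketch like this rewrites would still need review, since identical names do not guarantee identical behavior across backends.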

Using a single test shard of 20 1kGP WES samples x 1000 intervals, I have verified determinism/reproducibility for DetermineGermlineContigPloidy COHORT/CASE modes, GermlineCNVCaller COHORT/CASE modes, and PostprocessGermlineCNVCalls. Numerical results are also relatively close to those from 4.4.0.0 for all identifiable call and model quantities (albeit far outside any reasonable exact-match thresholds, most likely due to differences in RNG, sampling, and the aforementioned priors).
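The distinction drawn above between "relatively close" and "exact match" can be made concrete with a tolerance-based comparison. The following is a minimal sketch, not the actual GATK integration-test code. The function names and both thresholds are hypothetical placeholders chosen only for illustration.

```python
def max_relative_difference(expected, actual):
    """Largest |a - e| / max(|e|, |a|) over paired values.

    Pairs where both values are zero contribute 0.0.
    """
    if len(expected) != len(actual):
        raise ValueError("inputs must have the same length")
    worst = 0.0
    for e, a in zip(expected, actual):
        scale = max(abs(e), abs(a))
        if scale > 0.0:
            worst = max(worst, abs(a - e) / scale)
    return worst


def classify(expected, actual, exact_tol=1e-6, close_tol=1e-1):
    """Label a comparison as 'exact', 'close', or 'different'.

    Both tolerances are placeholder values, not GATK's real thresholds.
    """
    diff = max_relative_difference(expected, actual)
    if diff <= exact_tol:
        return "exact"
    if diff <= close_tol:
        return "close"
    return "different"
```

Under these placeholder thresholds, results like the ones described above would land in the "close" bucket: well outside an exact-match tolerance, but still within a looser band that suggests the model is behaving similarly.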

Some remaining TODOs:

Rebuild and push the base Docker. EDIT: Mostly covered by Update the GATK base image to a newer LTS ubuntu release #8610, but this also includes an addition of libblas-dev.
Update expected results for integration tests, perhaps add any that might be missing. EDIT: These were generated on WSL Ubuntu 20.04.2, we'll see if things pass on 22.04. Note that changing the ARD priors does change the names of the expected files, since the transform is appended to the corresponding variable name. DetermineGermlineContigPloidy and PostprocessGermlineCNVCalls are missing exact-match tests and should probably have some, but I'll leave that to someone else.
Update other python integration tests.
Clean up some of the changes to the priors.
Clean up some TODO comments that I left to track code changes that might result in changed numerics. I'll try to go through and convert these to PR comments in an initial review pass.
Test over multiple shards on WGS and WES. Probably some scientific tests on ~100 samples in both cohort and case mode would do the trick. We should also double check runtime/memory performance (I noted ~1.5x speedups, but didn't measure carefully; I also want to make sure the changes to posterior sampling didn't introduce any memory issues). @mwalker174 will ping you when a Docker is ready! Might be good to loop in Isaac and/or Jack as well.
Perhaps add back the fix for 2-interval shards in Number of intervals edge case gCNV fix #8180, which I removed since the required functionality wasn't immediately available in Pytensor. Not sure if this actually broke things though---need to check. (However, I don't actually think this is a very important use case to support...)
Delete/deprecate/etc. CNN tools/tests as appropriate. Note that this has to be done concurrently, since we remove Tensorflow. @droazen perhaps I can take a first stab at this in a subsequent commit to this PR once more of the gCNV dust settles and/or has undergone a preliminary review? EDIT: Disabled integration/WDL tests. We should add some deprecation messages to the tools---we can note that they should still work in previous environments but will be untested. I might set up a separate PR for deletion, to be done at the appropriate time (but I call dibs on this, can't have @davidbenjamin overtaking my all-time record for number of lines deleted 😛).

gatk-bot · 2023-12-08T17:17:10Z

Github actions tests reported job failures from actions build 7143821808
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
conda	17.0.6+10	7143821808.3	logs

matthdsm · 2024-07-01T11:22:21Z

Hi all,

Any chance this will make it into a release soon? I was hoping this got merged with the recent docker image overhaul.

Thanks
Matthias

samuelklee · 2024-07-01T19:36:10Z

@matthdsm this was intentionally left out of the recent 4.6 release, but should go into the next minor release. Would of course appreciate any testing/feedback from the community before then!

samuelklee · 2024-07-02T04:25:27Z

Released gatkbase-3.3.0 to broadinstitute/gatk:gatkbase-3.3.0, but getting Permission "artifactregistry.repositories.uploadArtifacts" denied on resource "projects/broad-gatk/locations/us/repositories/us.gcr.io" when trying to push to us.gcr.io/broad-gatk/gatk:gatkbase-3.3.0.

samuelklee · 2024-07-09T15:56:07Z

Just added @DeprecatedFeature tags to the CNN tools. @droazen will help me push broadinstitute/gatk:gatkbase-3.3.0 to us.gcr.io/broad-gatk/gatk:gatkbase-3.3.0 (since it appears I no longer have permission, perhaps due to the recent migration). Then a thumbs up from him or @ldgauthier and I think this is good to go in!


ldgauthier

All the comparisons look great and I am confident in David's CNN->NV update plan -- let's do it!

droazen · 2024-07-09T20:19:50Z

Woohoo, thank you @samuelklee !!

matthdsm · 2024-07-10T05:59:28Z

@droazen, do you think this warrants a new point release? That way we can finally fix the gatk-gcnvkernel recipe over at bioconda and make the conda recipe useable again 😄

droazen · 2024-07-10T21:09:08Z

@matthdsm Yes definitely -- there will be another release fairly soon to get this out. Before we can release, though, we do need to merge a couple of PRs that have been waiting on this change (in particular, a replacement tool for CNNScoreVariants that uses PyTorch). We're currently targeting the late July / early August timeframe for the next release.

Are you the maintainer of the GATK bioconda recipes, by the way? Let us know if there's anything else we can do in the upcoming release to fix bioconda-related issues!

matthdsm · 2024-07-12T11:35:37Z

I'm a bioconda maintainer, one of many, but I've got a vested interest in a functional gatk recipe 😅
At the moment, we're unable to get the latest version of the GATK to build because of the requirements for the gcnvkernel.
A new version with the changes above would fix most if not all of the issues we're currently seeing.

matthdsm · 2024-08-22T12:52:17Z

Hi @droazen,
Any updates on the timeframe for this new release? We're eagerly waiting for the next version so we can start updating everything on our side!

samuelklee marked this pull request as draft October 23, 2023 18:31

samuelklee changed the title ~~Sl python version update~~ Updated Python and PyMC, removed TensorFlow, and added PyTorch in conda environment. Oct 23, 2023

samuelklee force-pushed the sl_python_version_update branch 2 times, most recently from 6534430 to 558ccaf Compare November 9, 2023 20:48

This was referenced Dec 10, 2023

Update the GATK base image to a newer LTS ubuntu release #8610

Merged

GermlineCNVCaller different results with same GATK and different Ubuntu #8619

Closed

samuelklee force-pushed the sl_python_version_update branch from 4bf5286 to ed59372 Compare December 12, 2023 14:55

matthdsm mentioned this pull request Jul 1, 2024

Update gatk4 to 4.6.1.0 bioconda/bioconda-recipes#48815

Merged

samuelklee force-pushed the sl_python_version_update branch from f7a9760 to 4dc5caa Compare July 1, 2024 16:11

samuelklee marked this pull request as ready for review July 2, 2024 04:11

samuelklee added 9 commits July 9, 2024 12:07

Updated gCNV code to account for changes from PyMC3/Theano to PyMC/PyTensor.

49e7779

Updated gCNV integration tests.

7f5ddb6

Updated gCNV WDL tests.

19a69d6

Updated other tests and tools affected by environment changes.

0121f15

Reverted posterior sampling to online implementation.

cce7a37

Updated localDevCondaEnv task in build.gradle.

f2f3229

Addressed review comments and cleaned up TODOs.

66580ce

Released gatkbase-3.3.0 and updated Dockerfile.

e62596b

samuelklee force-pushed the sl_python_version_update branch from 0470fde to e389682 Compare July 9, 2024 16:09

Added DeprecatedFeature tags to CNNScoreVariants, CNNVariantTrain, and CNNVariantWriteTensors.

17a350a

samuelklee force-pushed the sl_python_version_update branch from e389682 to 17a350a Compare July 9, 2024 16:11

ldgauthier approved these changes Jul 9, 2024

samuelklee merged commit ddaf66f into master Jul 9, 2024
20 checks passed

samuelklee deleted the sl_python_version_update branch July 9, 2024 20:08

samuelklee mentioned this pull request Jul 9, 2024

Follow up on CNN deprecation done in the update to python 3.10. #8907

Open
