Develop full suite of tests for manual execution #501

Open
5 of 15 tasks
jfy133 opened this issue Sep 1, 2023 · 6 comments
Labels
enhancement New feature or request


@jfy133
Member

jfy133 commented Sep 1, 2023

Description of feature

A major problem we currently have during development is that our CI tests are nowhere near comprehensive enough, because the pipeline uses extremely large database files that do not fit within GitHub Actions (GHA) resource allocations.

We should develop and document a suite of manual tests developers should run on their own infrastructure to ensure the pipeline is indeed working as intended.

mag missing configs and tests

For Automated CI

For manual CI

Does not need a database
Databases on AWS
  • Config five (shared with below)
    • CAT
    • GTDB
Databases NOT on AWS
  • Config five (shared with above)
    • CheckM (in CI but not in a config)
    • GUNC
    • Metaeuk
@prototaxites
Contributor

Metaeuk

For MetaEuk, specifying params.metaeuk_mmseqs_db = "UniProtKB/Swiss-Prot" only requires downloading a small database: a quick check shows the FASTA it is based on is only 87 MB. So it could potentially be run in a more automated way?

@jfy133
Member Author

jfy133 commented Feb 16, 2024

@prototaxites

Yeah that definitely should be feasible! Is it a single file with a public URL?

@prototaxites
Contributor

"UniProtKB/Swiss-Prot" is the string passed to the mmseqs databases command, which downloads the latest release of the database AFAIK. Now that I think about it, I'm not sure there's a way to specify a version, unfortunately, which limits reproducibility.

The alternative would be to pass the URL of a FASTA file to --metaeuk_db. In the MetaEuk module test (https://github.com/nf-core/modules/blob/master/tests/modules/nf-core/metaeuk/easypredict/main.nf) I passed it the yeast .faa from the test-data repo, which seemed to work OK, but it might be better to find a prokaryotic file to use with the test data.
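To make the two database options concrete, here is a minimal Python sketch that only assembles the corresponding nextflow run command lines; it launches nothing. The parameter names --metaeuk_mmseqs_db and --metaeuk_db come from this thread, while the profile (test,docker), the output directory, and the example FASTA URL are placeholders (assumptions), not values the pipeline prescribes.

```python
import shlex

# Placeholder base invocation (assumption): any profile that enables MetaEuk could
# be used here, and the output directory name is arbitrary.
BASE = ["nextflow", "run", "nf-core/mag", "-profile", "test,docker", "--outdir", "results_metaeuk"]


def metaeuk_command(mmseqs_db=None, fasta_url=None):
    """Build a command that points MetaEuk at exactly one database source."""
    # Requiring exactly one source is a choice of this sketch, not a pipeline rule.
    if (mmseqs_db is None) == (fasta_url is None):
        raise ValueError("supply exactly one of mmseqs_db or fasta_url")
    if mmseqs_db is not None:
        # Name resolved by 'mmseqs databases', which fetches the latest release,
        # so the downloaded database is not pinned to a version.
        return BASE + ["--metaeuk_mmseqs_db", mmseqs_db]
    # Pointing at a specific FASTA file sidesteps the unversioned download above.
    return BASE + ["--metaeuk_db", fasta_url]


if __name__ == "__main__":
    print(shlex.join(metaeuk_command(mmseqs_db="UniProtKB/Swiss-Prot")))
    # Hypothetical URL; a small prokaryotic .faa from the test-data repo would go here.
    print(shlex.join(metaeuk_command(fasta_url="https://example.org/proteins.faa")))
```

The first form is the cheap download discussed above; the second trades convenience for reproducibility, which is the concern raised about mmseqs databases.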

@jfy133
Member Author

jfy133 commented May 23, 2024

List of tools that need to be covered somehow, and the configs that currently cover them:

| tool | config | comment |
| --- | --- | --- |
| adapterremoval | test_adapterremoval | maybe could be moved into ancient-dna, as they are the people who mostly use it? |
| aria2 | NONE | used with checkm |
| bbmap/bbnorm | test_bbnorm | short test |
| bcftools | test_ancient_dna | |
| cat | test | |
| checkm | NONE | |
| centrifuge | test | |
| concoct | test_bin_refinement / test_concoct | test_concoct: everything else deactivated due to very long run time |
| dastool | test_binrefinement / test_ancient_dna | |
| fastp | test | |
| fastqc | test | |
| freebayes | test_ancient_dna | |
| genomad | test_virus_identification | everything else turned off (necessary?) |
| gtdbtk | NONE | large db (make mini?) |
| gunc | NONE | does it have a large db? |
| gunzip | test | |
| krona | test | |
| maxbin | test | |
| metabat2 | test | |
| metaeuk | test_adapterremoval | |
| mmseqs | NONE | only if --metaeuk_mmseqs_db is supplied |
| multiqc | test | |
| prodigal | test | |
| prokka | test | |
| pydamage | test_ancient_dna | |
| samtools | test_ancient_dna | |
| seqtk | test_bbnorm | short test |
| tiara | test_adapterremoval | note: has a special DASTOOL_FASTATOCONTIGBIN_TIARA process that doesn't actually run DASTOOL! |
| bowtie2 (phix) | test | |
| bowtie2 (host) | test_host_rm / test_hybrid_host_rm | |
| bowtie2 (assembly) | test | |
| busco | test | |
| CAT | NONE | large db (make mini?) |
| filtlong | test_hybrid / test_hybrid_rm | |
| kraken2 | test | |
| megahit | test | |
| spades | test | |
| spadeshybrid | test_hybrid / test_hybrid_rm | |
| nanolyse | test_hybrid / test_hybrid_rm | |
| nanoplot | test_hybrid / test_hybrid_rm | |
| porechop | test_hybrid / test_hybrid_rm | |
| quast | test | |
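One way to keep track of the gaps in this table is sketched below in Python: the table is transcribed by hand into a dict (an assumption: it must be kept in sync manually), tools with no config are listed, and the mapping is inverted to show which tools each config exercises, which is handy when regrouping tools into new profiles.

```python
from collections import defaultdict

# Tool -> configs that exercise it, hand-transcribed from the coverage table.
# An empty tuple marks a tool with no config (NONE in the table).
COVERAGE = {
    "adapterremoval": ("test_adapterremoval",),
    "aria2": (),
    "bbmap/bbnorm": ("test_bbnorm",),
    "bcftools": ("test_ancient_dna",),
    "cat": ("test",),
    "checkm": (),
    "centrifuge": ("test",),
    "concoct": ("test_bin_refinement", "test_concoct"),
    "dastool": ("test_binrefinement", "test_ancient_dna"),
    "fastp": ("test",),
    "fastqc": ("test",),
    "freebayes": ("test_ancient_dna",),
    "genomad": ("test_virus_identification",),
    "gtdbtk": (),
    "gunc": (),
    "gunzip": ("test",),
    "krona": ("test",),
    "maxbin": ("test",),
    "metabat2": ("test",),
    "metaeuk": ("test_adapterremoval",),
    "mmseqs": (),
    "multiqc": ("test",),
    "prodigal": ("test",),
    "prokka": ("test",),
    "pydamage": ("test_ancient_dna",),
    "samtools": ("test_ancient_dna",),
    "seqtk": ("test_bbnorm",),
    "tiara": ("test_adapterremoval",),
    "bowtie2 (phix)": ("test",),
    "bowtie2 (host)": ("test_host_rm", "test_hybrid_host_rm"),
    "bowtie2 (assembly)": ("test",),
    "busco": ("test",),
    "CAT": (),
    "filtlong": ("test_hybrid", "test_hybrid_rm"),
    "kraken2": ("test",),
    "megahit": ("test",),
    "spades": ("test",),
    "spadeshybrid": ("test_hybrid", "test_hybrid_rm"),
    "nanolyse": ("test_hybrid", "test_hybrid_rm"),
    "nanoplot": ("test_hybrid", "test_hybrid_rm"),
    "porechop": ("test_hybrid", "test_hybrid_rm"),
    "quast": ("test",),
}


def uncovered(coverage):
    """Tools that no config exercises, in table order."""
    return [tool for tool, configs in coverage.items() if not configs]


def tools_per_config(coverage):
    """Invert the table: config name -> tools it exercises (table order)."""
    index = defaultdict(list)
    for tool, configs in coverage.items():
        for config in configs:
            index[config].append(tool)
    return dict(index)


if __name__ == "__main__":
    print("no coverage:", ", ".join(uncovered(COVERAGE)))
    for config, tools in sorted(tools_per_config(COVERAGE).items()):
        print(f"{config}: {', '.join(tools)}")
```

On the table as transcribed, uncovered() returns aria2, checkm, gtdbtk, gunc, mmseqs and CAT, i.e. the tools flagged NONE.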

Additional:

| context | config |
| --- | --- |
| samplesheet input | test |
| assembly input | test_bin_refinement |

@jfy133
Member Author

jfy133 commented May 30, 2024

Proposal:

| name | description | tools | done |
| --- | --- | --- | --- |
| test | default (incl. those that run pre-assembly if a db is supplied (skip metaeuk?)), except concoct | centrifuge, kraken2, krona | yes |
| test_single_end | test but with single-end input (as current, as it only skips steps where reads are not needed) | | yes |
| test_alternatives | all alternative tools | adapterremoval, checkm, bin_domain_classification | yes |
| test_preassembly_binrefine | genomad, concoct, binning refinement (gunc), metaeuk | concoct, gunc, metaeuk | |
| test_hybrid_rm | for long reads, with host removal | | |
| test_nothing | everything off | | yes |
| test_extras | standard test but with additional opt-in functionality | keep_phix, bbnorm, host_rm, genomad, ancient_dna, tiara | |
| test_bigdb | tools with big databases | mini versions: CAT/GTDB-Tk | |
| test_full | as current | | |
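Since the goal is a suite that developers run by hand on their own infrastructure, a thin driver over the proposed profiles might look like the Python sketch below. It assumes Nextflow is on PATH and that each proposed name becomes a config profile of the same name (several do not exist yet); docker is only a placeholder container profile, so swap in an institutional profile as needed. test_full is left out on the assumption that it keeps running as it does now.

```python
import shlex
import subprocess
import sys

# Profile names proposed in the table. Assumption: each will exist as a config
# profile of the same name; several have not been created yet.
# test_full is omitted on the assumption that it keeps running as it does now.
PROFILES = [
    "test",
    "test_single_end",
    "test_alternatives",
    "test_preassembly_binrefine",
    "test_hybrid_rm",
    "test_nothing",
    "test_extras",
    "test_bigdb",
]


def build_command(profile, pipeline="nf-core/mag", container="docker", outdir_root="manual_tests"):
    """Assemble one 'nextflow run' invocation; 'docker' is a placeholder container profile."""
    return [
        "nextflow", "run", pipeline,
        "-profile", f"{profile},{container}",
        "--outdir", f"{outdir_root}/{profile}",  # one output directory per profile
    ]


def run_all(profiles, dry_run=True, **kwargs):
    """Run profiles one after another; return the names of those that exited non-zero."""
    failed = []
    for profile in profiles:
        cmd = build_command(profile, **kwargs)
        if dry_run:
            print(shlex.join(cmd))  # only show what would be executed
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(profile)
    return failed


if __name__ == "__main__":
    # Default is a dry run; pass --run to actually launch the pipeline.
    failures = run_all(PROFILES, dry_run="--run" not in sys.argv[1:])
    if failures:
        print("failed profiles:", ", ".join(failures))
        sys.exit(1)
```

To exercise a local checkout of a development branch rather than the released pipeline, pass pipeline="." (for example run_all(PROFILES, dry_run=False, pipeline=".")). Running the profiles sequentially keeps resource use bounded on a single machine and makes it obvious which profile failed.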

@CarsonJM
Contributor

@jfy133 this is fantastic! The old structure of tests was very confusing 😅

jfy133 mentioned this issue Jun 22, 2024 (22 tasks)

3 participants