Missing Archaea summary when using ANI screen #508
Comments
Hi, |
Here is the content of
The 3 archaeal genomes are among the 164 genomes identified by the ANI screen.
|
Hi, the FastANI/pplacer step should always return 0, but it does not in this log file:
Would you mind sending the genomes you are trying to analyse? |
Do you need the full set, or mainly a mix of the archaeal and bacterial genomes assigned by ani_screen and pplacer? |
Mainly a mix of bacteria and Archaea classified with FastANI/pplacer, plus the archaeal genomes missing from the summary file. Thanks |
Hi, here is a link to a tar.gz archive with 10 MAGs: 3 archaeal genomes, 4 bacterial species identified by ani_screen, and 3 bacterial species identified by pplacer. |
Hi, I've used the set of genomes I sent to redo the classify analysis and got the same issue with the archaeal genomes. Here is the structure of my output directory:
and the corresponding gtdb.log
It should help to simplify the investigation. Did you manage to reproduce the issue? |
Hi, I think I have found the issue, but I need to run extra checks. |
Hi, in your first log file you had the following lines:
I would like to know what these 4 genomes are and why they were not classified by the first ANI step. Unfortunately, they don't seem to be among the 10 genomes provided. Thank you |
Hi Pierre, thanks for the patch. I am checking for those 4 genomes, but 68 out of the 269 input genomes have the "taxonomic classification defined by topology and ANI" classification_method. I'll send those to you once I've figured them out. |
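For anyone wanting to pull out the same subset, here is a minimal sketch of how the genomes with a given classification_method could be listed from a summary file. It assumes the summary is a tab-separated file with a header row containing `user_genome` and `classification_method` columns; the function names are hypothetical helpers, not part of GTDB-Tk.

```python
import csv
from collections import Counter


def count_classification_methods(summary_path):
    """Count genomes per classification_method in a tab-separated summary file."""
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["classification_method"] for row in reader)


def genomes_with_method(summary_path, method):
    """Return the user_genome values whose classification_method equals `method`."""
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            row["user_genome"]
            for row in reader
            if row["classification_method"] == method
        ]
```

Calling `genomes_with_method(path, "taxonomic classification defined by topology and ANI")` on each summary file would give the list of genomes to share.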
Hi, I've managed to find the genomes by looking into
I've rerun GTDB-Tk on them and it still behaves the same way.
Here are the MAGs producing the weird behavior |
Hello,
I've tested the new ANI screen method, using the Mash DB, for classify_wf. I've observed that a few genomes are missing at each run, as well as the summary output gtdbtk.ar53.summary.tsv. The missing outputs correspond to archaeal genomes that were identified during the ANI screen, as I can find them in classify/ani_screen/gtdbtk.ar53.ani_summary.tsv. I guess the implementation of the ANI screen missed the Archaea part of the pipeline?
I am using version 2.2.6 in a conda environment created by installing GTDB-Tk from bioconda, but I guess the problem is independent of the installation. Here is my command:
gtdbtk classify_wf --mash_db ./GTDB/gtdb-tk-r207v2.msh --genome_dir ./ALL/ -x fasta --out_dir gtdbtk2_classify --cpus 18 --pplacer_cpus 18 --tmpdir ./tmp --scratch_dir ./pplacer
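To make the symptom easy to check on other runs, here is a minimal sketch that lists genomes present in the ANI screen summary but absent from the final summary. It assumes both files are tab-separated with a `user_genome` column, and it treats a summary file that does not exist as empty; the helper names are hypothetical and not part of GTDB-Tk.

```python
import csv
from pathlib import Path


def read_genome_ids(tsv_path, column="user_genome"):
    """Return the set of genome IDs in a TSV file, or an empty set if the file is absent."""
    path = Path(tsv_path)
    if not path.exists():
        return set()
    with path.open(newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def missing_from_summary(ani_summary_path, summary_path):
    """List genomes reported by the ANI screen that never reach the summary file."""
    ani_ids = read_genome_ids(ani_summary_path)
    summary_ids = read_genome_ids(summary_path)
    return sorted(ani_ids - summary_ids)
```

For the run above, the first argument would be classify/ani_screen/gtdbtk.ar53.ani_summary.tsv inside the output directory, and the second the expected gtdbtk.ar53.summary.tsv; the exact location of the latter is an assumption to adjust to your output layout.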
As it is somewhat related, I was wondering whether it would be possible to consolidate the archaeal and bacterial summaries into a single output in a future release?
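In the meantime, the consolidation can be done after the run. Here is a minimal sketch, assuming each per-domain summary is a tab-separated file with a header row; `merge_summaries` is a hypothetical helper, not a GTDB-Tk feature, and the bacterial file name used in the usage note below is also an assumption.

```python
import csv
from pathlib import Path


def merge_summaries(summary_paths, merged_path):
    """Concatenate per-domain summary TSVs into one file and return the row count.

    Files that do not exist are skipped. Columns are the union of all headers,
    in first-seen order; values absent from a given file are written as "N/A".
    """
    fieldnames = []
    rows = []
    for path in map(Path, summary_paths):
        if not path.exists():
            continue
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    with open(merged_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=fieldnames,
            delimiter="\t",
            restval="N/A",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `merge_summaries(["gtdbtk.ar53.summary.tsv", "gtdbtk.bac120.summary.tsv"], "gtdbtk.all.summary.tsv")` would then produce a single table; skipping absent files keeps it usable for runs that contain only one domain.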
Thank you.