Add check for input genome file not having line return after last input genome (renamed by Mike) #75

Open
Calvin2077 opened this issue Apr 13, 2023 · 5 comments


@Calvin2077

Hello!

I recently discovered your GToTree and have found it super helpful for my master's project. Your clear installation instructions have been a huge help in getting it to work on my laptop, so thank you very much.

I did a practice run with my species, and one was dropped because its redundancy estimate was greater than 10%. Since I need to include all of my species in my project, is there a way to increase the redundancy threshold when using amino-acid fasta files?

Thanks

@AstrobioMike
Owner

Hey there, @Calvin2077!

Thanks for the kind words :)

A genome shouldn’t be dropped because of the redundancy estimate; that’s just a notice. Are you sure it’s not in the final tree? If it isn’t, it may be getting dropped because not enough target genes were found in it, which we can adjust.

@Calvin2077
Author

Hello AstrobioMike,

You're welcome, and thank you for getting back to me so fast; it is much appreciated. I checked my tree, and I am indeed missing a species.

Moreover, when I run "GToTree -f Untitled2.txt -o hope_new -H Archaea", it says it is only using 40 of my 41 species, even though my list (Untitled2.txt) contains all of them.

I don't know if it's related, but the missing one is the last one on my list.

@AstrobioMike
Owner

AstrobioMike commented Apr 13, 2023

Hmm, strange. Any chance you’d be able to share the fasta files and the input Untitled.txt file with me at MikeLee<at>bmsis.org so I can take a look? I’ll delete them right after testing, of course.

@AstrobioMike
Owner

@Calvin2077 and I tracked down the issue: the input file listing the paths to the genomes didn't have a line-return character at the end, so the last genome was being left off.
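The thread doesn't show how GToTree reads the list internally, so the following is only an illustration of how a missing final line break can silently drop an entry, assuming a plain bash `while read` loop (a common pattern, not confirmed to be GToTree's code). The file and genome names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical three-genome list with NO line break after the last path
list_file=$(mktemp)
printf 'genome_A.faa\ngenome_B.faa\ngenome_C.faa' > "$list_file"

# wc -l counts newline characters, so it reports 2 here, not 3
wc -l < "$list_file"

# Plain loop: read returns non-zero at the unterminated final line,
# so the loop body never runs for genome_C.faa
plain_count=0
while IFS= read -r genome; do
    plain_count=$((plain_count + 1))
done < "$list_file"
echo "plain loop saw ${plain_count} genomes"

# Guarded loop: also handle a final line that read filled before hitting EOF
guarded_count=0
while IFS= read -r genome || [ -n "$genome" ]; do
    guarded_count=$((guarded_count + 1))
done < "$list_file"
echo "guarded loop saw ${guarded_count} genomes"

rm -f "$list_file"
```

The plain loop counts 2 genomes and the guarded loop counts 3, which mirrors the "40 out of 41" symptom reported above.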

I need to think about how to put in a check for this.
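One possible check, sketched as a standalone bash function. `ensure_trailing_newline` is a hypothetical name, not something that exists in GToTree; it only uses standard `tail`, `printf`, and `test`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GToTree): append a line break to a file
# whose last byte is not already one.
ensure_trailing_newline() {
    local file="$1"
    # `tail -c 1` prints the final byte; command substitution strips a
    # trailing newline, so a non-empty result means the file lacks one.
    # `[ -s ]` skips empty files, which need no fix.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
        echo "NOTE: added a missing line break to the end of ${file}" >&2
    fi
}

# Usage with a hypothetical two-genome list missing its final line break
list=$(mktemp)
printf 'genome_A.faa\ngenome_B.faa' > "$list"
ensure_trailing_newline "$list"
wc -l < "$list"   # both entries are now newline-terminated
```

Running it a second time on the same file changes nothing, so it is safe to call on every input list.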

@AstrobioMike AstrobioMike self-assigned this Apr 13, 2023
@AstrobioMike AstrobioMike changed the title Redundancy increase threshold add check for i put genome file not having line return after last input genome Mar 9, 2024
@AstrobioMike AstrobioMike changed the title add check for i put genome file not having line return after last input genome Add check for input genome file not having line return after last input genome Mar 9, 2024
@AstrobioMike AstrobioMike changed the title Add check for input genome file not having line return after last input genome Add check for input genome file not having line return after last input genome (renamed by Mike) Mar 9, 2024
@AstrobioMike
Owner

Note for myself

I currently run a dos2unix/cmp check on each input file, e.g.:

dos2unix < ${NCBI_acc_file} | cmp - ${NCBI_acc_file} > /dev/null

I can add the --add-eol argument so that dos2unix adds a line break at the end of the file if one isn't there. That will address this. (Add it to the cmp check too, so the conversion is still only run when needed.)
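The plan above could look something like this in bash. It's a sketch assuming a dos2unix version that supports `--add-eol` (older releases don't have it); `normalize_input_file` is a hypothetical function name, not GToTree's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: requires a dos2unix build with the --add-eol option.
# normalize_input_file is a hypothetical name; its argument stands in for
# an input list such as ${NCBI_acc_file}.
normalize_input_file() {
    local input_file="$1"
    # Convert a copy to stdout and compare it with the original. cmp exits
    # non-zero if they differ (e.g. CRLF endings or a missing final line break).
    if ! dos2unix --add-eol < "$input_file" | cmp -s - "$input_file"; then
        # Only now rewrite the file in place, quietly
        dos2unix -q --add-eol "$input_file"
    fi
}
```

Keeping `--add-eol` in the cmp comparison as well means a file that already has Unix line endings but lacks the final line break still counts as "different" and gets fixed, while files that are already clean are left untouched.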
