[Question] Train module issue #6

Open
llk578496 opened this issue Dec 7, 2022 · 1 comment
Labels
question Further information is requested


llk578496 commented Dec 7, 2022

Hello @endixk ,

Thank you for developing this amazing pipeline! Our team is currently working on a clinical outbreak investigation of one of the most challenging multidrug-resistant fungi, Candida auris.

We would like to build a specific marker gene set for Candida auris using the train module, based on all Candida auris genomes available on NCBI at the complete or chromosome assembly level.

We have already downloaded 45 genomes and placed them in a single directory. However, when we tried to use the train module, we found that it requires one more option: `-i STR`, described as "Directory containing marker sequences in FASTA format (should be able to build an MSA)".

May we know what data we should provide for this option?

Thank you very much!

Best regards,
Eddie

endixk (Member) commented Dec 8, 2022

Dear Eddie,

Thank you for using our pipeline!

Based on your description of your aim, it sounds like you are trying to identify marker genes for Candida auris de novo from your genome sequences. Unfortunately, this is currently beyond the capabilities of our pipeline.

The train module of our pipeline is designed to generate profile HMMs from a pre-defined set of marker genes, using an iterative training process with the given set of genome sequences to improve sensitivity. This means that in order to use the module, you will need to first identify a set of candidate marker genes for Candida auris.

One potential resource for identifying marker genes for this organism is the OrthoDB Saccharomycetes subset. Once you have identified a set of candidate marker genes, you can create one FASTA file per marker by gathering a handful of protein sequences for it. Then, you can provide the directory containing all of these FASTA files as the input for the -i option, and the train module will generate profile HMMs for the marker genes you provided.
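As a rough illustration only (this is not part of the pipeline), a directory like this could be sanity-checked before training with a few lines of Python. The sketch assumes one FASTA file per marker, assumes the file extensions `.fa`, `.faa`, and `.fasta`, and assumes that at least two sequences per file are needed, since an MSA cannot be built from a single sequence. The function names and the threshold are hypothetical and should be adjusted to your own data.

```python
from pathlib import Path

# Assumed extensions; adjust to whatever your marker files actually use.
FASTA_SUFFIXES = {".fa", ".faa", ".fasta"}


def count_sequences(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_marker_dir(marker_dir, min_seqs=2):
    """Summarize a marker directory (one FASTA file per marker is assumed).

    Returns a dict mapping file name to sequence count, plus a list of
    files holding fewer than min_seqs sequences. The default of 2 is an
    assumption: an alignment needs at least two sequences.
    """
    counts = {}
    for path in sorted(Path(marker_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            counts[path.name] = count_sequences(path)
    too_few = [name for name, n in counts.items() if n < min_seqs]
    return counts, too_few
```

Calling `check_marker_dir("markers")` on a hypothetical `markers` directory returns the per-file sequence counts and the names of any files that look too small to align, which you may want to extend with more sequences before passing the directory to -i.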

If this explanation is unclear or if you have any further questions, please do not hesitate to ask.

Thanks!

Best wishes,
Daniel

@endixk endixk added the question Further information is requested label Dec 8, 2022