Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep large unbinned contigs for downstream steps #29

Merged
merged 17 commits into from
Mar 10, 2020
Merged

Conversation

d4straub
Copy link
Collaborator

@d4straub d4straub commented Dec 19, 2019

This solves #27

@d4straub d4straub requested a review from HadrienG December 19, 2019 13:28
@d4straub
Copy link
Collaborator Author

Checks seem to fail because of internet connectivity? Weird.

@d4straub
Copy link
Collaborator Author

d4straub commented Dec 19, 2019

Ok, BUSCO obviously is now available in v4beta (we are using v3) and the location of the database changed. Hope I fixed this now. Default database is just 8Mb, maybe we could either upgrade to v4 or store the v3 default database in nf-core/test-datasets that we do not loose it again...

edit: seriously, the file seems to have changed as well? Wow.
edit2: nope the file is fine, this is a problem with the test itself.

@HadrienG
Copy link
Member

Do you mind if I review/merge this after 1.0.0? I'd like the first release to be done before Christmas 😄

Concerning Busco I changed the url in dev, I can make a PR to update to v4 (after 1.0.0 ! 😁 )

@apeltzer apeltzer added this to the 1.1.0 milestone Dec 21, 2019
@d4straub
Copy link
Collaborator Author

Fine for me, but I think that's a major flaw. The solution proposed here works fine with test data, but it could be that a pooled bin that is produced right now and forwarded to downstream processes is too big with real data and needs a change.

If you manage to get your release out before I optimize this step here (holiday right now, not working on it atm), go ahead ;)

@HadrienG
Copy link
Member

A pooled bin being too big seems like a separate issue? Since you are not forwarding the unbinned contigs to downstream processes?

(Happy holidays!)

@d4straub
Copy link
Collaborator Author

d4straub commented Feb 7, 2020

Still linting errors but coming closer ...

@d4straub
Copy link
Collaborator Author

No errors any more, review please.

out_base = (os.path.splitext(input_file)[0])

# Read file
fasta_sequences = SeqIO.parse(open(input_file),'fasta')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files should be opened with the with context manager. that way the file handles will close automatically at the end of the scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.

with open(input_file) as f:
    fasta_sequences = SeqIO.parse(f, 'fasta')
    # rest of the code that needs the file to be open.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Changed in 8effb0b

main.nf Outdated
@@ -906,12 +913,12 @@ process metabat {
def name = "${assembler}-${sample}"
"""
jgi_summarize_bam_contig_depths --outputDepth depth.txt ${bam}
metabat2 -t "${task.cpus}" -i "${assembly}" -a depth.txt -o "MetaBAT2/${name}" -m ${min_size}
metabat2 -t "${task.cpus}" -i "${assembly}" -a depth.txt -o "MetaBAT2/${name}" -m ${min_size} --seed 1 --unbinned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't there rather been no seed by default but an option to fix the seed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My aim was to make sure that every time the workflow runs the same results are created. Therefore I fixed the seed here. But later I realized that at least SPAdes is not producing the identical results, obviously there is also some randomness in there and I haven't seen an option to fix the seed there. So I don't feel that it necessary at all any more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 17065a9

@d4straub d4straub requested a review from HadrienG February 17, 2020 09:59
@HadrienG HadrienG merged commit d8c55bc into nf-core:dev Mar 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants