Keep large unbinned contigs for downstream steps #29
Conversation
Checks seem to fail because of internet connectivity? Weird.
Ok, BUSCO is obviously now available in a v4 beta (we are using v3) and the location of the database changed. I hope I fixed this now. The default database is just 8 MB; maybe we could either upgrade to v4 or store the v3 default database in nf-core/test-datasets so that we do not lose it again... edit: seriously, the file seems to have changed as well? Wow.
Do you mind if I review/merge this after 1.0.0? I'd like the first release to be done before Christmas 😄 Concerning BUSCO, I changed the URL in dev; I can make a PR to update to v4 (after 1.0.0! 😁)
Fine by me, but I think that's a major flaw. The solution proposed here works fine with test data, but with real data a pooled bin that is produced right now and forwarded to downstream processes could be too big and need a change. If you manage to get your release out before I optimize this step (on holiday right now, not working on it at the moment), go ahead ;)
A pooled bin being too big seems like a separate issue? Since you are not forwarding the unbinned contigs to downstream processes? (Happy holidays!) |
D4straub v0.2
Still some linting errors, but coming closer ...
No errors anymore; please review.
bin/split_fasta.py
Outdated
out_base = os.path.splitext(input_file)[0]

# Read file
fasta_sequences = SeqIO.parse(open(input_file), 'fasta')
Files should be opened with the with context manager; that way the file handles will close automatically at the end of the scope.
i.e.

with open(input_file) as f:
    fasta_sequences = SeqIO.parse(f, 'fasta')
    # rest of the code that needs the file to be open
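For a fuller picture, here is a minimal self-contained sketch of the suggested pattern; the input file name, length threshold, and per-contig output naming are illustrative assumptions, not taken from the actual split_fasta.py:

from Bio import SeqIO

input_file = "assembly.fasta"  # hypothetical input, for illustration only
min_length = 100000            # hypothetical size cutoff

# The handle is closed automatically when the with block ends,
# even if parsing raises an exception.
with open(input_file) as handle:
    for record in SeqIO.parse(handle, "fasta"):
        if len(record.seq) >= min_length:
            # write each sufficiently large contig to its own file
            SeqIO.write(record, record.id + ".fasta", "fasta")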
Right! Changed in 8effb0b
main.nf
Outdated
@@ -906,12 +913,12 @@ process metabat {
     def name = "${assembler}-${sample}"
     """
     jgi_summarize_bam_contig_depths --outputDepth depth.txt ${bam}
-    metabat2 -t "${task.cpus}" -i "${assembly}" -a depth.txt -o "MetaBAT2/${name}" -m ${min_size}
+    metabat2 -t "${task.cpus}" -i "${assembly}" -a depth.txt -o "MetaBAT2/${name}" -m ${min_size} --seed 1 --unbinned
Shouldn't there rather be no seed by default, but an option to fix the seed?
My aim was to make sure that the same results are created every time the workflow runs; therefore I fixed the seed here. But later I realized that at least SPAdes does not produce identical results; obviously there is also some randomness in there, and I haven't seen an option to fix the seed for it. So I no longer feel that this is necessary at all.
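To illustrate what such an option could look like, here is a hedged Python sketch of a hypothetical command builder; it is not the pipeline's actual code, and only the MetaBAT2 flags already shown in the diff above are taken from the source:

import subprocess

def metabat2_command(assembly, depth, out_prefix, threads, min_size, seed=None):
    # Hypothetical wrapper: by default no seed is passed, so MetaBAT2 stays
    # non-deterministic; a seed is forwarded only when explicitly given.
    cmd = ["metabat2", "-t", str(threads), "-i", assembly,
           "-a", depth, "-o", out_prefix, "-m", str(min_size), "--unbinned"]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd

# Reproducible run, e.g.:
# subprocess.run(metabat2_command("assembly.fa", "depth.txt", "MetaBAT2/sample", 8, 1500, seed=1), check=True)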
Changed in 17065a9
This solves #27