Check in larger test data, and decide how to host it #23

droazen · 2018-08-31T15:26:55Z

Disq should run tests continuously on larger test data, since many code paths are undoubtedly not exercised by tiny files. BAMs/CRAMs a couple hundred MB in size would, I think, be a good compromise between test-suite speed and the need to test on more realistic data. There are some suitable BAMs checked into the GATK repo under "large": https://github.com/broadinstitute/gatk/tree/master/src/test/resources/large

As part of this ticket, we'll have to decide how to host and version this large test data. The standard solution is Git LFS (which is what we use in the GATK), but if there are other good alternatives we should evaluate those as well.


heuermh · 2018-09-05T14:59:02Z

+1 to maintaining a reference set of test data in Git LFS, possibly in a separate repo under the disq-bio organization.

Then it would also be worthwhile to mirror these data in cloud storage at all the major providers.

I am willing to produce and help host these data after transforming them into Parquet+Avro, both for comparison and for later testing of conversion if/when that functionality migrates into Disq.

tomwhite · 2019-02-19T11:35:11Z

See https://github.com/disq-bio/disq#real-world-file-testing for the tests we can run.

tomwhite · 2019-09-09T09:46:26Z

This was addressed by #103

droazen mentioned this issue Aug 31, 2018: [DISQ-10] Initial Disq code contribution. #14 (Merged)

tomwhite mentioned this issue Jun 3, 2019: Test large files on Travis #103 (Merged)

tomwhite closed this as completed Sep 9, 2019

heuermh added this to the 0.4.0 milestone Sep 9, 2019
