Check in larger test data, and decide how to host it #23

Closed
droazen opened this issue Aug 31, 2018 · 3 comments

droazen commented Aug 31, 2018

Disq should run tests continuously on larger test data, since there are undoubtedly many code paths that don't get exercised on tiny data. BAMs/CRAMs of a couple hundred MB would be a good compromise between test suite speed and the need to test on more realistic data, I think. There are some suitable BAMs checked into the gatk repo under "large": https://github.com/broadinstitute/gatk/tree/master/src/test/resources/large

As part of this ticket, we'll have to decide how to host and version this large test data. The standard solution is git lfs (which is what we use in the gatk), but if there are other good alternatives out there we should evaluate those as well.

heuermh commented Sep 5, 2018

+1 to maintaining a reference set of test data in git lfs, possibly in a separate repo under the disq-bio organization.

Then it would also be worthwhile to mirror these data in cloud storage at all the major providers.

I am willing to produce and help host these data after transformation into Parquet+Avro for comparison and later testing of conversion if/when that functionality migrates into Disq.

@tomwhite

See https://github.com/disq-bio/disq#real-world-file-testing for the tests we can run.

tomwhite commented Sep 9, 2019

This was addressed by #103

@tomwhite tomwhite closed this as completed Sep 9, 2019
@heuermh heuermh added this to the 0.4.0 milestone Sep 9, 2019