ADR: sequence digest algorithm to be GA4GH digest #31
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,24 @@ | |
|
||
[TOC] | ||
|
||
## 2022-05-11 - Sequence identifier specification | ||
|
||
### Decision | ||
|
||
The GA4GH identifier will be used as our default sequence identifier instead of MD5. Other identifiers can be provided in a separate array and should not be part of the collection checksum calculation. | ||
|
||
### Rationale | ||
|
||
GA4GH identifiers were created as part of the [Variation Representation Specification standard](https://vrs.ga4gh.org/en/stable/impl-guide/computed_identifiers.html), which includes a way of creating identifiers for sequences, e.g. ACGT results in the identifier `ga4gh:SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2`. The scheme uses the [`sha512t24u` function](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0239883) to create a base64 URL-safe representation of a SHA-512 digest truncated to its first 24 bytes. Adopting GA4GH identifiers ensures sequence collections remain in line with newer standards within the GA4GH ecosystem. | ||
I don't think we should talk about the prefix |
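For illustration, the `sha512t24u` scheme named in the rationale can be sketched in a few lines of Python. This is a minimal sketch, not the reference implementation: the helper name `ga4gh_sequence_identifier` is hypothetical, and it assumes the sequence has already been normalised to plain ASCII residues before hashing.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to the first 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(data).digest()
    # 24 bytes encode to exactly 32 base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def ga4gh_sequence_identifier(sequence: str) -> str:
    """Hypothetical helper: prepend the `ga4gh:SQ.` prefix quoted in the ADR text."""
    # Assumption: `sequence` is already normalised; no case folding is done here.
    return "ga4gh:SQ." + sha512t24u(sequence.encode("ascii"))


# The ADR text gives the identifier for ACGT as
# ga4gh:SQ.aKF498dAxcJAqme6QYQ7EZ07-fiw8Kw2
print(ga4gh_sequence_identifier("ACGT"))
```

Keeping `sha512t24u` separate from the prefixing step means the bare digest can be reused if the prefix is ultimately left out of the specification.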
||
|
||
### Limitations | ||
|
||
GA4GH identifiers are not the default identifier used by standards such as CRAM, which uses MD5. We expect sequence collection providers to offer additional identifier arrays to provide compatibility with these other formats and to declare their sequence identifier support via service-info. | ||
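As a rough sketch of what "additional identifier arrays" could look like, the snippet below builds a default array of GA4GH identifiers alongside a parallel MD5 array. The array names `sequences` and `md5_sequences`, and the function `build_identifier_arrays`, are assumptions made for this example and are not defined by the ADR; only the default array would feed into the collection checksum.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest, truncated to the first 24 bytes, base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def build_identifier_arrays(sequences: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper returning parallel identifier arrays for a collection."""
    encoded = [s.encode("ascii") for s in sequences]
    return {
        # Default identifiers: the only array assumed to enter the collection checksum.
        "sequences": ["ga4gh:SQ." + sha512t24u(e) for e in encoded],
        # Extra array (hypothetical name) for MD5-based formats such as CRAM;
        # kept out of the checksum calculation.
        "md5_sequences": [hashlib.md5(e).hexdigest() for e in encoded],
    }


arrays = build_identifier_arrays(["ACGT", "TTGA"])
for ga4gh_id, md5_id in zip(arrays["sequences"], arrays["md5_sequences"]):
    print(ga4gh_id, md5_id)
```

Because the arrays are index-aligned, a client that only understands MD5 can look up the matching GA4GH identifier by position.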
|
||
### Linked issues | ||
|
||
- [https://github.com/ga4gh/seqcol-spec/issues/30](https://github.com/ga4gh/seqcol-spec/issues/30) | ||
|
||
## 2021-12-01 - Endpoint names and structure | ||
|
||
### Decision | ||
|
I think we should specify where we intend to use the identifiers. Something along the line of