shot at bringing draft spec up to date with adrs #44
I will merge this into dev to make it easier to see and review.
The GA4GH digest algorithm, `sha512t24u`, was created as part of the [Variation Representation Specification standard](https://vrs.ga4gh.org/en/stable/impl-guide/computed_identifiers.html). This procedure is described as ([Hart _et al_. 2020](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0239883)):
- perform a SHA-512 digest on a binary blob of data
- truncate the resulting digest to 24 bytes
- encode the 24 bytes using `base64url` ([RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-5)), resulting in a 32-character string
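The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the reference implementation from VRS or any seqcol package; the function name `sha512t24u` simply mirrors the algorithm name.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Compute a sha512t24u-style digest of a binary blob (illustrative sketch)."""
    # Step 1: SHA-512 digest of the raw bytes (64 bytes long).
    digest = hashlib.sha512(blob).digest()
    # Step 2: keep only the first 24 bytes.
    truncated = digest[:24]
    # Step 3: base64url-encode; 24 bytes encode to exactly 32 characters.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


if __name__ == "__main__":
    print(sha512t24u(b"ACGT"))
```

Because 24 is a multiple of 3, the base64url output never needs `=` padding, which is why the result is always exactly 32 characters.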
This converts the value of each attribute in the seqcol into a digest string. Applying this to each value will produce a structure that looks like this:
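A minimal Python sketch of this per-attribute step, under a stated assumption: the text does not say how an attribute value becomes bytes, so this sketch assumes each value is serialized as compact, key-sorted JSON before digesting. The helper names `canonical_json` and `digest_attributes`, and the example collection, are hypothetical.

```python
import base64
import hashlib
import json


def sha512t24u(blob: bytes) -> str:
    # SHA-512, truncate to 24 bytes, base64url-encode (32 characters).
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


def canonical_json(value) -> bytes:
    # ASSUMPTION: compact, key-sorted JSON serialization; illustrative only.
    return json.dumps(
        value, separators=(",", ":"), sort_keys=True, ensure_ascii=False
    ).encode("utf-8")


def digest_attributes(seqcol: dict) -> dict:
    # Replace each attribute's value with the digest of its serialized form.
    return {name: sha512t24u(canonical_json(value)) for name, value in seqcol.items()}


if __name__ == "__main__":
    # Hypothetical collection; names, lengths, and sequence ids are placeholders.
    example = {
        "names": ["chrA", "chrB"],
        "lengths": [1000, 2000],
        "sequences": ["hypothetical_digest_1", "hypothetical_digest_2"],
    }
    print(digest_attributes(example))
```

The output keeps the same attribute names as the input, with each value replaced by a 32-character digest string.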
I think this should get its own section so it can be referred to elsewhere, like in the `sorted_name_length_pairs` section.
Agreed. Would it make sense in a footnote, maybe?
Far from final, but this at least brings a draft specification up to date with our current decisions. Take a look if you can, but the goal here is to provide a general, one-stop description of seqcol that could be useful to have during Connect.
I do still need to get the 'inherent' stuff in there, though.
TODO: