Will the API offer an alias to digest conversion endpoint? #4
The big issue for me is what we mean by
In all likelihood, all of the above is the right answer. Since GRC patches are additive, referring to p13 means referring to all prior patches, but since patches are released frequently you want to use the "tag". It also means GRCh38 applies equally to all of these. So there is an imprecise query coming in, "I want the assembly that refers to hg38", which we cannot answer exactly, because seqcol is going to be very precise about what you're going to work with.
This is the reverse lookup use case, similar to the discussion in the refget reverse lookup workstream, so I can add my current thinking here:
We could specify in the
I think this is the right way to think about the issue, so we can combine our thinking for sequence reverse lookup and this. Making this an implementation-specific issue is a good way around the problem, but I do think any service worth its salt will register all known aliases. The bigger problem will be how to handle the ambiguity and pass back the "correct", precise collection or sequence from an imprecise query. I don't think that's this API's business; it will have to be an out-of-scope manual curation process. Though I can see someone from a genome provider like UCSC, Ensembl or INSDC making those calls.
Well, if a service doesn't want to make an authoritative claim about what a human-readable alias means, it would pass back all the possible matches. If it does want to make an authoritative claim, it would pass back just the one it claims is the match.
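To make the two behaviours concrete, here is a minimal sketch in Python. It is not part of any seqcol or refget spec: the registry, the `lookup_alias` function, the `authoritative` flag and the digest strings are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List

# Hypothetical alias registry: lower-cased alias -> every collection digest
# the service knows under that alias. Digests are made-up placeholders.
ALIAS_REGISTRY: Dict[str, List[str]] = {
    "grch38": ["placeholder-digest-p12", "placeholder-digest-p13"],
}

# Hypothetical curated picks, for a service that chooses to make an
# authoritative claim about what an alias means.
AUTHORITATIVE_PICK: Dict[str, str] = {
    "grch38": "placeholder-digest-p13",
}


def lookup_alias(alias: str, authoritative: bool = False) -> List[str]:
    """Resolve a human-readable alias to collection digests.

    With authoritative=False, return every possible match.
    With authoritative=True, return only the curated match if one exists,
    otherwise fall back to all possible matches.
    """
    key = alias.lower()
    if authoritative and key in AUTHORITATIVE_PICK:
        return [AUTHORITATIVE_PICK[key]]
    # Copy the list so callers cannot mutate the registry.
    return list(ALIAS_REGISTRY.get(key, []))
```

A non-authoritative service would call `lookup_alias("GRCh38")` and hand back both candidates; a curating service would call it with `authoritative=True` and hand back a single digest. An unknown alias yields an empty list in both modes.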
One of the use cases brought up was this: what if a user wants to get the sequence collection checksum(s) from the name of a collection (e.g. grch38)?
We determined that sequence collections should be congruent with the approach taken by refget in terms of allowing human-readable alias-based queries.
In samtools/hts-specs/issues/329, it seems clear that refget was not intended to do this.
@andrewyatz says:
In light of this, I'd propose the seqcol spec specifically not provide endpoints that operate on human-readable aliases.
On the other hand, 'chr1' is a much more universal identifier than something like 'hg38', so perhaps there is some value in returning a list of identifiers that include "hg38" under "aliases".
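One possible reading of the "aliases" idea, sketched in Python: lookups stay digest-only, and aliases are returned purely as descriptive metadata on the result. The `CollectionRecord` type, the `STORE` contents, the `get_collection` function and the digest value are assumptions made for illustration, not anything defined by the seqcol spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CollectionRecord:
    """Hypothetical stored record for one sequence collection."""
    digest: str
    aliases: List[str] = field(default_factory=list)


# Hypothetical store keyed by digest only; there is no alias-keyed index.
STORE: Dict[str, CollectionRecord] = {
    "placeholder-digest-1": CollectionRecord(
        digest="placeholder-digest-1",
        aliases=["hg38", "GRCh38"],
    ),
}


def get_collection(digest: str) -> dict:
    """Return the record for a digest, with aliases as plain metadata."""
    record = STORE[digest]  # raises KeyError for an unknown digest
    return {"digest": record.digest, "aliases": list(record.aliases)}
```

Under this reading, a client that already holds a digest can see which human-readable names the service associates with it, while the API never has to resolve an ambiguous name to a single collection.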