Trace how one mapping was derived from a (set of) other(s)? #91
Comments
We could define a hash function over the subject/relation/target that first canonicalizes the CURIEs (using the Bioregistry, for example), then use the resulting hash as an identifier for a given mapping. Or we could concatenate them with a delimiter we don't expect to see in the identifiers, like a pipe. If we can be creative, we could create a data model for describing a sequence of transformations applied to a given mapping or set of mappings (I don't think this would fit inside SSSOM itself, though).
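To make the two options above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `canonicalize_curie` is a hypothetical stand-in (it only lowercases the prefix) for a real canonicalizer such as the Bioregistry, and the function names, the SHA-256 choice, and the pipe delimiter are not part of SSSOM.

```python
import hashlib


def canonicalize_curie(curie: str) -> str:
    """Hypothetical stand-in for a real canonicalizer (e.g. the Bioregistry).

    It only lowercases the prefix and strips surrounding whitespace.
    """
    prefix, _, local_id = curie.strip().partition(":")
    return f"{prefix.lower()}:{local_id}"


def mapping_key(subject: str, relation: str, target: str, delimiter: str = "|") -> str:
    """Concatenate the canonicalized CURIEs with a delimiter (option 2)."""
    parts = [canonicalize_curie(c) for c in (subject, relation, target)]
    # Refuse input in which the "unexpected" delimiter does show up after all.
    if any(delimiter in part for part in parts):
        raise ValueError(f"delimiter {delimiter!r} occurs inside a CURIE")
    return delimiter.join(parts)


def mapping_hash(subject: str, relation: str, target: str) -> str:
    """Hash the canonical key to get a fixed-length identifier (option 1)."""
    key = mapping_key(subject, relation, target)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(mapping_key("MESH:D001", "skos:exactMatch", "DOID:123"))
    print(mapping_hash("MESH:D001", "skos:exactMatch", "DOID:123"))
```

The concatenated key stays human-readable, while the hash is fixed-length but opaque; both depend on the canonicalization step to treat differently spelled prefixes as the same mapping.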
This is not enough for the walking provenance problem, and there is also the problem that an SSSOM file can contain the same mapping twice.
I suggested elsewhere a few minutes ago the same concatenation approach as @cthoyt (including the artifact_id at the beginning of the identifier). If an SSSOM file contains the same mapping twice and the two are identical, I don't think we care at a practical level about which one is 'identified'; if the difference is some additional metadata about that mapping, just make the concatenation cover the entire content. With column headers too (after column 3) if one wanted to be, umm, exhaustive. What I like is that it's human-traceable and even human-comprehensible (depending on necessary escapes), which makes up for the exhaustingly long identifiers. And no extra work is needed by the author, so the SSSOM stays Simple. I don't understand what's not enough for the walking provenance problem; can we be more explicit about what's needed and missing? (Unless it's obvious)
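A rough sketch of what such an exhaustive, human-traceable identifier could look like, in Python. The function names, the backslash escaping scheme, and the `column=value` layout for the extra columns are assumptions made for illustration; only the idea (mapping set ID first, then the three core columns, then everything else with its header) comes from the comment above.

```python
CORE_COLUMNS = ("subject_id", "predicate_id", "object_id")


def escape(value: str, delimiter: str = "|") -> str:
    """Backslash-escape the delimiter (and backslashes) inside a value."""
    return value.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)


def row_identifier(mapping_set_id: str, row: dict, delimiter: str = "|") -> str:
    """Build an identifier covering the entire content of one mapping row.

    The mapping set ID comes first, then the three core columns, then every
    remaining column as header=value so differing metadata yields a new ID.
    """
    parts = [mapping_set_id] + [row[column] for column in CORE_COLUMNS]
    for column, value in row.items():
        if column not in CORE_COLUMNS:
            parts.append(f"{column}={value}")
    return delimiter.join(escape(part, delimiter) for part in parts)


if __name__ == "__main__":
    example_row = {
        "subject_id": "MESH:D001",
        "predicate_id": "skos:exactMatch",
        "object_id": "DOID:123",
        "mapping_justification": "semapv:ManualMappingCuration",
    }
    # https://example.org/sets/a is a placeholder mapping set ID.
    print(row_identifier("https://example.org/sets/a", example_row))
```

Two rows that are identical in every column produce the same identifier, which matches the point that it does not matter in practice which of the duplicates is the one 'identified'.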
If the mapping set ID is part of that ID, you are right. We can encode the whole walk like that, and it will look horrible, but will actually be quite readable once you recover from the initial shock. :)
When
we currently have no way to capture this, because we also have no way to identify a mapping at the moment (unique mapping id). Not sure how to deal with that in a practical way.