Trace how one mapping was derived from a(nother / set of others)? #91

Open
matentzn opened this issue Sep 24, 2021 · 4 comments

Comments

@matentzn
Collaborator

When

  • creating reference mappings based on multiple sets of candidate mappings or
  • creating new mappings based on walks of other mappings

we currently have no way to capture this, because we also have no way to identify an individual mapping at the moment (there is no unique mapping ID). I'm not sure how to deal with that in a practical way.

@cthoyt
Member

cthoyt commented Sep 24, 2021

We could define a hash function over the subject/relation/target that first canonicalizes the CURIEs (using the Bioregistry, for example), then use that hash as the identifier for a given mapping. Or we could concatenate them into a single string with a delimiter we wouldn’t expect to see in the identifiers, like a pipe.
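A minimal Python sketch of both options, for illustration only. The helper names (`canonicalize_curie`, `mapping_key`, `mapping_hash`) are hypothetical, and the canonicalizer is a stand-in that only lowercases the prefix; a real one would presumably delegate to a registry such as the Bioregistry.

```python
import hashlib


def canonicalize_curie(curie: str) -> str:
    """Stand-in canonicalizer (assumption): trim whitespace, lowercase the prefix."""
    prefix, sep, local_id = curie.strip().partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{prefix.lower()}:{local_id}"


def mapping_key(subject: str, predicate: str, obj: str, delimiter: str = "|") -> str:
    """Concatenate the canonicalized subject/predicate/object with a delimiter."""
    parts = [canonicalize_curie(c) for c in (subject, predicate, obj)]
    for part in parts:
        # The scheme only works if the delimiter never occurs inside an identifier.
        if delimiter in part:
            raise ValueError(f"delimiter {delimiter!r} found in {part!r}")
    return delimiter.join(parts)


def mapping_hash(subject: str, predicate: str, obj: str) -> str:
    """SHA-256 hex digest of the concatenated key: fixed length, but opaque."""
    key = mapping_key(subject, predicate, obj)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The concatenated key stays human-readable, while the hash is fixed-length but opaque. Either way, two rows with the same subject, predicate and object get the same identifier.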

If we can be creative, we could create a data model for describing a sequence of transformations applied to a given mapping or set of mappings (I don’t think this would fit inside SSSOM itself, though).

@matentzn
Collaborator Author

This is not enough for the walking provenance problem, and there is also the problem that an SSSOM file can contain the same mapping twice.

@graybeal

graybeal commented Oct 8, 2021

I suggested the same concatenation approach as @cthoyt elsewhere a few minutes ago (with the artifact_id at the beginning of the identifier). If an SSSOM file contains the same mapping twice and the two rows are identical, I don't think we care at a practical level which one is 'identified'; if they differ in some additional metadata about that mapping, just make the concatenation cover the entire content. With column headers too (after column 3) if one wanted to be, umm, exhaustive.

What I like is that it's human-traceable and even human-comprehensible (depending on necessary escapes), which makes up for the exhaustingly long identifiers. And no extra work needed by the author, so the SSSOM stays Simple.
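To make the concatenation idea concrete, here is a rough Python sketch. It assumes the first three columns are `subject_id`, `predicate_id` and `object_id`, that the mapping set identifier is passed in separately, and that percent-encoding is an acceptable escape; the function names and the `header=value` layout are illustrative choices, not anything defined by SSSOM.

```python
from urllib.parse import quote

# Assumed names of the first three columns of a mapping row.
CORE_COLUMNS = ("subject_id", "predicate_id", "object_id")
DELIMITER = "|"


def escape(value: str) -> str:
    """Percent-encode everything except ':' and '/', so '|' and '=' cannot leak."""
    return quote(value, safe=":/")


def row_identifier(mapping_set_id: str, row: dict[str, str]) -> str:
    """Build an identifier from the mapping set id plus the whole row content."""
    parts = [escape(mapping_set_id)]
    parts += [escape(row[column]) for column in CORE_COLUMNS]
    # Remaining columns carry their header so the value stays interpretable.
    parts += [
        f"{escape(column)}={escape(value)}"
        for column, value in row.items()
        if column not in CORE_COLUMNS
    ]
    return DELIMITER.join(parts)
```

Identical rows collapse to the same identifier, while rows that differ only in extra metadata (for example a confidence value) get distinct ones.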

I don't understand what's not enough for the walking provenance problem. Can we be more explicit about what's needed and missing? (Unless it's obvious.)

@matentzn
Collaborator Author

matentzn commented Oct 8, 2021

If the mapping set ID is part of that ID, you are right. We can encode the whole walk like that, and it will look horrible, but it will actually be quite readable once you recover from the initial shock. :)
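As a rough illustration of what encoding a walk could look like, the Python sketch below joins the identifiers of the mappings traversed, in order. The `>` step delimiter and the `encode_walk`/`decode_walk` names are assumptions made for this example only; each step is assumed to already be a pipe-delimited identifier that starts with its mapping set ID.

```python
from urllib.parse import quote, unquote

# Hypothetical delimiter between the steps of a walk.
WALK_DELIMITER = ">"


def encode_walk(step_ids: list[str]) -> str:
    """Join the identifiers of the traversed mappings into one walk identifier."""
    if not step_ids:
        raise ValueError("a walk needs at least one step")
    # Keep ':', '/', '|' and '=' readable; '>' and '%' inside a step get escaped.
    return WALK_DELIMITER.join(quote(step, safe=":/|=") for step in step_ids)


def decode_walk(walk_id: str) -> list[str]:
    """Recover the individual step identifiers from a walk identifier."""
    return [unquote(part) for part in walk_id.split(WALK_DELIMITER)]
```

Because every step carries its mapping set ID, the walk identifier records which mapping from which set contributed to the derived mapping, and it can be split back into its steps.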
