Negated mappings and the standardisation of mapping predicate modifiers #40

Closed
cmungall opened this issue Oct 7, 2020 · 35 comments · Fixed by #99

Comments

@cmungall
Contributor

cmungall commented Oct 7, 2020

This issue is a history of the discussion on how to handle negated mappings. After a lot of discussion and a final vote at #40 (comment), we've decided to go with adding an additional predicate modifier column to the SSSOM standard. This issue can be closed along with a pull request that realizes this update.

See draft solution in #99

Original issue text from @cmungall:

Similar to #38 we could allow predicates to be property expressions such as !owl:equivalentTo

@matentzn
Collaborator

Think about "we have not looked at it yet" vs "we looked and definitely no".

@mellybelly

City of Colón vs. human colon seems like the perfect example.

@matentzn
Collaborator

matentzn commented Jun 6, 2021

For some applications, I need to be able to subtract one record from another. For example, consider

subject_id relation_id object_id match_type
HP:001 owl:equivalentTo MP:001 sssom:AutomatedMapping

being produced by an automated approach. A human curator finds that it is wrong:

subject_id relation_id object_id match_type
HP:001 !owl:equivalentTo MP:001 sssom:HumanCurated

So when I reconcile these two records automatically, I need to be able to subtract the second mapping from the mapping set that contains the first.
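
A minimal sketch of this subtraction in Python, assuming the `!` prefix notation from the example above. The `Mapping` tuple and the `subtract_negated` helper are hypothetical names used only for illustration; they are not part of any SSSOM tooling.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    subject_id: str
    relation_id: str
    object_id: str
    match_type: str


def subtract_negated(mappings: List[Mapping]) -> List[Mapping]:
    """Drop every positive mapping that is contradicted by a negated record.

    Assumption: a leading "!" on relation_id marks a negated mapping.
    """
    # Collect the (subject, relation, object) triples that were rejected.
    rejected = {
        (m.subject_id, m.relation_id[1:], m.object_id)
        for m in mappings
        if m.relation_id.startswith("!")
    }
    # Keep only positive mappings whose triple was not rejected.
    return [
        m
        for m in mappings
        if not m.relation_id.startswith("!")
        and (m.subject_id, m.relation_id, m.object_id) not in rejected
    ]


automated = Mapping("HP:001", "owl:equivalentTo", "MP:001", "sssom:AutomatedMapping")
curated = Mapping("HP:001", "!owl:equivalentTo", "MP:001", "sssom:HumanCurated")
other = Mapping("HP:002", "owl:equivalentTo", "MP:002", "sssom:AutomatedMapping")

remaining = subtract_negated([automated, curated, other])
print(remaining)
```

Only the single contradicted record is removed; the rest of the automated set is left untouched.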

@AlasdairGray

This is partly what we were trying to do with Scientific Lenses, but we did it at the granularity of linksets. I think what you are saying here is that you want to do this at the granularity of a single mapping. That is, if there is a large set of automated mappings, you don't want to eliminate the whole set, only those that have been flagged as problematic by the human curator.

@matentzn
Collaborator

matentzn commented Jun 7, 2021

Thank you @AlasdairGray for weighing in, that is exactly right!

@matentzn
Collaborator

matentzn commented Jun 7, 2021

From the meeting: an alternative suggestion is to use a predicate_modifier column with the values NOT, INVERSE, DIRECT

@matentzn
Collaborator

matentzn commented Jun 7, 2021

Boomer needs to distinguish proper subclass from general.

@matentzn
Collaborator

matentzn commented Jun 21, 2021

Vote: Capturing predicate modifiers

Prefixes solution: 🚀

subject_id relation_id object_id match_type
HP:001 !owl:subClassOf MP:001 HumanCurated
HP:001 ^owl:subClassOf MP:001 HumanCurated
HP:001 +owl:subClassOf MP:001 HumanCurated
HP:001 owl:subClassOf MP:001 HumanCurated

Separate modifier: 👍

subject_id relation_id object_id match_type predicate_modifier
HP:001 owl:subClassOf MP:001 HumanCurated NOT
HP:001 owl:subClassOf MP:001 HumanCurated INVERSE
HP:001 owl:subClassOf MP:001 HumanCurated INDIRECT
HP:001 owl:subClassOf MP:001 sssom:HumanCurated DIRECT

EDIT: Hybrid 👀

There are some issues with both suggestions above, even apart from what @cthoyt lays out below:

  1. The prefix solution has the problem that the direct/indirect distinction is semantically meaningless and confusing, but we need it to satisfy many of our use cases.
  2. The modifier solution is risky, as it introduces a feature into the standard that completely changes the interpretation of one of the other key columns (predicate_id), which can severely break pipelines that are unaware of the change.

Therefore, we now suggest this hybrid:

| subject_id | relation_id | object_id | match_type | predicate_modifier |
|---|---|---|---|---|
| HP:001 | !owl:subClassOf | MP:001 | HumanCurated | |
| HP:001 | ^owl:subClassOf | MP:001 | HumanCurated | |
| HP:001 | owl:subClassOf | MP:001 | HumanCurated | INDIRECT |
| HP:001 | owl:subClassOf | MP:001 | sssom:HumanCurated | DIRECT |

@cthoyt
Member

cthoyt commented Jun 21, 2021

I'm strongly in the separate modifier camp, since adding prefixes means that the relation_id column can no longer be directly considered as a CURIE. This would probably make SSSOM-compliant data more inconvenient to handle without using first party tools.
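
To illustrate the concern, here is a rough Python sketch. It assumes a deliberately simplified CURIE pattern (much looser than the W3C CURIE grammar) and a hypothetical `PREFIX_MODIFIERS` table covering only the `!` and `^` characters from the vote; neither comes from SSSOM tooling.

```python
import re
from typing import Optional, Tuple

# Simplified "prefix:local_id" pattern; an assumption for illustration only.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][\w.-]*:\S+$")

# Hypothetical table of the prefix characters proposed in the vote.
PREFIX_MODIFIERS = {"!": "NOT", "^": "INVERSE"}


def is_simple_curie(value: str) -> bool:
    """Return True if the value looks like a plain CURIE."""
    return CURIE_PATTERN.match(value) is not None


def split_prefixed_predicate(value: str) -> Tuple[Optional[str], str]:
    """Split a prefixed predicate into (modifier, CURIE)."""
    if value and value[0] in PREFIX_MODIFIERS:
        return PREFIX_MODIFIERS[value[0]], value[1:]
    return None, value


print(is_simple_curie("owl:subClassOf"))
print(is_simple_curie("!owl:subClassOf"))
print(split_prefixed_predicate("!owl:subClassOf"))
```

With the prefix notation, every consumer has to run something like `split_prefixed_predicate` before the column can be treated as a CURIE again.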

Mappings naturally don't require a predicate modifier, so many datasets won't even need to use this column. Additionally, GO annotations also have a similar extra column for predicate modifiers, which I think makes it much easier to use than having to parse the predicates. Like in GO, there is probably a need to define a vocabulary for what the modifiers are (using CURIEs, again!) such that users can understand what exactly is meant by "NOT", "INVERSE", etc.

I think that either solution could support @matentzn's concern about subtracting records c.f. #40 (comment)

Ben also mentioned to me a valid counterpoint that any naïve string matching on SSSOM-compliant data for something like == 'skos:exactMatch' could backfire if the user is not aware of an extra predicate modifier column. I think this is part of a more general concern I have for SSSOM in that it uses single columns to describe CURIEs rather than one column for the prefix and a second for the identifier. CURIEs are already difficult to parse and standardize, as we've unfortunately learned, and adding even more complexity will likely serve to exacerbate it.
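
A small Python sketch of how such a naive filter could backfire. The column names `predicate_id` and `predicate_modifier` and the value `NOT` follow the spellings used in this thread and are assumptions here, as is the sample data.

```python
import csv
import io

# Two made-up rows: the second one is a negated mapping.
TSV = (
    "subject_id\tpredicate_id\tobject_id\tpredicate_modifier\n"
    "HP:001\tskos:exactMatch\tMP:001\t\n"
    "HP:002\tskos:exactMatch\tMP:002\tNOT\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

# Naive filter: looks only at the predicate column, so it also
# picks up the negated row.
naive = [r for r in rows if r["predicate_id"] == "skos:exactMatch"]

# Modifier-aware filter: additionally requires an empty modifier cell.
aware = [r for r in naive if not r["predicate_modifier"]]

print(len(naive), len(aware))
```

The naive filter returns both rows, including the one a curator explicitly rejected.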

A potential (though I admit more convoluted) third option: create more CURIEs that represent negated relations (as suggested in the discussion of the Datum Ontology and shadow classes c.f. OBOFoundry/OBOFoundry.github.io#1539)

@bgyori

bgyori commented Jun 21, 2021

I wanted to second the comment that this is particularly useful when curating automatically inferred mappings to assert that a given mapping is incorrect. I was originally thinking 🚀 but @cthoyt's comment makes a lot of sense and so I voted for 👍 .

@cthoyt
Member

cthoyt commented Jun 21, 2021

Regardless, here's a CC0 manually curated set of negative mappings that @bgyori and I first automatically generated with various lexical matching techniques, then decided weren't correct. Mapping type "manual" means we did it unprompted, "manually_reviewed" if it was first automatically generated then we curated it. https://github.com/biomappings/biomappings/blob/master/src/biomappings/resources/incorrect.tsv

@matentzn
Collaborator

Thank you both for the comments; there were other problems with the suggestion above, so I introduced a third option, which we favour over the pure prefix one.

I am not too concerned about the prefix mapping issue, as sssom-py handles that; however, I still get your points 100%. To be honest, I just had a chat with @cmungall and he said what I also think: that we are only barely (65%) leaning towards the prefix, now hybrid, solution. We can still be convinced otherwise. So it's basically now about weighing the churn of having to interpret the first character of the predicate column as either a modifier or the first character of the actual prefix against the easier readability and lower error-proneness for people reading the file and trying to interpret the columns. I still feel that the risk of people ignoring the predicate_modifier column is too high for my taste. But let's keep raging.

@cmungall
Contributor Author

cmungall commented Jun 21, 2021 via email

@cthoyt
Member

cthoyt commented Jun 21, 2021

Could it be the case that we're trying to solve too many problems with a single column (and perhaps with a single vote)? Being positive/negative and being direct/indirect seem like problems that might be better solved at separate times, rather than loading up a single blanket "modifiers" column. Maybe a solution with one boolean column for each would be simpler to address.
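
As a rough illustration of the one-boolean-column-per-concern idea, here is a Python sketch. The `negated` and `direct` columns, the `MappingRow` class and the `true`/`false` cell encoding are all hypothetical; nothing in this thread fixes those names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRow:
    subject_id: str
    predicate_id: str
    object_id: str
    negated: bool = False  # hypothetical column: True means "NOT"
    direct: bool = False   # hypothetical column: True means a direct relation


def parse_bool(cell: str) -> bool:
    """Treat an empty cell as False; accept 'true'/'false' in any case."""
    text = cell.strip().lower()
    if text in ("", "false"):
        return False
    if text == "true":
        return True
    raise ValueError(f"not a boolean cell: {cell!r}")


def parse_row(line: str) -> MappingRow:
    """Parse one tab-separated line with exactly five cells."""
    subject_id, predicate_id, object_id, negated, direct = line.split("\t")
    return MappingRow(
        subject_id, predicate_id, object_id, parse_bool(negated), parse_bool(direct)
    )


row = parse_row("HP:001\towl:subClassOf\tMP:001\ttrue\tfalse")
print(row)
```

Because each concern has its own column, the two flags can be set independently and the predicate column stays a plain CURIE.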

Disclosure: I'm not yet so motivated by the direct/indirect issue. I haven't been in a situation where I wanted to capture that, and I'm missing the context for why it came up in discussion here. I'd be keen to learn more.

I noticed that the sssom vocabulary introduces superClassOf, which is an obvious inverse of the standard rdfs:subClassOf relationship. Is there a reason why this has been explicitly left out of rdfs? I saw other discussions where @cmungall had proposed other subproperties that are more descriptive of whether something is a "proper" subclass/superclass, but I didn't really get why that didn't make it to primetime. Maybe it's hard to keep all of these things "consistent".

@matentzn
Collaborator

Vote: Capturing negation and inverse

Prefixes solution: 🚀

subject_id relation_id object_id match_type
HP:001 !owl:subClassOf MP:001 HumanCurated
HP:001 ^owl:subClassOf MP:001 HumanCurated

Separate modifier: 👍

subject_id relation_id object_id match_type predicate_modifier
HP:001 owl:subClassOf MP:001 HumanCurated NOT
HP:001 owl:subClassOf MP:001 HumanCurated INVERSE

Negated relations 👀

subject_id relation_id object_id match_type
HP:001 sssom:notSubClassOf MP:001 HumanCurated
HP:001 sssom:superClassOf MP:001 HumanCurated

Arguments

| | Separate modifier | Prefix solution | Negated relation solution |
|---|---|---|---|
| Conceptual | Is "cleaner", i.e. predicate_id column can be interpreted as CURIE | Could create a stronger dependency on specialised tooling (sssom-py). | Introduces new non-standard vocabulary. |
| Idempotent | Is not idempotent (tooling that is not migrated to consider the new feature could produce faulty results). | Is idempotent. | Is idempotent. |
| Usability | Requires exploration of context. | Does not require exploration of context. | Does not require exploration of context. |

  • Note that @cthoyt's argument that negative and inverse mappings occur rarely applies, IMO, to both solutions: if they are so rare, then no specialised tooling is required either way.

I think from the discussion here it is pretty clear that we all agree that for the case of DIRECT, INDIRECT, REFLEXIVE we just use a bespoke predicate_modifier. Correct me if I am wrong by adding a 👎 along with your vote regarding the above.

@matentzn changed the title from "negated mappings" to "Negated mappings and the standardisation of mapping predicate modifiers" on Aug 20, 2021

@matentzn
Collaborator

Another problem with any approach here arises if you want a double modification, such as a negated inverse.

@matentzn
Collaborator

Look here for more discussions, where they went with what @cthoyt is suggesting: biolink/biolink-model#826

@cmungall
Contributor Author

cmungall commented Sep 2, 2021

I hate to force another vote but maybe we need a different kind of hybrid.

I think for inverted relations, it is cleanest to add predicates. The most unsatisfying thing here is that there is no community standard URI for subClassOf (equivalentTo is symmetric, and most other things are ObjectProperties in a vocabulary like RO). But we could easily add has_subclass to biolink.

My original idea was for an expressive RDF-path-like syntax for arbitrary paths, but that seems like overkill.

So I vote against ^

I think the real sticking point is negation, with really strong arguments for the three different ways of handling this.

At the meeting we can discuss people's use cases for different combinations; here are mine:

  • not related to:
    • very useful.
    • note the semantics here should be to exclude exact, equivalent, close, related, broad, narrow
    • I frequently want to provide information that there is no meaningful connection between two terms. This is often important information where there may be naive assumptions the two terms are related, e.g. based on their names
  • not equivalent, not exact match: somewhat useful
  • not subclass, not superclass: rarely useful
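
The "not related to" semantics described in the list above can be sketched in Python as a conflict check. The CURIEs chosen for "exact, equivalent, close, related, broad, narrow", the symmetric treatment of "not related", the `find_conflicts` helper and the `EX:` identifiers are all assumptions made for this illustration.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed CURIEs for the relations the list names only informally.
EXCLUDED_BY_NOT_RELATED = frozenset({
    "skos:exactMatch",
    "owl:equivalentClass",
    "skos:closeMatch",
    "skos:relatedMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
})


def find_conflicts(
    mappings: Iterable[Triple], not_related: Iterable[Tuple[str, str]]
) -> List[Triple]:
    """Return mappings that contradict a 'not related to' assertion.

    Assumption: 'not related to' is symmetric, so pair order is ignored.
    """
    unrelated = {frozenset(pair) for pair in not_related}
    return [
        (s, p, o)
        for (s, p, o) in mappings
        if p in EXCLUDED_BY_NOT_RELATED and frozenset((s, o)) in unrelated
    ]


mappings = [
    ("EX:colon_city", "skos:closeMatch", "EX:colon_organ"),
    ("EX:colon_city", "rdfs:seeAlso", "EX:colon_organ"),
    ("EX:liver", "skos:exactMatch", "EX:hepar"),
]
conflicts = find_conflicts(mappings, [("EX:colon_organ", "EX:colon_city")])
print(conflicts)
```

Only the `skos:closeMatch` row is flagged; predicates outside the excluded set are left alone.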

@cthoyt
Member

cthoyt commented Sep 3, 2021

Further discussion was had in the SSSOM workshop at https://docs.google.com/document/d/1xUNUCXE-iAWJWgZwXdjq58hRAlRhMtaKHH6CeDWJWSw/edit?usp=sharing

@matentzn
Collaborator

@cthoyt we should make a call on this before finalising the paper.

@cthoyt
Member

cthoyt commented Oct 13, 2021

At the end of the discussion, I think we were all pretty much in agreement that we did not like the addition of new syntax, but would be happy with either of the two following solutions:

  1. Introduce new relations that contain semantics about stuff being not true
  2. Introduce a predicate modifier column for negations

At this point I think I'm leaning towards 2; the biolink-model group already chose this one at the end of the discussion of biolink/biolink-model#826. Should we put it to a final vote?

@matentzn
Collaborator

Vote: Should we allow a modifier column that will change the semantics of the mapping?

Arguments for and against are in the ticket above.

👍 Yes, let's introduce the modifier column that allows us to say "NOT" and similar to modify the mapping relation
👎 No, let's stick with relations. If a relation like notRelatedTo does not exist, we have to define it somewhere

@matentzn
Collaborator

I will vote a bit later, because I don't want to bias the vote with my own annoying position too much.

@cthoyt
Member

cthoyt commented Oct 13, 2021

@matentzn when will the vote close?

If we mint new relations, I'd guess they would initially live in the sssom idspace.

@matentzn
Collaborator

Friday 22 October :)

@graybeal

I'm late to the discussion and don't have time to do the deep dive on all the references. I will just say, I was really surprised that everyone is talking about and favoring (what seem to be) semantics-incompatible solutions to expressing a semantic relationship.

@matentzn
Collaborator

  • The semantics part can be sorted out on the side of the model - this discussion is just about syntax! It's true though that there are some things that cannot be translated into RDF in this thread, like "indirect" or "not" in conjunction with an annotation property (at least it would be semantics free in the sense that no reasoner could detect it). But yes, how we deal with the mapping into RDF in a meaningful way is a concern, but it can be done.

@graybeal

Right, I get that, but it's opaque to a non-expert. Well-named semantic relations are transparent, and using them in this context makes the SSSOM transparent.

@matentzn
Collaborator

I also still have not cast my vote. At the moment I am 51% to 49% on the 👎 side. Still doing some soul searching.

@cthoyt
Member

cthoyt commented Oct 26, 2021

Thanks everybody for voting and engaging on this. I think it was a really good exercise having all of the discussions leading to the final vote, and while no solution is perfect, I think that this one will support a large variety of what people want to do. I'm going to edit the original issue text to reflect that we've decided on adding a modifier column to the SSSOM standard; this issue can then be closed when that implementation in the model itself is realized.

@matentzn
Collaborator

Thank you for organising and making the PR @cthoyt :)

@wdduncan

I was not aware of this vote. But I cast a 👎 for the reasons that @graybeal cited.

@matentzn
Collaborator

Too late guys, this is merged into sssom now :P for better or worse!

@wdduncan

@matentzn that's fine, I suppose :)
However, I am unsure what the final implementation looks like. Is it another column in the spreadsheet, or are the modifiers placed in the same field as the predicate?

Sorry for the confusion ...

@matentzn
Collaborator

separate column! https://mapping-commons.github.io/sssom/Mapping/
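
For anyone holding data in the prefix notation discussed earlier in this thread, a conversion to the separate-column layout might look like the Python sketch below. The `to_modifier_column` helper is hypothetical, and the modifier value `NOT` follows the spelling used in this thread; the linked page is the place to check the exact permitted value and column names.

```python
import csv
import io
from typing import Dict

FIELDS = ["subject_id", "predicate_id", "object_id", "predicate_modifier"]


def to_modifier_column(row: Dict[str, str]) -> Dict[str, str]:
    """Move a leading "!" from predicate_id into predicate_modifier.

    Assumption: "!" is the only prefix character in the input data.
    """
    converted = dict(row)
    predicate = row["predicate_id"]
    if predicate.startswith("!"):
        converted["predicate_id"] = predicate[1:]
        converted["predicate_modifier"] = "NOT"
    else:
        converted.setdefault("predicate_modifier", "")
    return converted


legacy = [
    {"subject_id": "HP:001", "predicate_id": "!owl:equivalentTo", "object_id": "MP:001"},
    {"subject_id": "HP:002", "predicate_id": "owl:equivalentTo", "object_id": "MP:002"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(to_modifier_column(r) for r in legacy)
output = buffer.getvalue()
print(output)
```

After conversion the predicate column contains only plain CURIEs, and the negation lives in its own cell.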

8 participants