Negated mappings and the standardisation of mapping predicate modifiers #40

cmungall · 2020-10-07T00:29:50Z

This issue is a history of the discussion on how to handle negated mappings. After a lot of discussion and a final vote at #40 (comment), we've decided to go with adding an additional predicate modifier column to the SSSOM standard. This issue can be closed along with a pull request that realizes this update.

See draft solution in #99

Original issue text from @cmungall:

Similar to #38 we could allow predicates to be property expressions such as !owl:equivalentTo

matentzn · 2020-10-13T21:38:55Z

Think about "we have not looked at it yet" vs "we looked and definitely no".

mellybelly · 2021-01-05T23:28:38Z

City of Colón vs. human colon seems like the perfect example.

matentzn · 2021-06-06T20:57:25Z

For some applications, I need to be able to subtract one record from another. For example, consider

subject_id	relation_id	object_id	match_type
HP:001	owl:equivalentTo	MP:001	sssom:AutomatedMapping

being produced by an automated approach. A human curator finds that it is wrong:

subject_id	relation_id	object_id	match_type
HP:001	!owl:equivalentTo	MP:001	sssom:HumanCurated

So when I reconcile these two records automatically, I need to make sure I can effectively remove the second mapping from the mapping set that contains the first.
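
A minimal sketch of that reconciliation, assuming the `!` prefix notation from the example rows above and the same four columns. The function names `read_rows` and `subtract_negated` are hypothetical and not part of any SSSOM tooling:

```python
import csv
import io

# The two example records from above, as tab-separated text.
AUTOMATED = (
    "subject_id\trelation_id\tobject_id\tmatch_type\n"
    "HP:001\towl:equivalentTo\tMP:001\tsssom:AutomatedMapping\n"
)
CURATED = (
    "subject_id\trelation_id\tobject_id\tmatch_type\n"
    "HP:001\t!owl:equivalentTo\tMP:001\tsssom:HumanCurated\n"
)


def read_rows(text):
    """Parse tab-separated mapping records into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def subtract_negated(mappings, curated):
    """Drop every mapping that a curated record negates with a leading '!'."""
    rejected = {
        (row["subject_id"], row["relation_id"][1:], row["object_id"])
        for row in curated
        if row["relation_id"].startswith("!")
    }
    return [
        m for m in mappings
        if (m["subject_id"], m["relation_id"], m["object_id"]) not in rejected
    ]


remaining = subtract_negated(read_rows(AUTOMATED), read_rows(CURATED))
print(remaining)  # the automated HP:001/MP:001 mapping has been removed
```

Only the flagged triple is removed; every other automated mapping in the set is kept.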

AlasdairGray · 2021-06-07T08:16:10Z

This is partly what we were trying to do with Scientific Lenses, but we did it at the granularity of linksets. I think what you are saying here is that you want to do this at the granularity of a single mapping. That is, if there are a large set of automated mappings, you don't want to eliminate the whole set, only those that have been flagged as problematic by the human curator.

matentzn · 2021-06-07T08:29:34Z

Thank you @AlasdairGray for weighing in, that is exactly right!

matentzn · 2021-06-07T17:10:47Z

From meeting, alt suggestion is to use predicate_modifier with NOT, INVERSE, DIRECT

matentzn · 2021-06-07T17:12:32Z

Boomer needs to distinguish proper subclass from general.

matentzn · 2021-06-21T14:41:48Z

Vote: Capturing predicate modifiers

Prefixes solution: 🚀

subject_id	relation_id	object_id	match_type
HP:001	!owl:subClassOf	MP:001	HumanCurated
HP:001	^owl:subClassOf	MP:001	HumanCurated
HP:001	+owl:subClassOf	MP:001	HumanCurated
HP:001	owl:subClassOf	MP:001	HumanCurated

Separate modifier: 👍

subject_id	relation_id	object_id	match_type	predicate_modifier
HP:001	owl:subClassOf	MP:001	HumanCurated	NOT
HP:001	owl:subClassOf	MP:001	HumanCurated	INVERSE
HP:001	owl:subClassOf	MP:001	HumanCurated	INDIRECT
HP:001	owl:subClassOf	MP:001	sssom:HumanCurated	DIRECT

EDIT: Hybrid 👀

There are some issues with both suggestions above, even apart from what @cthoyt lays out below:

- The prefix solution has the problem that the direct/indirect distinction is semantically meaningless and confusing, yet we need it to satisfy many of our use cases.
- The modifier solution is risky, as it introduces a feature into the standard that completely changes the interpretation of another key column (predicate_id), which can severely break pipelines that are unaware of the change.

Therefore, we now suggest this hybrid:

subject_id	relation_id	object_id	match_type	predicate_modifier
HP:001	!owl:subClassOf	MP:001	HumanCurated
HP:001	^owl:subClassOf	MP:001	HumanCurated
HP:001	owl:subClassOf	MP:001	HumanCurated	INDIRECT
HP:001	owl:subClassOf	MP:001	sssom:HumanCurated	DIRECT

cthoyt · 2021-06-21T17:09:16Z

I'm strongly in the separate modifier camp, since adding prefixes means that the relation_id column can no longer be directly considered as a CURIE. This would probably make SSSOM-compliant data more inconvenient to handle without using first party tools.

Mappings naturally don't require a predicate modifier, so many datasets won't even need to use this column. Additionally, GO annotations have a similar extra column for predicate modifiers, which I think makes them much easier to use than having to parse the predicates. Like in GO, there is probably a need to define a vocabulary for what the modifiers are (using CURIEs, again!) such that users can understand what exactly is meant by "NOT", "INVERSE", etc.

I think that either solution could support @matentzn's concern about subtracting records c.f. #40 (comment)

Ben also mentioned to me a valid counterpoint that any naïve string matching on SSSOM-compliant data for something like == 'skos:exactMatch' could backfire if the user is not aware of an extra predicate modifier column. I think this is part of a more general concern I have for SSSOM in that it uses single columns to describe CURIEs rather than one column for the prefix and a second for the identifier. CURIEs are already difficult to parse and standardize, as we've unfortunately learned, and adding even more complexity will likely serve to exacerbate it.
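
To make the string matching concern concrete, here is a small sketch, assuming a `predicate_modifier` column that holds `NOT` for negated mappings. The rows and the HP:002/MP:002 identifiers are made up for illustration:

```python
# Hypothetical rows: the second one is a negated mapping.
rows = [
    {"subject_id": "HP:001", "predicate_id": "skos:exactMatch",
     "object_id": "MP:001", "predicate_modifier": ""},
    {"subject_id": "HP:002", "predicate_id": "skos:exactMatch",
     "object_id": "MP:002", "predicate_modifier": "NOT"},
]

# Naive filter: looks only at the predicate, so it also returns the
# negated mapping as if it were an exact match.
naive = [r for r in rows if r["predicate_id"] == "skos:exactMatch"]

# Modifier-aware filter: additionally excludes rows marked with NOT.
aware = [
    r for r in rows
    if r["predicate_id"] == "skos:exactMatch"
    and r.get("predicate_modifier") != "NOT"
]

print(len(naive), len(aware))
```

The naive filter silently keeps the negated row, which is exactly the failure mode a consumer unaware of the extra column would hit.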

A potential (though I admit more convoluted) third option: create more CURIEs that represent negated relations (as suggested in the discussion of the Datum Ontology and shadow classes c.f. OBOFoundry/OBOFoundry.github.io#1539)

bgyori · 2021-06-21T17:13:10Z

I wanted to second the comment that this is particularly useful when curating automatically inferred mappings to assert that a given mapping is incorrect. I was originally thinking 🚀 but @cthoyt's comment makes a lot of sense and so I voted for 👍 .

cthoyt · 2021-06-21T17:13:48Z

Regardless, here's a CC0 manually curated set of negative mappings that @bgyori and I first automatically generated with various lexical matching techniques, then decided weren't correct. Mapping type "manual" means we did it unprompted, "manually_reviewed" if it was first automatically generated then we curated it. https://github.com/biomappings/biomappings/blob/master/src/biomappings/resources/incorrect.tsv

matentzn · 2021-06-21T17:43:15Z

Thank you both for the comments; there were other problems with the suggestion above, so I introduced a third option, which we favour over the pure prefix one.

I am not too concerned about the prefix mapping issue as sssom-py handles that - however, I still get your points 100%. To be honest, I just had a chat with @cmungall and he said what I also think: that we are barely (65%) leaning towards the prefix, now hybrid, solution. We can still be convinced otherwise. So it's basically now about weighing the churn of having to interpret the first char of the predicate column as either a modifier or the first character of the actual prefix vs the easier readability and lower error-proneness for people reading the file and trying to interpret the columns. I still feel that the risk of people ignoring the predicate_modifier column is too high for my taste. But let's keep raging.

cmungall · 2021-06-21T17:52:00Z

I think it needs to be idiotproof for inverse and negation. The risk of the modifier being dropped for the reflexive/direct/indirect case is far less severe.

cthoyt · 2021-06-21T17:53:29Z

Could it be the case that we're trying to solve too many problems with a single column (and perhaps making a single vote)? Being positive/negative and being direct/indirect seem like problems that might be better to solve at separate times, rather than loading up a single blanket "modifiers" column. Maybe a solution where there is one column with a boolean value for each might make it more simple to address.

Disclosure: I'm not yet so motivated by the direct/indirect issue - I haven't been in a situation where I wanted to capture that and I'm missing the context for why it came up in discussion here. I'd be keen to learn more

I noticed that the sssom vocabulary introduces the superClassOf, which is an obvious inverse to the standard rdfs:subClassOf relationship. Is there a reason why this has been explicitly left out of rdfs? I saw other discussions where @cmungall had proposed other subproperties that are more descriptive of whether something is a "proper" subclass/superclass, but I didn't really get why that didn't make it to primetime. Maybe it's hard to keep all of these things "consistent"

matentzn · 2021-06-22T07:48:39Z

Vote: Capturing negation and inverse

Prefixes solution: 🚀

subject_id	relation_id	object_id	match_type
HP:001	!owl:subClassOf	MP:001	HumanCurated
HP:001	^owl:subClassOf	MP:001	HumanCurated

Separate modifier: 👍

subject_id	relation_id	object_id	match_type	predicate_modifier
HP:001	owl:subClassOf	MP:001	HumanCurated	NOT
HP:001	owl:subClassOf	MP:001	HumanCurated	INVERSE

Negated relations 👀

subject_id	relation_id	object_id	match_type
HP:001	sssom:notSubClassOf	MP:001	HumanCurated
HP:001	sssom:superClassOf	MP:001	HumanCurated
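
The three options above encode the same information in different places. A rough sketch of how a consumer could normalise them to one (predicate, modifier) pair, assuming the lookup tables below, which are hypothetical and derived only from the example rows in this vote:

```python
# Hypothetical lookup tables, taken from the example rows in this vote.
PREFIX_MODIFIERS = {"!": "NOT", "^": "INVERSE"}
NEGATED_RELATIONS = {
    "sssom:notSubClassOf": ("owl:subClassOf", "NOT"),
    "sssom:superClassOf": ("owl:subClassOf", "INVERSE"),
}


def normalise(relation_id, predicate_modifier=""):
    """Return (predicate CURIE, modifier) for any of the three encodings."""
    # Prefix solution: the first character carries the modifier.
    if relation_id[:1] in PREFIX_MODIFIERS:
        return relation_id[1:], PREFIX_MODIFIERS[relation_id[0]]
    # Negated relations solution: a dedicated relation stands for the pair.
    if relation_id in NEGATED_RELATIONS:
        return NEGATED_RELATIONS[relation_id]
    # Separate modifier solution: the pair is already explicit.
    return relation_id, predicate_modifier


encodings = [
    ("!owl:subClassOf", ""),
    ("owl:subClassOf", "NOT"),
    ("sssom:notSubClassOf", ""),
]
print({normalise(rel, mod) for rel, mod in encodings})
```

All three negation encodings collapse to the same pair, so the choice is about syntax and tooling risk rather than expressiveness.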

Arguments

	Separate modifier	Prefix solution	Negated relation solution
Conceptual	Is "cleaner", i.e. predicate_id column can be interpreted as CURIE	Could create a stronger dependency on specialised tooling (sssom-py).	Introduces new non-standard vocabulary.
Idempotent	Is not idempotent (tooling that is not migrated to consider new feature could produce faulty results.)	Is idempotent.	Is idempotent
Usability	Requires exploration of context.	Does not require exploration of context.	Does not require exploration of context

Note that @cthoyt's argument that negative and inverse mappings occur rarely speaks, IMO, to both solutions - if they are so rare, then no specialised tooling is required either way.

I think from the discussion here it is pretty clear that we all agree that for the case of DIRECT, INDIRECT, REFLEXIVE we just use a bespoke predicate_modifier. Correct me if I am wrong by adding a 👎 along with your vote regarding the above.

matentzn · 2021-08-20T10:42:26Z

Another problem with any approach here is if you would want a double modification, like negative inverse.

matentzn · 2021-08-23T17:42:44Z

Look here for more discussions, where they went with what @cthoyt is suggesting: biolink/biolink-model#826

cmungall · 2021-09-02T18:50:51Z

I hate to force another vote but maybe we need a different kind of hybrid.

I think for inverted relations, it is cleanest to add predicates. The most unsatisfying thing here is that there is no community standard URI for subClassOf (equivalentTo is symmetric, and most other things are ObjectProperties in a vocabulary like RO). But we could easily add has_subclass to biolink.

My original ideal was for an expressive rdf path like syntax for arbitrary paths, but that seems overkill.

So I vote against ^

I think the real sticking point is negation, with really strong arguments for the three different ways of handling this

At the meeting we can discuss people's use cases for different combinations, here are mine:

not related to:
- very useful.
- note the semantics here should be to exclude exact, equivalent, close, related, broad, narrow
- I frequently want to provide information that there is no meaningful connection between two terms. This is often important information where there may be naive assumptions the two terms are related, e.g. based on their names
not equivalent, not exact match: somewhat useful
not subclass, not superclass: rarely useful

cthoyt · 2021-09-03T18:21:41Z

Further discussion was had in the SSSOM workshop at https://docs.google.com/document/d/1xUNUCXE-iAWJWgZwXdjq58hRAlRhMtaKHH6CeDWJWSw/edit?usp=sharing

matentzn · 2021-10-13T13:02:01Z

@cthoyt we should make a call on this before finalising the paper.

cthoyt · 2021-10-13T13:10:18Z

At the end of the discussion, I think we were all pretty much in agreement that we did not like the addition of new syntax, but would be happy with either of the two following solutions:

1. Introduce new relations that contain semantics about stuff being not true
2. Introduce a predicate modifier column for negations

At this point I think I'm leaning towards 2, since the biolink-model group already chose this one at the end of the discussion of biolink/biolink-model#826. Should we put it to a final vote?

matentzn · 2021-10-13T13:17:59Z

Vote: Should we allow a modifier column that will change the semantics of the mapping?

Arguments for and against are in the ticket above.

👍 Yes, let's introduce the modifier column that allows us to say "NOT" and similar to modify the mapping relation
👎 No, let's stick with relations. If a relation like notRelatedTo does not exist, we have to define it somewhere

matentzn · 2021-10-13T13:18:57Z

I will vote a bit later, because I don't want to bias the vote with my own annoying position too much.

cthoyt · 2021-10-13T13:25:06Z

@matentzn when will the vote close?

if we mint new relations, I'd guess they initially would live in the sssom idspace

matentzn · 2021-10-13T13:26:31Z

Friday 22 October :)

graybeal · 2021-10-15T20:45:07Z

I'm late to the discussion and don't have time to do the deep dive on all the references. I will just say, I was really surprised that everyone is talking about and favoring (what seem to be) semantics-incompatible solutions to expressing a semantic relationship.

matentzn · 2021-10-16T04:42:28Z

The semantics part can be sorted out on the side of the model - this discussion is just about syntax! It's true though that there are some things that cannot be translated into RDF in this thread, like "indirect" or "not" in conjunction with an annotation property (at least it would be semantics free in the sense that no reasoner could detect it). But yes, how we deal with the mapping into RDF in a meaningful way is a concern, but it can be done.

graybeal · 2021-10-16T05:00:50Z

Right, I get that, but it's opaque to a non-expert. Well-named semantic relations are transparent, and using them in this context makes the SSSOM transparent.

matentzn · 2021-10-16T13:08:04Z

I also still have not cast my vote. At the moment I am 51% - 49% on the 👎 side. Still doing some soul searching.

cthoyt · 2021-10-26T12:29:10Z

Thanks everybody for voting and engaging on this. I think it was a really good exercise having all of the discussions leading to the final vote, and while no solution is perfect, I think that this one will support a large variety of what people want to do. I'm going to edit the original issue text to reflect that we've decided on using a modifier column to the SSSOM standard, then this issue can get closed when that implementation in the model itself is realized.

matentzn · 2021-10-26T14:09:11Z

Thank you for organising and making PR @cthoyt :)

wdduncan · 2021-11-16T19:10:16Z

I was unaware of this vote. But I cast a 👎 for the reasons that @graybeal cited.

matentzn · 2021-11-16T19:18:11Z

Too late guys, this is merged into sssom now :P for better or worse!

wdduncan · 2021-11-16T19:20:44Z

@matentzn that's fine, I suppose :)
However, I am unsure what the final implementation looks like. Is it another column in the spreadsheet or are the modifiers placed in same field as the predicate?

Sorry for the confusion ...

matentzn · 2021-11-16T20:12:46Z

separate column! https://mapping-commons.github.io/sssom/Mapping/

matentzn added the priority label May 31, 2021

cthoyt mentioned this issue Jun 24, 2021

Add SSSOM Export biopragmatics/biomappings#45

Merged

6 tasks

matentzn added the workshop label Aug 20, 2021

matentzn changed the title ~~negated mappings~~ Negated mappings and the standardisation of mapping predicate modifiers Aug 20, 2021

matentzn mentioned this issue Aug 20, 2021

How should unmapped elements be indicated? #28

Closed

cthoyt mentioned this issue Oct 26, 2021

Add predicate modifier / change entity range to EntityReference over uriorcurie #99

Merged

matentzn closed this as completed in #99 Nov 16, 2021

baskaufs mentioned this issue Dec 14, 2021

Replace notes in subtype CV metadata with sawsdlrdf:modelReference links tdwg/ac#220

Closed

matentzn mentioned this issue May 25, 2022

FHIR ConceptMap equivalence / relationship mappings #185

Open

15 tasks

sierra-moxon mentioned this issue Oct 11, 2022

Should "negation" be represented as part of a predicate or as a qualifier of a core triple? biolink/biolink-model#1105

Closed

sharifX mentioned this issue Apr 29, 2024

how to describe unmatched or negated mappings or not applicable bge-barcoding/StayingMapped#3

Open

matentzn mentioned this issue Nov 2, 2024

Front page of SSSOM website desperately needs to list what the columns in an SSSOM file are/can be #392

Closed
