Representing data model elements and their values in SSSOM #43

Open
matentzn opened this issue Oct 14, 2020 · 32 comments

Comments

@matentzn
Collaborator

Case:

We want to map literal values in a set of permissible values in a data model (say "M" in gcs.maritalstatus) to some ontology term such as NCIT:Married. There might not be anything like an ID for the permissible values in the data model. One solution could be to put something like _:b in the subject_id or object_id column to indicate a blank node, meaning that the label should be used for the mapping. But maybe we need something more general that allows mapping arbitrary literals, as @wdduncan suggested in another thread as well.

From @hsolbrig:

We DO need a way to represent a specific string that appears in a given data element, but I see this as a case where we are imposing the constraints of OUR data model (RDF) on a pretty straightforward idea – string “M” in the marital status data element in model M means “Married” as defined by ontology … . I wonder whether this may be a case where we are overburdening the notion of an “ontology map”. Note that using the actual map, if it is in RDF, is going to require the same amount of work whether the source nodes are represented as URIs or BNodes.
@wdduncan

wdduncan commented Oct 15, 2020

I think being able to do this would be awesome!
Suppose we have a table with three columns:

  • person_id
  • marital_status
  • gender

In the marital_status column, the value "m" means married, and in the gender column, "m" means male. We also have an ontology "ont" that has classes to represent instances of people, marriage, and genders.

Focusing just on the columns, we can do this (this is what we do for the NMDC mappings, at least).

table.vocab:person_id skos:exactMatch ont:person
table.vocab:marital_status skos:exactMatch ont:marriage_status
table.vocab:gender skos:exactMatch ont:gender

Mapping specific values is trickier. I think we will need new predicates. E.g.:

table.vocab:marital_status sssom.vocab:value _:v1
_:v1 skosxl:literalForm "m"
_:v1 skos:exactMatch ont:married
table.vocab:gender sssom.vocab:value _:v2
_:v2 skosxl:literalForm "m"
_:v2 skos:exactMatch ont:male

I used blank nodes in this example, but I see no reason why they have to be blank.
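As a rough illustration of how the triples above might be consumed, here is a minimal Python sketch that uses plain tuples rather than an RDF library. The `objects` and `resolve` helpers are hypothetical names, and `sssom.vocab:value` is only the predicate proposed in this comment, not an existing SSSOM term.

```python
# Triples from the example above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("table.vocab:marital_status", "sssom.vocab:value", "_:v1"),
    ("_:v1", "skosxl:literalForm", "m"),
    ("_:v1", "skos:exactMatch", "ont:married"),
    ("table.vocab:gender", "sssom.vocab:value", "_:v2"),
    ("_:v2", "skosxl:literalForm", "m"),
    ("_:v2", "skos:exactMatch", "ont:male"),
]


def objects(subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def resolve(column, literal):
    """Hypothetical helper: find the term a literal maps to within one column."""
    for node in objects(column, "sssom.vocab:value"):
        if literal in objects(node, "skosxl:literalForm"):
            matches = objects(node, "skos:exactMatch")
            if matches:
                return matches[0]
    return None


# The same literal "m" resolves differently depending on the column.
print(resolve("table.vocab:marital_status", "m"))
print(resolve("table.vocab:gender", "m"))
```

The point of the sketch is only that the column acts as the context that disambiguates the literal; whether the intermediate nodes are blank or named does not change the lookup.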

Perhaps we can also make use of the match_string field. E.g.:

subject_id predicate_id object_id match_string
table.vocab:marital_status skos:exactMatch ont:married "m"
table.vocab:gender skos:exactMatch ont:male "m"
table.vocab:person_id skos:exactMatch ont:person

The match_string field is blank for the person_id mapping, so the mapping could apply to all values.

@wdduncan

@matentzn any more thoughts on this? Adding @cmungall to the conversation.

@cmungall
Contributor

Good comments, will respond in more detail later

one comment: match_string isn't really intended for this. It's for provenance for a lexical match

@cmungall
Contributor

Related ticket on how to map strings/enums to ontology terms in blml: https://github.com/biolink/biolinkml/issues/170

@ddooley

ddooley commented Oct 27, 2020

https://schema.org/codeValue would be appropriate for indicating a string match?

@wdduncan

@ddooley I suppose it depends what is meant by 'code'. Does it have to relate to a code system? E.g., ICD 10 code 'F33.1' represents 'Major depressive disorder, recurrent, moderate'. For many enums this may make sense, even though the "coding system" may only apply to a particular study. E.g., In my study, I choose to use "99" to represent unknown values.

But we may want to represent literal names of things, e.g., the strings 'morning star' and 'evening star' denote the planet Venus.

@ddooley

ddooley commented Oct 27, 2020

Looks like the schema intangible https://schema.org/DefinedTerm would be a more general term (like an OWL class) spelled out in full via "name" property, and with "termCode" as a sub-property for an associated code. Thx for drawing my attention to that.

@wdduncan

Sure, I suppose DefinedTerm works. Although, you still need to link the literal to the IRI that specifies its meaning.

There are some standard properties for linking objects to literals: rdf:value, owl:hasValue. The range can be either an IRI or a literal. E.g:

table.vocab:marital_status owl:hasValue _:v1 .
_:v1 owl:hasValue "m" .
_:v1 skos:exactMatch ont:married .

table.vocab:marital_status owl:hasValue _:v2 .
_:v2 owl:hasValue "s" .
_:v2 skos:exactMatch ont:single .

In SKOS, skosxl:literalForm is an option too. If we adhere to SKOS standards, the modeling would be like this:

table.vocab:marital_status skosxl:prefLabel _:v1 .
_:v1 a skosxl:Label .
_:v1 skosxl:literalForm "m" .
_:v1 skos:exactMatch ont:married .

table.vocab:marital_status skosxl:prefLabel _:v2 .
_:v2 a skosxl:Label .
_:v2 skosxl:literalForm "s" .
_:v2 skos:exactMatch ont:single .

Although this is a bit more verbose, it at least stays within the SKOS ecosystem. The statements _:v1 a skosxl:Label and _:v2 a skosxl:Label can perhaps be excluded.

I am a little wary of schema.org. They don't seem to be focused on ontology, but mostly on strings.
Using DefinedTerm, we would still need to link the literal to the IRI that specifies its meaning. I suppose the modeling could look something like:

table.vocab:marital_status owl:hasValue  _:v1 .
_:v1 schema:DefinedTerm "m" .
_:v1 skos:exactMatch ont:married .

table.vocab:marital_status owl:hasValue _:v2 .
_:v2 schema:DefinedTerm "s" .
_:v2 skos:exactMatch ont:single .

Is this any better or worse? You have to make use of owl:hasValue, schema:DefinedTerm, and skos:exactMatch, which may bother some.

@wdduncan

Has any more consideration been given to this?

@wdduncan

wdduncan commented Jun 8, 2021

Did we reach a consensus on this? It is quite important. In the NMDC we are using enums to control the meanings of values, but using an SSSOM file to provide mappings/meanings for GOLD column names. It would be great if we could provide meanings/mappings for individual values in an SSSOM file.

@matentzn
Collaborator Author

matentzn commented Jun 8, 2021

I think it's pretty much clear that we allow complex expressions in place of subject_id and object_id. What is not yet clear to me is how these should be represented; that's why I stopped thinking about it, because your use case differs a lot from mine. I definitely do not want some weird concept like "named blank nodes"; in that case it is better to mint an ID. What I prefer is patterns and complex expressions associated with a subject_id field. I would prefer to organise a meeting to discuss this, since this ticket has become too confusing for me. What do you think? Maybe just the two of us can meet and hash out a good solution?

@matentzn
Collaborator Author

Value set values such as table.cv.martital_status#M to indicate Married need to be identified objects. In the workshop, we need to decide how to represent these in a way that is non-confusing:

  • table.cv:martital_status=M
  • table.cv.martital_status:M
  • etc etc.

@matentzn
Collaborator Author

Note that both table.cv:martital_status=M and table.cv:martital_status will often be mapped to ontology classes!

@wdduncan

Yes. Literal values are often mapped to ontology classes. We just need a way to connect literal values to classes in SSSOM.

Some hacky ideas that come to mind:

  • Add (yet) another column to hold literal values.
  • Modify the syntax to allow for literals, as you suggest: table.cv:martital_status=M skos:exactMatch ont:single
    Or use some other syntactic convention that somewhat mimics how parameters are passed in URL query strings:
    table.cv:martital_status?M (i.e., using the literal as a param)
    table.cv:?martital_status=M (i.e., using the field name as a param with the literal as the value)
    table.cv:martital_status?value=M (i.e., just having a special param named value)
    (Note: some literal values will have more than one word, e.g. 'chronic fatigue'.)
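To get a feel for the last of the query-string-style options above, here is a small Python sketch using only the standard library. The function names and the choice of a `value` parameter are assumptions made for illustration; nothing here is an agreed SSSOM convention.

```python
from urllib.parse import parse_qs, quote


def make_subject_id(element, literal):
    """Build e.g. 'table.cv:martital_status?value=M' (hypothetical convention)."""
    # Percent-encode the literal so multi-word values stay in one token.
    return f"{element}?value={quote(literal, safe='')}"


def split_subject_id(subject_id):
    """Split a subject_id into (element, literal); literal is None if absent."""
    element, sep, query = subject_id.partition("?")
    if not sep:
        return element, None
    values = parse_qs(query).get("value")
    return element, values[0] if values else None


subject = make_subject_id("table.cv:martital_status", "chronic fatigue")
print(subject)
print(split_subject_id(subject))
print(split_subject_id("table.cv:martital_status"))
```

A subject_id without a `?` is treated as the data model element itself, so column-level and value-level mappings could share one column.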

@matentzn
Collaborator Author

Yeah, I think this is the right direction. We will discuss this at the workshop. Thanks @wdduncan

@matentzn changed the title from "Should we allow blank nodes in place of subject or object id?" to "Representing data model elements and their values in SSSOM" on Aug 26, 2021
@matentzn
Collaborator Author

matentzn commented Sep 1, 2021

When this ticket is approached at the workshop, please consider solely the problem of mapping data model elements and data model element instances (be they literals or controlled terms). See here for a use case that covers both aspects. The question of more complex mappings (these blank node examples @wdduncan describes here) belongs in a different ticket (#61).

@cmungall
Contributor

cmungall commented Sep 3, 2021

Note from the meeting:

@wdduncan's proposal:

  • subject_id = table.cv:martital_status=M
  • predicate_id = skos:exactMatch
  • object_id = ont:single

@cmungall
Contributor

cmungall commented Sep 3, 2021

Syntactic alternative:

{enum.curie}#{URLENCODE(permissible_value)}

@ShahimEssaid points out that this may have implications for services, e.g. the hash being ignored.
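To make the hash concern concrete, here is a minimal Python sketch of the `{enum.curie}#{URLENCODE(permissible_value)}` pattern. The prefix map and the example.org base IRI are made up for illustration. Once the CURIE is expanded to an IRI, the permissible value sits in the fragment, which an HTTP client does not send to the server, so a resolver only ever sees the enum part.

```python
from urllib.parse import quote, urlsplit

# Hypothetical prefix expansion; not a registered prefix.
PREFIX_MAP = {"table.cv": "https://example.org/table/cv/"}


def value_curie(enum_curie, permissible_value):
    """Append the percent-encoded permissible value as a fragment."""
    return f"{enum_curie}#{quote(permissible_value, safe='')}"


def expand(curie):
    """Expand a CURIE to an IRI using the hypothetical prefix map."""
    prefix, local = curie.split(":", 1)
    return PREFIX_MAP[prefix] + local


curie = value_curie("table.cv:martital_status", "M")
parts = urlsplit(expand(curie))

print(curie)
print(parts.path)      # what a server would be asked for
print(parts.fragment)  # stays on the client side
```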

@matentzn
Collaborator Author

matentzn commented May 2, 2022

Here is an example of how FHIR does it: https://hl7.org/fhir/conceptmap-example.ttl.html

@wdduncan

wdduncan commented May 2, 2022

@matentzn There are a lot of turtle statements on the page. I'm not seeing one that you translate into an SSSOM structure. Can you point it out?

@matentzn
Collaborator Author

matentzn commented May 2, 2022

         fhir:ConceptMap.group.element.target.code [ fhir:value "BAD" ];

This refers to the data model "ConceptMap", the submodel "group.element.target.code", and the value "BAD". I just stumbled across this. For SSSOM, this could be referred to as fhir:ConceptMap.group.element.target.code#BAD. I'm not saying it's good; I just saw this and wanted to let everyone know.

@wdduncan

wdduncan commented May 2, 2022

Thanks @matentzn, I didn't see what part might be of use. In the Turtle statement:

         fhir:ConceptMap.group.element.target.code [ fhir:value "BAD" ];

Isn't fhir:ConceptMap.group.element.target.code a predicate? If so, I suppose attaching a value to the predicate could work. It seems similar to what @cmungall proposed above. Although (perhaps likely) I may be misunderstanding something.

@matentzn
Collaborator Author

matentzn commented May 2, 2022

It is indeed a predicate, you understand correctly ;) I only mentioned this as an example of how someone might want to refer to a data model element using a CURIE; seems similar to what @cmungall proposed, agreed!

@wdduncan

wdduncan commented May 4, 2023

I was on a recent OMOP call and the topic of using SSSOM was mentioned. In order for SSSOM to work well in this context, a standard mapping for literal values to the meaning(s) of the literal value is needed.

So, was any standard agreed on? (I haven't been following this thread for a while.)

cc @AEW0330 @stephanieshong

@matentzn
Collaborator Author

matentzn commented May 4, 2023

Thanks @wdduncan for pinging this issue. We just had an entire workshop dedicated to this question:

https://mapping-commons.github.io/sssom/events/mc2023/

We have two conflicting proposals right now for SSSOM profiles that deal with literals (e.g. #235), but I think that in the case that the literal is a controlled value (an enum in a data model) we only need to agree on a convention to describe the location of an element in a datamodel. Basically a mini URI standard to identify an element in a datamodel, similar to the FHIR example you give above.

What always helps to push the dialogue towards a solution is concrete examples. Can you share SSSOM files where the need becomes apparent?

@wdduncan

wdduncan commented May 4, 2023

@matentzn I can't share specific files b/c they have patient information in them. In general, a dental record will include information like this:

patient_id service_code tooth surface
10001 2391 3 O

In this case, the values of the service_code, tooth, and surface fields would need semantic mappings. The patient_id field can be handled at the column level.

service_code 2391: single surface resin-based composite restoration
tooth 3: upper right first molar
surface O: occlusal surface of the tooth that was restored

@matentzn
Collaborator Author

matentzn commented May 4, 2023

Ok this makes sense. However, you will need a way to reference the table schema in a globally unique fashion. Is there a formal schema you could refer to? Something like:

TABLESCHEMA:tooth#3 skos:exactMatch UBERON:123

@wdduncan

wdduncan commented May 4, 2023

> table schema in a globally unique fashion

Not sure what you are referring to here. Some software vendors make their schemas publicly available, but many do not.

In the example I gave above, the service codes come from the ADA's Code on Dental Procedures, and the tooth number comes from the Universal Numbering System. I have no idea where the nomenclature for tooth surfaces comes from. It is just something that I see a lot.

In any case, I can mint the necessary IRIs. Has the design pattern been agreed upon? I.e.:

<schema IRI>#<literal value> <match type> <ontology IRI>

There have been many proposals.
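For concreteness, here is a Python sketch of what the `<schema IRI>#<literal value>` pattern could produce for the dental record row above, written out as SSSOM-style TSV. This is only one of the proposals, not an agreed design. The `TABLESCHEMA` prefix and `UBERON:123` are the placeholders used in this thread, the `ont:*` object IDs are hypothetical, and `semapv:ManualMappingCuration` is assumed as the mapping justification.

```python
import csv
import io
from urllib.parse import quote

# (column, literal value, object_id). UBERON:123 is the dummy ID from this
# thread; the ont:* IDs are hypothetical stand-ins for real ontology terms.
VALUE_MAPPINGS = [
    ("service_code", "2391", "ont:single_surface_composite_restoration"),
    ("tooth", "3", "UBERON:123"),
    ("surface", "O", "ont:occlusal_surface"),
]

COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def to_sssom_tsv(mappings, schema_prefix="TABLESCHEMA"):
    """Render value-level mappings as tab-separated SSSOM-style rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for column, literal, object_id in mappings:
        # Percent-encode the literal so it is safe after the '#'.
        subject_id = f"{schema_prefix}:{column}#{quote(literal, safe='')}"
        writer.writerow(
            [subject_id, "skos:exactMatch", object_id, "semapv:ManualMappingCuration"]
        )
    return buffer.getvalue()


tsv = to_sssom_tsv(VALUE_MAPPINGS)
print(tsv)
```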

@matentzn
Collaborator Author

matentzn commented May 4, 2023

No, and even after significant googling I think it is not obvious whether there is one right way to do this.

SCHEMA:tooth#3 skos:exactMatch UBERON:123

We are looking into using a JSON URL standard for this right now (https://jsonurl.github.io/), because it is much more flexible for complex mappings:

SCHEMA:(tooth:'3')

Which also permits stuff like:

SCHEMA:(tooth:foo+bar)

We could then use the exact same convention for complex mappings (SCHEMA:(disease:'MONDO:123',modifier:'PATO:123')) and data model mappings where the values are literals.

I tried to promote your idea with URL parameters, and while I still think we could muscle it through, there was quite a bit of moaning about this at the workshop (I still like it).

What do you think of this? One cool feature of this would be that the ID of the data model element could be potentially complex, like:

SCHEMA:(person.demographics.address:my+street)

EDIT:

Seems ChatGPT still favours URI encoding:

https://example.com/schema/person.demographics.address#my%20street

or in CURIE:

SCHEMA:person.demographics.address#my%20street

EDIT 2: I AM TORN. I don't know what the right solution is!

@wdduncan

wdduncan commented May 4, 2023

> I tried to promote your idea with URL parameters, and while I still think we could muscle it through, there was quite a bit of moaning about this at the workshop (I still like it).

Oh well ... you win some, you lose some :)

The JSON URL standard is quite interesting. I was not aware of it!

@wdduncan

wdduncan commented May 4, 2023

> EDIT 2: I AM TORN. I don't know what the right solution is!

So am I! One slight worry about using # is that # might actually be in the literal value (e.g., Apartment #3). I suppose the # would need to be URL encoded in such cases.
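The `#` worry can be checked with a short Python sketch. The helper names are hypothetical, and the `SCHEMA:person.demographics.address` element is borrowed from the example earlier in the thread. If the literal is percent-encoded before it is appended, the only unencoded `#` is the separator, so splitting on the first `#` and decoding recovers the original value.

```python
from urllib.parse import quote, unquote


def encode_value_id(element, literal):
    """Join element and literal with '#', percent-encoding the literal."""
    return f"{element}#{quote(literal, safe='')}"


def decode_value_id(value_id):
    """Split on the first '#' and decode the literal part."""
    element, _, encoded = value_id.partition("#")
    return element, unquote(encoded)


value_id = encode_value_id("SCHEMA:person.demographics.address", "Apartment #3")
print(value_id)  # SCHEMA:person.demographics.address#Apartment%20%233
print(decode_value_id(value_id))
```

This assumes the element identifier itself never contains a `#`; if it can, the separator would need to be chosen differently.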

Here is a thought (just putting it out there):
We can make use of owl:hasValue within the JSON URL. E.g: SCHEMA.COLUMN(owl.hasValue:'3').

Or if owl:hasValue is too OWL centric, perhaps reserve a SSSOM IRI for defining values. E.g.: SCHEMA.COLUMN(sssom.value:'3').

I looked for a way to define values in SKOS, but didn't find one. I may have overlooked something in SKOS, though.

@AEW0330

AEW0330 commented Jun 15, 2023

@p-talapova do you want to add a link to a description of work you're leading with @matentzn in the CRITICAL and Bridge2AI studies here?
