Representing data model elements and their values in SSSOM #43
I think being able to do this would be awesome!
Focusing just on the columns, we can do this (this is what we do for the NMDC mappings, at least).
Mapping specific values is trickier. I think we will need new predicates. E.g.:
I used blank nodes in this example, but I see no reason why they have to be blank. Perhaps we can also make use of the
The
Good comments, I will respond in more detail later. One comment: match_string isn't really intended for this. It's for the provenance of a lexical match.
Related ticket on how to map strings/enums to ontology terms in blml: https://github.com/biolink/biolinkml/issues/170
Would https://schema.org/codeValue be appropriate for indicating a string match?
@ddooley I suppose it depends on what is meant by 'code'. Does it have to relate to a code system? E.g., ICD-10 code 'F33.1' represents 'Major depressive disorder, recurrent, moderate'. For many enums this may make sense, even though the "coding system" may only apply to a particular study, e.g., in my study, I choose to use "99" to represent unknown values. But we may want to represent literal names of things, e.g., the strings 'morning star' and 'evening star' denote the planet Venus.
It looks like the schema.org Intangible type https://schema.org/DefinedTerm would be a more general term (like an OWL class), spelled out in full via the "name" property, with "termCode" as a sub-property for an associated code. Thanks for drawing my attention to that.
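To make the DefinedTerm idea concrete, here is a minimal sketch that builds a schema.org DefinedTerm as JSON-LD with Python's json module, reusing the study-specific "99" = unknown example above. The defined_term helper and the term-set IRI https://example.org/my-study/codes are hypothetical; name, termCode, and inDefinedTermSet are schema.org DefinedTerm properties.

```python
import json


def defined_term(name: str, term_code: str, term_set: str) -> dict:
    """Build a schema.org DefinedTerm as a JSON-LD dict.

    term_set is a hypothetical IRI for the study-specific code list.
    """
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,              # the term spelled out in full
        "termCode": term_code,     # the study-specific code, e.g. "99"
        "inDefinedTermSet": term_set,
    }


unknown = defined_term("unknown", "99", "https://example.org/my-study/codes")
print(json.dumps(unknown, indent=2))
```

Note that this only links a code to a name; it says nothing yet about which ontology term the value means, which is the part SSSOM would have to supply.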
Sure, I suppose. There are some standard properties for linking objects to literals:
In SKOS, skosxl:literalForm is an option too. If we adhere to SKOS standards, the modeling would be like this:
Although this is a bit more verbose, it at least stays within the SKOS ecosystem. The statements
I am a little wary of schema.org. It doesn't seem to be focused on ontologies, but mostly on strings.
Is this any better or worse? You have to make use of
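For comparison, the SKOS-XL route discussed above could look roughly like the following minimal sketch, which emits Turtle using only the standard library. It reuses the 'morning star' / 'evening star' example from earlier: each literal gets its own skosxl:Label resource carrying a skosxl:literalForm, and because those label resources have IRIs, an SSSOM row could point at them. The ex: namespace, the label IRIs, and the choice of skosxl:altLabel are assumptions for illustration.

```python
def turtle_string(text: str, lang: str = "en") -> str:
    """Quote a Python string as a Turtle language-tagged literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skosxl_turtle(concept: str, labels: dict[str, str]) -> str:
    """Serialize a concept plus SKOS-XL label resources as Turtle.

    labels maps a (hypothetical) label CURIE to the literal string it carries.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .",
        "@prefix ex: <https://example.org/> .",  # hypothetical namespace
        "",
        f"{concept} a skos:Concept .",
    ]
    for label_id, literal in labels.items():
        # The label is a resource with its own ID, so mappings can reference it.
        lines.append(f"{concept} skosxl:altLabel {label_id} .")
        lines.append(f"{label_id} a skosxl:Label ; skosxl:literalForm {turtle_string(literal)} .")
    return "\n".join(lines) + "\n"


ttl = skosxl_turtle("ex:Venus", {
    "ex:MorningStarLabel": "morning star",
    "ex:EveningStarLabel": "evening star",
})
print(ttl)
```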
Has any more consideration been given to this?
Did we reach a consensus on this? It is quite important. In the NMDC we are using enums to control the meanings of values, but using an SSSOM file to provide mappings/meanings for GOLD column names. It would be great if we could provide meanings/mappings for individual values in an SSSOM file.
I think it's pretty much clear that we allow complex expressions instead of subject_id and object_id. What I am not clear on yet is how these should be represented; that's why I stopped thinking about it, because your use case differs a lot from mine. I definitely do not want some weird concept like "named blank nodes"; in this case, it is better to mint an ID. What I prefer is patterns and complex expressions associated with a subject_id field. I'd prefer to organise a meeting for this and discuss it; this ticket has become too confusing for me. What do you think? Maybe we can just meet between the two of us and hash out a good solution?
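One way to "mint an ID" rather than use named blank nodes is to derive a stable identifier from the schema, the element, and the literal value. The sketch below hashes those three parts with SHA-256; the hashing scheme, the mint_value_id name, and the https://example.org/value/ namespace are all hypothetical, not an agreed SSSOM convention.

```python
import hashlib

BASE = "https://example.org/value/"  # hypothetical namespace for minted value IDs


def mint_value_id(schema: str, element: str, value: str) -> str:
    """Derive a stable IRI for a permissible value from its schema, element, and literal.

    The parts are joined with the ASCII unit separator (0x1F) so that, e.g.,
    ("a", "b.c") and ("a.b", "c") do not collide by plain concatenation
    (unless a part itself contains that character).
    """
    key = "\x1f".join((schema, element, value)).encode("utf-8")
    # Keep the first 16 hex characters (64 bits) for a shorter local name.
    return BASE + hashlib.sha256(key).hexdigest()[:16]


married = mint_value_id("gcs", "maritalstatus", "M")
print(married)
```

Hash-based IDs are stable and repeatable, but opaque, so a lookup table (or a subject_label column) is still needed to recover the original value; path-style identifiers trade that opacity for readability.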
Value set values such as
Note that both
Yes. Literal values are often mapped to ontology classes. We just need a way to connect literal values to classes in SSSOM. Some hacky ideas that come to mind:
Yeah, I think this is the right direction. We will discuss this at the workshop. Thanks @wdduncan
When this ticket is approached at the workshop, please consider solely the problem of mapping data model elements and data model element instances (be they literals or controlled terms). See here for a use case that covers both aspects. The question of more complex mappings (such as the blank node examples @wdduncan describes here) belongs in a different ticket (#61).
Note from the meeting: @wdduncan's proposal:
Syntactic alternative:
@ShahimEssaid points out that this may have implications for services, e.g. the part after the hash (#) is ignored.
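To illustrate the concern about the hash: an HTTP client does not send the fragment (the part after #) to the server, so element IRIs that differ only in their fragment resolve to the same request. A minimal sketch with Python's urllib.parse; the dental schema IRIs and the request_target helper are hypothetical.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(iri: str) -> str:
    """Return the path plus query an HTTP client would request for iri.

    The fragment (everything after '#') stays client-side and is never sent.
    """
    parts = urlsplit(iri)
    return parts.path + (f"?{parts.query}" if parts.query else "")


a = "https://example.org/schema/dental#service_code"  # hypothetical element IRIs
b = "https://example.org/schema/dental#tooth"
print(request_target(a), request_target(b))  # both: /schema/dental
print(urldefrag(a).fragment)                  # service_code
```

A service that resolves these IRIs would therefore have to return the whole schema document and leave it to the client to pick out the element.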
Here is an example of how FHIR does it: https://hl7.org/fhir/conceptmap-example.ttl.html
@matentzn There are a lot of turtle statements on the page. I'm not seeing one that you would translate into an SSSOM structure. Can you point it out?
It shows how to refer to the data model "conceptmap", the submodel "group.element.target.code", and the value BAD. I just stumbled across this. For SSSOM, this could be referred to like
Thanks @matentzn. I didn't see which part might be of use. In the turtle statement:
Isn't
It is indeed a predicate, you understand correctly ;) I only mentioned this as an example of how someone might want to refer to a data model element using a CURIE; seems similar to what @cmungall proposed, agreed!
I was on a recent OMOP call and the topic of using SSSOM was mentioned. In order for SSSOM to work well in this context, a standard mapping for literal values to the meaning(s) of the literal value is needed. So, was any standard agreed on? (I haven't been following this thread for a while.)
Thanks @wdduncan for pinging this issue. We just had an entire workshop dedicated to this question (https://mapping-commons.github.io/sssom/events/mc2023/). We have two conflicting proposals right now for SSSOM profiles that deal with literals (e.g. #235), but I think that in the case where the literal is a controlled value (an enum in a data model), we only need to agree on a convention to describe the location of an element in a data model: basically a mini URI standard to identify an element in a data model, similar to the FHIR example above. What always helps to push the dialogue towards a solution is concrete examples. Can you share SSSOM files where the need becomes apparent?
@matentzn I can't share specific files b/c they have patient information in them. In general, a dental record will include information like this:
In this case, the values of the service_code, tooth, and surface fields would need semantic mappings. The patient field can be handled at the column level.
service_code 2392: single surface resin-based composite restoration
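A minimal sketch of what a value-level row for this record could look like in SSSOM TSV, written with Python's csv module. The column names and semapv:ManualMappingCuration come from SSSOM; the dental: subject prefix, the table/column/value path convention, the "procedure" table name, and the cdt: object prefix are hypothetical placeholders, since no convention has been agreed in this thread.

```python
import csv
import io

# Hypothetical prefixes: "dental:" for the (unpublished) record schema,
# "cdt:" for concepts from the ADA Code on Dental Procedures.
COLUMNS = ["subject_id", "subject_label", "predicate_id", "object_id", "mapping_justification"]


def value_row(table: str, column: str, value: str, label: str, object_id: str) -> dict:
    """Build one SSSOM row mapping a literal cell value to a concept."""
    return {
        "subject_id": f"dental:{table}/{column}/{value}",  # hypothetical path convention
        "subject_label": label,
        "predicate_id": "skos:exactMatch",
        "object_id": object_id,
        "mapping_justification": "semapv:ManualMappingCuration",
    }


def to_sssom_tsv(rows: list[dict]) -> str:
    """Serialize rows as a tab-separated table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [value_row("procedure", "service_code", "2392",
                  "single surface resin-based composite restoration", "cdt:2392")]
print(to_sssom_tsv(rows))
```

Rows for the tooth and surface fields would follow the same pattern, with objects drawn from whichever numbering or surface vocabulary is chosen.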
Ok this makes sense. However, you will need a way to reference the table schema in a globally unique fashion. Is there a formal schema you could refer to? Something like:
Not sure what you are referring to here. Some software vendors make their schemas publicly available, but many do not. In the example I gave above, the service codes come from the ADA's Code on Dental Procedures, and the tooth numbers come from the Universal Numbering System. I have no idea where the nomenclature for tooth surfaces comes from; it is just something that I see a lot. In any case, I can mint the necessary IRIs. Has the design pattern been agreed upon? I.e.:
There have been many proposals.
No, and even after significant googling, I don't think it is obvious whether there is one right way to do this.
We are looking into using the JSON URL standard for this right now (https://jsonurl.github.io/), because it is much more flexible for complex mappings:
Which also permits stuff like:
We could then use the exact same convention for complex mappings. I tried to promote your idea with URL parameters, and while I still think we could muscle it through, there was quite a bit of moaning about it at the workshop (I still like it). What do you think of this? One cool feature of this would be that the ID of the data model element could potentially be complex, like:
EDIT: Seems ChatGPT still favours URI encoding:
or in CURIE:
EDIT 2: I AM TORN. I don't know what the right solution is!
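To compare two of the encodings on the table, here is a minimal sketch that identifies a data model element plus a permissible value (the gcs.maritalstatus / "M" case from this ticket) both as a percent-encoded path and as URL query parameters, expands a CURIE for the path form, and parses each form back. The base IRI, the gcs prefix mapping, and the element/value parameter names are hypothetical; the JSON URL syntax is not shown.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit

BASE = "https://example.org/schema/gcs"  # hypothetical schema IRI
PREFIXES = {"gcs": BASE + "/"}           # hypothetical CURIE prefix map


def path_iri(element: str, value: str) -> str:
    """Percent-encode each segment so '/' or spaces in a value cannot break the path."""
    return f"{BASE}/{quote(element, safe='')}/{quote(value, safe='')}"


def parse_path_iri(iri: str) -> tuple[str, str]:
    """Recover (element, value) from the last two path segments."""
    element, value = urlsplit(iri).path.rsplit("/", 2)[-2:]
    return unquote(element), unquote(value)


def query_iri(element: str, value: str) -> str:
    """Alternative: carry element and value as URL query parameters."""
    return f"{BASE}?{urlencode({'element': element, 'value': value})}"


def parse_query_iri(iri: str) -> tuple[str, str]:
    """Recover (element, value) from the query string, keeping empty values."""
    params = parse_qs(urlsplit(iri).query, keep_blank_values=True)
    return params["element"][0], params["value"][0]


def expand_curie(curie: str) -> str:
    """Expand prefix:local using the hypothetical prefix map."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


print(path_iri("maritalstatus", "M"))   # https://example.org/schema/gcs/maritalstatus/M
print(query_iri("maritalstatus", "M"))  # https://example.org/schema/gcs?element=maritalstatus&value=M
```

Both forms round-trip values containing '/' or spaces. The path form also shortens naturally to a CURIE such as gcs:maritalstatus/M.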
Oh well ... you win some, you lose some :) The JSON URL standard is quite interesting. I was not aware of it!
So am I! One slight worry about using
Here is a thought (just putting it out there):
Or if
I looked for a way to define values in SKOS but didn't find one. I may have overlooked something in SKOS, though.
@p-talapova do you want to add a link to a description of work you're leading with @matentzn in the CRITICAL and Bridge2AI studies here? |
Case: we want to map literal values in a set of permissible values in a data model (say "M" in gcs.maritalstatus) to some ontology term, e.g. NCIT:Married. Now, there might not be anything like an ID for the permissible values in the data model. One solution could be to use something like _:b in the subject_id or object_id column to indicate that this is a blank node, meaning that the label should be used for the mapping. But maybe we need something more general to allow for mapping arbitrary literals, as @wdduncan suggested in another thread as well.
From @hsolbrig: