Add enums and label->id mappings #37

cmungall · 2020-06-27T00:14:08Z

We have

  values_from:
    domain: definition
    multivalued: true
    range: uriorcurie
    description: >-
      the identifier of a "value set" -- a set of identifiers that form the possible values for the range of a slot

(Aside: this is declared at the definition level; it should be a slot property.)

We should have the equivalent for string values, i.e. an enum.

e.g.:

slots:
  evidence code:
    enum:
      - IEA
      - ISS
...

it would be good to specify mappings for each of these, perhaps:

slots:
  evidence code:
    enum:
      IEA: ECO:nnnn
      ISS: ECO:nnn
...

or perhaps something more expressive:

slots:
  evidence code:
    enum:
      IEA:
        id: ECO:nnnn
        description: ...
      ISS:
        id: ECO:nnn
        description: ...
...

Mapping to a JSON-LD context should be obvious.
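As a minimal sketch of that mapping, assuming the expressive form above has been parsed into a Python dict: the context below uses JSON-LD 1.1 `@vocab` type coercion on the slot, so a string value such as "IEA" is expanded through the term definition of the same name. The slot name `evidence_code`, the helper name, and the `ECO:nnnn` identifiers are placeholders, not an existing API.

```python
import json

def enum_to_jsonld_context(slot_name, enum_values):
    """Build a JSON-LD context mapping each enum label to its identifier.

    `enum_values` is assumed to look like {label: {"id": ..., "description": ...}}.
    """
    context = {
        # Coerce the slot's string values to vocabulary terms, so "IEA"
        # is resolved via the "IEA" term defined in this same context.
        slot_name: {"@type": "@vocab"},
    }
    for label, info in enum_values.items():
        context[label] = info["id"]
    return {"@context": context}

evidence_code = {
    "IEA": {"id": "ECO:nnnn", "description": "..."},
    "ISS": {"id": "ECO:nnn", "description": "..."},
}
ctx = enum_to_jsonld_context("evidence_code", evidence_code)
print(json.dumps(ctx, indent=2))
```

A real context would also need an `ECO` prefix declaration for these CURIEs to expand to full IRIs.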


wdduncan · 2020-11-03T18:22:39Z

Do you want to add enums under a mapping descriptor? e.g.:

slots:
  evidence code:
    mappings:
      enum:
        IEA:
          id: ECO:nnnn
          description: ...
        ISS:
          id: ECO:nnn
          description: ...

A complication may be cases in which we want to say something about all the values in a field (e.g., a field holding glucose results), but certain values mean unknown or inconclusive (e.g., the value 9999 means unknown).
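One way to picture that complication: a numeric field whose values are mostly measurements, plus a small set of sentinel codes that carry a special meaning. This is a minimal sketch with a hypothetical `SENTINELS` table; 9999 comes from the example above, while 9998 and the meaning labels are invented for illustration.

```python
from typing import Union

# Hypothetical sentinel table: codes in the glucose field that are not measurements.
SENTINELS = {
    9999: "unknown",
    9998: "inconclusive",
}

def interpret_glucose(value: int) -> Union[float, str]:
    """Return the measurement itself, or the meaning of a sentinel code."""
    if value in SENTINELS:
        return SENTINELS[value]
    return float(value)

print(interpret_glucose(95))    # 95.0
print(interpret_glucose(9999))  # unknown
```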

cmungall · 2020-11-03T20:00:31Z

this violates the current range constraint for mappings, but I think the same thing could be accommodated with a new field, e.g. value_mappings

or this could be embedded inside each enum element

slots:
  evidence code:
    enum:
      IEA:
        exact_mappings:
          - ECO:nnnn
      ISS:
        exact_mappings:
          - ECO:nnn
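Embedding the mappings in each element keeps both forward (label to identifier) and reverse (identifier to label) lookups simple. A minimal sketch, assuming the YAML above has been parsed into nested dicts; the `ECO:nnnn` values are the placeholders from the example, and the helper names are hypothetical.

```python
from collections import defaultdict

evidence_code = {
    "IEA": {"exact_mappings": ["ECO:nnnn"]},
    "ISS": {"exact_mappings": ["ECO:nnn"]},
}

def label_to_ids(enum):
    """Map each enum label to its list of exact mappings (empty if none).

    `info or {}` covers elements written as a bare "LABEL:" with no body.
    """
    return {label: list((info or {}).get("exact_mappings", []))
            for label, info in enum.items()}

def id_to_labels(enum):
    """Reverse index: each mapped identifier to the labels that use it."""
    index = defaultdict(list)
    for label, ids in label_to_ids(enum).items():
        for curie in ids:
            index[curie].append(label)
    return dict(index)

print(id_to_labels(evidence_code))  # {'ECO:nnnn': ['IEA'], 'ECO:nnn': ['ISS']}
```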

hsolbrig · 2020-11-05T15:50:05Z

Referencing:

We've got three components that we need to represent:

1. The values (strings) that are allowable in a slot (e.g. "IEA", "ISS", ...). With respect to the above diagram, these would constitute Permissible_Values, with the container being an instance of an Enumerated_Value_Domain.
2. The permissible_value_meaning (pvm) link, which points into a collection of possible Value_Meanings.
3. The Enumerated_Conceptual_Domain (ECD) that contains all of the possible Value_Meanings. Note that an ECD can be a superset of the meanings reached through the pvm links.
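The three components can be sketched as plain data structures. This is a minimal illustration, not the ISO 11179 metamodel itself: the class and field names are chosen here for readability, and the `ex:` identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class ValueMeaning:
    id: str  # e.g. a CURIE identifying a concept

@dataclass
class PermissibleValue:
    text: str                               # the string that may appear in the slot
    meaning: Optional[ValueMeaning] = None  # the pvm link, if any

@dataclass
class EnumeratedConceptualDomain:
    meanings: set = field(default_factory=set)  # all possible ValueMeanings

@dataclass
class EnumeratedValueDomain:
    permissible_values: list = field(default_factory=list)

    def meanings(self) -> set:
        """ValueMeanings reached through the pvm links."""
        return {pv.meaning for pv in self.permissible_values if pv.meaning is not None}

single, married, widowed = (ValueMeaning("ex:single"), ValueMeaning("ex:married"),
                            ValueMeaning("ex:widowed"))
ecd = EnumeratedConceptualDomain({single, married, widowed})
evd = EnumeratedValueDomain([PermissibleValue("s", single), PermissibleValue("m", married)])

# The ECD may be a strict superset of the meanings the EVD links to.
print(evd.meanings() <= ecd.meanings)  # True
```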

My understanding is that the current values_from attribute is intended to represent the value_domain_meaning link. Correct?

I'm inclined to separate the EVD (Enumerated_Value_Domain) from the slot, as it would allow separate maintenance and reuse, so here is a first cut at a proposal:

Add a fifth item to the SchemaDefinition:

id: http://example.org/something
...
subsets:
    ...
types:
   ...
slots:
    ...
    range: evidence

classes:
    ...
enums:
    evidence:
        description: Permissible values for CLUE evidence fragments
        ...  (same set of metadata as classes, etc.)
        values_from: VD:evidence_codes
        permissible_values:
            IEA: Colonel Mustard was in the Ballroom
            ISS:
                description: Mrs. Peacock with the Dagger
                meaning: CLUE:nnnn

The tccm could focus on fleshing out the definition and resolution of the values_from link, and we could add a validator component that verifies that all meaning links are members of the ECD (the target of values_from).
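The validator described above could start as a set-membership check. A minimal sketch, assuming the enum has been parsed from YAML into a dict and that the members of the ECD have already been resolved from the `values_from` target into a set of CURIEs; how that resolution works is the open question, so `resolved_members` here is a stand-in and `check_meanings` is a hypothetical name.

```python
def check_meanings(enum_def, resolved_members):
    """Return an error message for each meaning that is not in the ECD."""
    errors = []
    for pv_text, pv in (enum_def.get("permissible_values") or {}).items():
        # A bare-string permissible value has no meaning link to check.
        meaning = pv.get("meaning") if isinstance(pv, dict) else None
        if meaning is not None and meaning not in resolved_members:
            errors.append(f"{pv_text}: {meaning} is not in {enum_def.get('values_from')}")
    return errors

evidence = {
    "values_from": "VD:evidence_codes",
    "permissible_values": {
        "IEA": "Colonel Mustard was in the Ballroom",
        "ISS": {"description": "Mrs. Peacock with the Dagger", "meaning": "CLUE:nnnn"},
    },
}

print(check_meanings(evidence, {"CLUE:nnnn"}))  # []
print(check_meanings(evidence, set()))          # ['ISS: CLUE:nnnn is not in VD:evidence_codes']
```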

hsolbrig · 2020-11-06T14:56:55Z

We may want to add another element that identifies how the permissible value could algorithmically be generated from the values_from (value domain) codes (e.g. code, URI, CURIE, ...)

wdduncan · 2020-11-06T17:50:54Z

@hsolbrig Perhaps I'm not following, but your proposal seems to require a value domain before we can relate a value to its meaning. E.g.:
"s" -> ont:marriage_status_domain -> marriage_ont:single

Are you not allowing more direct relations? E.g.:
"s" -> marriage_ont:single

The ontology "marriage_ont" would hold information about whether the term (marriage_ont:single) was a member of some kind of value domain/set within the ontology. Do you think requiring the value domain/set adds anything to the meaning of "s" that is not captured by the reference to the class marriage_ont:single? The information that marriage_ont:single is a member of some specific value domain/set seems like a different fact to capture. Do you have a use case you are wanting to model?

Also, I am wondering about separating out the permissible values from the permissible meanings. It is not wrong; it is just verbose, so I am wondering about the use case here. Suppose I have my permissible values for the marriage status field:

enums:
    marriage status:
        permissible_values:
            s:
                description: single
                uriorcurie: marriage_ont:nnnnn
            m:
                description: married
                uriorcurie: marriage_ont:nnnnn

Do I also need to have a set of permissible meanings?

hsolbrig · 2020-11-06T18:57:53Z

I went over this this morning w/ Dazhi and will have a more complete proposal. As a quick summary:

If you don't supply a values_from link, you are free to assemble things as you see fit:

enums:
    marital status:
        description: possible marriage value codes w/ optional value meanings
        notes:
            - 'note that "s: single" and "s: \n description: single" are the same'
        permissible_values:
            s: single
            m:
                description: married
                meaning: marriage_ont:117338
            d:
                description: divorced
                comments:
                    - here to show that permissible values are `elements`, so can have all accompanying metadata
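The note about `s: single` implies a normalization step: a bare string is shorthand for a permissible value that has only a description. A minimal sketch of that step, assuming the YAML has been parsed into a dict; `normalize_permissible_values` is a hypothetical helper, not an existing LinkML function.

```python
def normalize_permissible_values(pvs):
    """Expand shorthand so every permissible value is a dict of metadata."""
    normalized = {}
    for text, pv in pvs.items():
        if pv is None:              # "x:" with nothing after it
            normalized[text] = {}
        elif isinstance(pv, str):   # "s: single" means "s: {description: single}"
            normalized[text] = {"description": pv}
        else:                       # already a full element with metadata
            normalized[text] = dict(pv)
    return normalized

marital_status = {
    "s": "single",
    "m": {"description": "married", "meaning": "marriage_ont:117338"},
    "d": {"description": "divorced", "comments": ["..."]},
}
print(normalize_permissible_values(marital_status)["s"])  # {'description': 'single'}
```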

If you do supply a values_from link, you are referencing a code set, either by tag or by explicit version:

enums:
    marital status:
        description: codes drawn from the HL7 marital status code system
        values_from: HL7:v2_marital_status
        permissible_values:
            1:
                description: '"1" means single'
                meaning: V2MS:s
            2:
                meaning: V2MS:m
            3:
                description: Married on even numbered days
                comments: You can still add PVs that don't map

Referencing code sets
We propose the following:
- Establish a namespace of prefixes -- "CS" or similar
- Agree on a community-wide set of prefix maps pulling from prefixcommons, prefix.cc and other places
- For values_from, assert that CS:HPO stands for all codes in the ontology
- Drawing from CTS2, establish a version control tagging system with ONE predefined tag, "current", that represents whatever the service provider believes to be the default version (not "latest", because one often does NOT want to use the latest and greatest in production settings)
  - no tag and no version -- use the "current" tag of the target code set / ontology
  - tag -- use an explicit tag (e.g. "tag": "devel") (tags are assigned to a version by the community / service)
  - version -- use a named version (e.g. "version": "1.7.0")
    - note: we could also consider a relative version specifier similar to that used by pip ("~=1.7")
- Assume that code system version identifiers are, at bare minimum, lexically ordered (alphabetically later version identifiers are temporally later as well) and, ideally, follow semver.
- One can either explicate permissible values or define a map:
  - CODE -- permissible value is the code
  - CURIE -- permissible value is the CURIE
  - URI -- permissible value is the URI
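The version-selection rules above can be sketched as follows. This is an illustration under stated assumptions: the service's tag table and list of available versions are hypothetical inputs, the `~=` handling follows pip's compatible-release rule (`~=1.7` means `>=1.7, ==1.*`), and versions are compared numerically by their dot-separated parts rather than lexically, because lexical order sorts "1.10.0" before "1.9.2".

```python
def parse(version: str) -> tuple:
    """Turn "1.7.0" into (1, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def resolve_version(available, tags, tag=None, version=None):
    """Pick a code-set version following the proposed tag/version rules.

    available: version strings the service offers (assumed purely numeric)
    tags: hypothetical mapping of tag name -> version; must include "current"
    """
    if tag is not None and version is not None:
        raise ValueError("give a tag or a version, not both")
    if tag is None and version is None:
        return tags["current"]
    if tag is not None:
        return tags[tag]
    if version.startswith("~="):
        floor = parse(version[2:])
        prefix = floor[:-1]  # "~=1.7" pins major 1; "~=1.7.0" pins 1.7
        candidates = [v for v in available
                      if parse(v) >= floor and parse(v)[:len(prefix)] == prefix]
        if not candidates:
            raise LookupError(f"no version satisfies {version}")
        return max(candidates, key=parse)
    if version in available:
        return version
    raise LookupError(f"unknown version {version}")

available = ["1.6.0", "1.7.0", "1.9.2", "1.10.0", "2.0.0"]
tags = {"current": "1.9.2", "devel": "2.0.0"}

print(resolve_version(available, tags))                     # 1.9.2
print(resolve_version(available, tags, tag="devel"))        # 2.0.0
print(resolve_version(available, tags, version="~=1.7"))    # 1.10.0
print(resolve_version(available, tags, version="~=1.7.0"))  # 1.7.0
print(sorted(["1.9.2", "1.10.0"]))                          # ['1.10.0', '1.9.2'] (lexical)
```

If identifiers strictly follow semver, numeric comparison of the parts is the safer reading of "later".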

We can then define our own maps if needed:

enums:
    phenotype:
        description: disease code
        values_from: CS:HPO
        tag: "devel"      # if you need to stay with an externally tagged resource, or
        version: "~1.7"   # the current version, as long as it has a minor of "7"
        permissible_values:
            1:
                description: aplasia/hypoplasia of extremities
                meaning: HP:0009815
            2:
                meaning: HP:0001218

Or define an auto mapping:

enums:
    relationship_code:
        description: Any code in the relations ontology
        values_from: CS:RO
        use: CODE

    disease:
        description: the CURIE of any code in the scary diseases code set
        values_from: NS1:scary_diseases
        use: CURIE
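An auto mapping such as `use: CODE` or `use: CURIE` amounts to deriving the permissible value from each code in the referenced code set. A minimal sketch, assuming each code is known by its prefix and local code, and that a prefix map supplies the base URI for `use: URI`; the `RO` entry follows the usual OBO PURL pattern, while the `NS1` entry and the function name are hypothetical.

```python
PREFIX_MAP = {
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "NS1": "http://example.org/scary_diseases/",  # hypothetical namespace
}

def permissible_value(prefix: str, code: str, use: str) -> str:
    """Derive the string allowed in the slot for one code of the code set."""
    if use == "CODE":
        return code
    if use == "CURIE":
        return f"{prefix}:{code}"
    if use == "URI":
        return PREFIX_MAP[prefix] + code
    raise ValueError(f"unknown use: {use}")

print(permissible_value("RO", "0002202", "CODE"))   # 0002202
print(permissible_value("RO", "0002202", "CURIE"))  # RO:0002202
print(permissible_value("RO", "0002202", "URI"))    # http://purl.obolibrary.org/obo/RO_0002202
```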

Will have a more complete proposal w/ partially running code available in a few days

cmungall · 2020-12-18T00:19:14Z

The proposal sounds great

cmungall · 2021-02-09T01:34:30Z

can we close this?

wdduncan · 2021-02-17T21:58:55Z

Perhaps it's a little late in the game to be commenting on this, but should we use the slot name permissible_values? To me at least, this implies that only the values listed are permitted. But "in the wild" you encounter all sorts of values that aren't given in the data dictionary. Would it be better to name this slot defined_values?

hsolbrig · 2021-02-17T23:20:35Z

Interesting -- at the moment, the behavior is "permissible". If you list "A", "B", "C" as the three permissible values, "D" will be treated as an error. How do you envision this component working in tandem with "in the wild" data?

wdduncan · 2021-02-18T00:57:19Z

Suppose I have a slot foo defined as:

foo:
  range: string

with permissible values A, B, C. If I encounter value D, in one sense, you are right: I will want to throw an error. However, what if I am aware that I may not have mappings (i.e., "meanings") for all the values of foo? I don't think I would necessarily want to throw an error. I may still want to have the data transformed (e.g., foo: D), even though I haven't specified D as being "permissible". Adding D to the enum may happen later as the schema develops and more information is gathered.

Make sense?
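The two behaviors being discussed can be sketched as one check with a switch. A minimal sketch, assuming a hypothetical `closed` flag that this issue does not define (open vs closed enums is taken up separately in #127): when the enum is closed, an unlisted value is an error; when it is open, the value passes through unchanged with a warning, so it can be added to the enum later.

```python
import warnings

def check_value(value, permissible_values, closed=True):
    """Validate one slot value against an enum.

    closed=True:  unlisted values raise ValueError (the current behavior).
    closed=False: unlisted values are kept, with a warning.
    """
    if value in permissible_values:
        return value
    if closed:
        raise ValueError(f"{value!r} is not a permissible value")
    warnings.warn(f"{value!r} is not yet defined in the enum")
    return value

foo_values = {"A", "B", "C"}
print(check_value("B", foo_values))                # B
print(check_value("D", foo_values, closed=False))  # D (and a warning is emitted)
```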

cmungall · 2021-06-10T00:07:55Z

I think we can close this now? We have separate tickets for specific aspects, e.g. markdowngen, jsonschemagen

wdduncan · 2021-06-10T00:17:42Z

I think it can be closed.

hsolbrig self-assigned this Jul 13, 2020

hsolbrig transferred this issue from biolink/biolinkml Mar 25, 2021

cmungall mentioned this issue Mar 25, 2021

Add owlgen support for enums #113

Closed

wdduncan mentioned this issue Mar 26, 2021

Extend enum model to allow open vs closed, extensional vs intensional #127

Open

hsolbrig closed this as completed Jun 10, 2021
