Add enums and label->id mappings #37

Closed
cmungall opened this issue Jun 27, 2020 · 13 comments
@cmungall
Member

We have

  values_from:
    domain: definition
    multivalued: true
    range: uriorcurie
    description: >-
      the identifier of a "value set" -- a set of identifiers that form the possible values for the range of a slot

(aside: this is declared at the definition level, should be a slot property)

We should have the equivalent for string values, i.e. an enum.

e.g.

slots:
  evidence code:
    enum:
      - IEA
      - ISS
...

it would be good to specify mappings for each of these, perhaps:

slots:
  evidence code:
    enum:
      IEA: ECO:nnnn
      ISS: ECO:nnn
...

or perhaps a more expressive:

slots:
  evidence code:
    enum:
      IEA:
        id: ECO:nnnn
        description: ...
      ISS:
        id: ECO:nnn
        description: ...
...

Mapping to a JSON-LD context should be obvious.
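
As a rough illustration of that mapping, here is a minimal Python sketch that turns either enum form above into a JSON-LD `@context`. The function name `enum_to_jsonld_context`, the shape of the resulting context, and the ECO prefix expansion are assumptions made for this example; nothing here is existing code from the project.

```python
import json

# Assumed prefix expansion; the issue does not specify one.
PREFIXES = {"ECO": "http://purl.obolibrary.org/obo/ECO_"}


def enum_to_jsonld_context(enum, prefixes=PREFIXES):
    """Build a JSON-LD @context that maps each enum label to its identifier."""
    context = dict(prefixes)
    for label, spec in enum.items():
        # Short form: label -> CURIE. Expressive form: label -> {"id": CURIE, ...}.
        curie = spec["id"] if isinstance(spec, dict) else spec
        context[label] = {"@id": curie}
    return {"@context": context}


# Placeholder identifiers copied from the examples above.
evidence_code = {
    "IEA": {"id": "ECO:nnnn", "description": "..."},
    "ISS": "ECO:nnn",
}

print(json.dumps(enum_to_jsonld_context(evidence_code), indent=2))
```

Both the short form and the more expressive form reduce to the same context entry, so the choice between them would not affect the JSON-LD output.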

@hsolbrig hsolbrig self-assigned this Jul 13, 2020
@wdduncan
Contributor

wdduncan commented Nov 3, 2020

Do you want to add enums under a mappings descriptor? E.g.:

slots:
  evidence code:
    mappings:
      enum:
        IEA:
          id: ECO:nnnn
          description: ...
        ISS:
          id: ECO:nnn
          description: ...

A complication may be cases in which we want to say something about all the values in a field (e.g., a field that holds glucose results), but certain values mean unknown or inconclusive (e.g., the value 9999 means unknown).

@cmungall
Member Author

cmungall commented Nov 3, 2020

this violates the current range constraint for mappings, but I think the same thing could be accommodated with a new field, e.g. value_mappings

or this could be embedded inside each enum element

slots:
  evidence code:
    enum:
      IEA:
        exact_mappings:
          - ECO:nnnn
      ISS:
        exact_mappings:
          - ECO:nnn

@hsolbrig
Contributor

hsolbrig commented Nov 5, 2020

Referencing:

[image: the diagram referenced below was not preserved]

We've got three components that we need to represent:

  1. The values (strings) that are allowable in a slot (e.g. "IEA", "ISS", ...). With respect to the above diagram, these would constitute Permissible_Values, with the container being an instance of an Enumerated_valueDomain.
  2. The permissible_value_meaning (pvm) links into a collection of possible Value_Meanings
  3. The Enumerated_Conceptual_Domain (ECD) that contains all of the possible Value_Meanings -- note that an ECD can be a superset of the pvm links.

My understanding is that the current values_from attribute is intended to represent the value_domain_meaning link. Correct?

I'm inclined to separate the EVD (Enumerated_valueDomain) from the slot, as that would allow separate maintenance and reuse, so here is a first cut at a proposal:

Add a fifth item to the SchemaDefinition:

id: http://example.org/something
...
subsets:
    ...
types:
   ...
slots:
    ...
    range: evidence

classes:
    ...
enums:
    evidence:
        description: Permissible values for CLUE evidence fragments
        # ... (same set of metadata as classes, etc.)
        values_from: VD:evidence_codes
        permissible_values:
            IEA: Colonel Mustard was in the Ballroom
            ISS:
                description: Mrs. Peacock with the Dagger
                meaning: CLUE:nnnn

The tccm could focus on fleshing out the definition and resolution of the values_from link and we could add a validator component that verifies that all meaning links are members of the ECD (target of values_from)
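
A minimal sketch of what such a validator component might look like, assuming the values_from link has already been resolved to a plain set of identifiers. The function name `unresolved_meanings` and the `domain_members` set are hypothetical and only serve the illustration.

```python
def unresolved_meanings(permissible_values, domain_members):
    """Return {value: meaning} for meanings that are not in the conceptual domain."""
    problems = {}
    for value, spec in permissible_values.items():
        # A bare string is a description only, so it carries no meaning to check.
        meaning = spec.get("meaning") if isinstance(spec, dict) else None
        if meaning is not None and meaning not in domain_members:
            problems[value] = meaning
    return problems


# The permissible values from the proposal above, as a Python dict.
evidence = {
    "IEA": "Colonel Mustard was in the Ballroom",
    "ISS": {"description": "Mrs. Peacock with the Dagger", "meaning": "CLUE:nnnn"},
}

# Hypothetical result of resolving values_from: VD:evidence_codes.
domain_members = {"CLUE:nnnn"}

print(unresolved_meanings(evidence, domain_members))
```

Values without a meaning are skipped rather than reported, since the proposal allows permissible values that carry only a description.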

@hsolbrig
Contributor

hsolbrig commented Nov 6, 2020

We may want to add another element that identifies how the permissible value could algorithmically be generated from the values_from (value domain) codes (e.g. code, URI, CURIE, ...)

@wdduncan
Contributor

wdduncan commented Nov 6, 2020

@hsolbrig Perhaps I'm not following, but your proposal seems to require a value domain before we can relate a value to its meaning. E.g.:
"s" -> ont:marriage_status_domain -> marriage_ont:single

Are you not allowing more direct relations? E.g.:
"s" -> marriage_ont:single

The ontology "marriage_ont" would hold information about whether the term (marriage_ont:single) was a member of some kind of value domain/set within the ontology. Do you think requiring the value domain/set adds meaning about the meaning of "s" that is not captured by the reference to the class marriage_ont:single? The information that marriage_ont:single is a member of some specific value domain/set seems like a different fact to capture. Do you have a use case you want to model?

Also, I am wondering about separating out the permissible values from the permissible meanings. It is not wrong, it is just verbose, so I am wondering about the use case here. Suppose I have my permissible values for the marriage status field:

enums:
    marriage status:
        permissible_values:
            s:
                description: single
                uriorcurie: marriage_ont:nnnnn
            m:
                description: married
                uriorcurie: marriage_ont:nnnnn

Do I also need to have a set of permissible meanings?

@hsolbrig
Contributor

hsolbrig commented Nov 6, 2020

I went over this this morning w/ Dazhi and will have a more complete proposal. As a quick summary:

  1. If you don't supply a values_from link, you are free to assemble things as you see fit:
enums:
    marital status:
        description: possible marriage value codes w/ optional value meanings
        notes:
            - 'note that "s: single" and "s: \n description: single" are the same'
        permissible_values:
            s: single
            m:
                description: married
                meaning: marriage_ont:117338
            d:
                description: divorced
                comments:
                    - here to show that permissible values are `elements`, so can have all accompanying metadata

  2. If you do supply a values_from link, you are referencing a code set, by tag or by explicit version:
enums:
    marital status:
        description: codes drawn from the HL7 marital status code system
        values_from: HL7:v2_marital_status
        permissible_values:
            1:
                description: '"1" means single'
                meaning: V2MS:s
            2:
                meaning: V2MS:m
            3:
                description: Married on even numbered days
                comments:
                    - You can still add PVs that don't map
  3. Referencing code sets
    We propose the following:
    • Establish a namespace of prefixes -- "CS" or similar
    • Agree on a community-wide set of prefix maps pulling from prefixcommons, prefix.cc and other places
    • For values_from, assert that CS:HPO stands for all codes in the ontology
    • Drawing from CTS2, establish a version control tagging system with ONE predefined tag, "current", that represents whatever the service provider believes to be the default version (not "latest", because one often does NOT want to use the latest and greatest in production settings)
      • no tag and no version -- use the "current" tag of the target code set / ontology
      • tag -- use an explicit tag (e.g. "tag": "devel"); tags are assigned to a version by the community / service
      • version -- use a named version (e.g. "version": "1.7.0")
        • note: We could also consider a relative version specifier similar to the one used by pip ("~=1.7")
    • Assume that code system version identifiers are, at bare minimum, lexically ordered (alphabetically later version identifiers are temporally later as well) and, ideally, follow semver
    • One can either list the permissible values explicitly or define a map:
      • CODE -- permissible value is the code
      • CURIE -- permissible value is the curie
      • URI -- permissible value is the URI
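
The tag and version rules in the list above could be resolved roughly as follows. This is only a sketch: `resolve_version`, its arguments, and the sample version data are made up for illustration, treating tag and version as mutually exclusive is an assumption, and relative specifiers such as "~=1.7" are left out.

```python
def resolve_version(versions, tags, tag=None, version=None):
    """Pick a code set version following the proposed rules.

    versions: set of available version identifiers
    tags:     mapping of tag name -> version identifier (should include "current")
    """
    if tag is not None and version is not None:
        raise ValueError("give a tag or a version, not both")
    if version is not None:
        if version not in versions:
            raise LookupError(f"unknown version: {version}")
        return version
    # No tag and no version falls back to the predefined "current" tag.
    chosen = tag if tag is not None else "current"
    if chosen not in tags:
        raise LookupError(f"unknown tag: {chosen}")
    return tags[chosen]


# Hypothetical versions and tags published by a code set service.
versions = {"1.6.0", "1.7.0", "1.8.0"}
tags = {"current": "1.7.0", "devel": "1.8.0"}

print(resolve_version(versions, tags))                   # 1.7.0
print(resolve_version(versions, tags, tag="devel"))      # 1.8.0
print(resolve_version(versions, tags, version="1.6.0"))  # 1.6.0
```

Note that "current" resolves to 1.7.0 here even though 1.8.0 exists, which is the point of not calling the tag "latest".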

We can then define our own maps if needed:

enums:
    phenotype:
        description: disease code
        values_from: CS:HPO
        tag: "devel"       # if you need to stay with an externally tagged resource --or--
        version: "~1.7"    # the current version as long as it has a minor of "7"
        permissible_values:
            1:
                description: aplasia/hypoplasia of extremities
                meaning: HP:0009815
            2:
                meaning: HP:0001218

Or define an auto mapping:

enums:
    relationship_code:
        description: Any code in the relations ontology
        values_from: CS:RO
        use: CODE

    disease:
        description: the CURIE of any code in the scary diseases code set
        values_from: NS1:scary_diseases
        use: CURIE
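
To make the three `use` options concrete, here is a small sketch of how a permissible value might be derived from one code in the value domain. The helper `permissible_value`, the prefix map, and the sample code `RO:0000001` are assumptions chosen for the example, not part of the proposal.

```python
def permissible_value(curie, use, prefix_map):
    """Derive the permissible value string for one code in the value domain."""
    prefix, code = curie.split(":", 1)
    if use == "CODE":
        return code
    if use == "CURIE":
        return curie
    if use == "URI":
        return prefix_map[prefix] + code
    raise ValueError(f"unsupported use: {use}")


# Assumed prefix expansion and an arbitrary code, for illustration only.
prefix_map = {"RO": "http://purl.obolibrary.org/obo/RO_"}
sample = "RO:0000001"

for use in ("CODE", "CURIE", "URI"):
    print(use, "->", permissible_value(sample, use, prefix_map))
```

With `use: CODE`, the bare code is only unambiguous within a single code set, which may be a reason to prefer CURIE when an enum draws from more than one source.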

Will have a more complete proposal w/ partially running code available in a few days

@cmungall
Member Author

The proposal sounds great

@cmungall
Member Author

cmungall commented Feb 9, 2021

can we close this?

@wdduncan
Contributor

Perhaps a little late in the game to be commenting on this, but should we use the slot name permissible_values? This suggests (to me at least) that only the values listed are permitted. But "in the wild" you encounter all sorts of values that aren't given in the data dictionary. Would it be better to name this slot defined values?

@hsolbrig
Contributor

Interesting -- at the moment, the behavior is "permissible". If you list "A", "B", "C" as the three permissible values, "D" will be treated as an error. How do you envision this component working in tandem with "in the wild" data?

@wdduncan
Contributor

Suppose I have a slot foo defined as:

foo:
  range: string

with permissible values A, B, C. If I encounter value D, in one sense, you are right: I will want to throw an error. However, what if I am aware that I may not have mappings (i.e., "meanings") for all the values of foo? I don't think I would necessarily want to throw an error. I may still want to have the data transformed (e.g., foo: D), although I haven't specified D as being "permissible". Adding D as an enum may happen later, as the schema develops and more information is gathered.

Make sense?
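
For illustration, the two behaviours discussed here (strict "permissible" versus a more lenient "defined" reading) could be sketched like this. The helper `check_value` and its `strict` flag are hypothetical, not existing code.

```python
def check_value(value, permissible_values, strict=True):
    """Return (accepted, message) for one slot value."""
    if value in permissible_values:
        return True, "ok"
    if strict:
        return False, f"{value!r} is not a permissible value"
    # Lenient mode keeps the value and flags it for later review.
    return True, f"{value!r} has no defined meaning yet"


permissible = {"A", "B", "C"}

print(check_value("A", permissible))
print(check_value("D", permissible))                # strict: rejected
print(check_value("D", permissible, strict=False))  # lenient: kept, but flagged
```

In the lenient case the value D passes through the transformation unchanged, and the message gives a hook for adding it to the enum later.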

@cmungall
Member Author

I think we can close this now? We have separate tickets for specific aspects, e.g. markdowngen, jsonschemagen

@wdduncan
Contributor

I think it can be closed.
