Extend enum model to allow open vs closed, extensional vs intensional #127

Open
wdduncan opened this issue Feb 26, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@wdduncan
Contributor

wdduncan commented Feb 26, 2021

From LinkML meeting on 02/26

For enums, I proposed a slot named defined_value that is a part of permissible_value. This would allow for data translation cases in which you can define only some of the values but don't want to generate an exception. permissible_value would be used when you want to enforce the schema.

Related to #37

cc @cmungall @hrshdhgd @sierra-moxon @deepakunni3 @hsolbrig

@hsolbrig hsolbrig transferred this issue from biolink/biolinkml Mar 26, 2021
@cmungall
Member

cmungall commented Oct 4, 2021

can you clarify this @wdduncan not sure I understand

@wdduncan
Contributor Author

My reason for suggesting this was to allow for values that were not specified in the enum. Consider an enum like:

GenderEnum:
  permissible_values:
    male:
      meaning: http://xyz.org
    female:
      meaning: http://xyz.org

If you encounter a case in which a gender data field has the value "other", do you always want to throw an error? Or (in some cases) do you want to accept the data, even though you haven't specified the value in the enum? permissible_value is for the former case; defined_value would be for the latter.
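To make the two behaviours concrete, here is a minimal Python sketch. It is not LinkML code: `GENDER_ENUM`, `resolve`, and the `strict` flag are hypothetical names invented for illustration, and the IRIs are the placeholders from the example above.

```python
from typing import Optional

# Hypothetical in-memory copy of the enum above: value -> meaning IRI.
GENDER_ENUM = {
    "male": "http://xyz.org",
    "female": "http://xyz.org",
}


def resolve(value: str, strict: bool) -> Optional[str]:
    """Look up the meaning of an enum value.

    strict=True  mimics permissible_value: unlisted values raise an error.
    strict=False mimics defined_value: unlisted values are accepted,
                 but carry no meaning (None is returned).
    """
    if value in GENDER_ENUM:
        return GENDER_ENUM[value]
    if strict:
        raise ValueError(f"{value!r} is not a permissible value")
    return None
```

With `strict=False`, `resolve("other", strict=False)` accepts the data and returns `None`; with `strict=True` the same call raises `ValueError`.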

@cmungall
Member

cmungall commented Oct 15, 2021

Proposal:

Add a slot to EnumDefinition with a name such as is_open

  • If the enum is closed, then values MUST belong to the set of permissible values
  • If the enum is open, then values SHOULD (MAY?) belong to the set of permissible values
  • If the enum is closed, and there are no permissible_values, then this is a schema error (detected when we convert to json schema, but we should have first-class checks)

We should default to closed enums as that is the current semantics (unless codeset is specified - see below)
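As a rough illustration of these rules, the sketch below models them in plain Python. `EnumSketch`, `check_schema`, and `validate` are hypothetical names, not part of the LinkML metamodel or runtime, and treating an unlisted value in an open enum as a warning is one reading of SHOULD; under MAY it would simply be "ok".

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EnumSketch:
    """Hypothetical stand-in for EnumDefinition; not the LinkML class."""
    name: str
    permissible_values: Set[str] = field(default_factory=set)
    is_open: bool = False  # closed by default, matching the current semantics


def check_schema(enum: EnumSketch) -> None:
    """Schema-level check: a closed enum with no values can never be satisfied."""
    if not enum.is_open and not enum.permissible_values:
        raise ValueError(f"closed enum {enum.name!r} has no permissible_values")


def validate(enum: EnumSketch, value: str) -> str:
    """Return 'ok', 'warning' (open enum, unlisted value), or 'error'."""
    if value in enum.permissible_values:
        return "ok"
    # Closed: values MUST be listed. Open: values SHOULD be listed.
    return "warning" if enum.is_open else "error"
```

For example, `validate(EnumSketch("GenderEnum", {"male", "female"}, is_open=True), "other")` yields a warning rather than an error, while the same enum with `is_open=False` rejects the value.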

This satisfies the use case above. A schema designer can include mappings in an enum, e.g. for M and F, and these map to IRIs, but data providers MAY provide data with other values that do not map, and this is still valid. If you use a string from the enum then you are committing to that meaning.

Note that for open enums, the mapping to json-schema is simply to use a string rather than json-schema enum.
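A possible shape for that mapping, sketched in Python. `enum_to_json_schema` is a hypothetical helper written for this comment, not the actual LinkML JSON Schema generator; it assumes the permissible values are available as plain strings.

```python
from typing import Any, Dict, Iterable


def enum_to_json_schema(permissible_values: Iterable[str],
                        is_open: bool = False) -> Dict[str, Any]:
    """Build the JSON Schema fragment for one enum-ranged slot."""
    if is_open:
        # Open enum: any string is valid, so the enum keyword is dropped.
        return {"type": "string"}
    values = sorted(permissible_values)
    if not values:
        # Closed enum without values: report the schema error up front.
        raise ValueError("closed enum has no permissible values")
    return {"type": "string", "enum": values}
```

Calling `enum_to_json_schema({"male", "female"})` gives a fragment with an `enum` list, while passing `is_open=True` gives a bare string type. The trade-off is that the open form loses the list of known values in the generated JSON Schema.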

TBD: In future we will support dynamically obtaining the PVs from an externally defined codeset, e.g. via an API/terminology service. If a codeset is provided, then when generating json-schema, either this should be treated out of band and modeled as a string OR we can imagine querying the service to obtain the json-schema enum at generation time.

@gaurav
Contributor

gaurav commented Oct 15, 2021

Alternate proposal: we create two separate enumeration classes in LinkML based on extensional and intensional definitions:

  • ExtensionalEnumeration: needs to explicitly list every permissible value.
  • IntensionalEnumeration: needs to provide some sort of (human or machine) readable definition for what values are permitted, with optional defined value entries to clarify meaning.
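A rough sketch of how the two classes could differ, in plain Python. The field names (`definition`, `rule`, `defined_values`) and the `permits` method are assumptions made for illustration; nothing here exists in LinkML, and the `HP:` prefix rule is an invented example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ExtensionalEnumeration:
    """Membership is defined by listing every permissible value."""
    permissible_values: Dict[str, str]  # value -> meaning IRI

    def permits(self, value: str) -> bool:
        return value in self.permissible_values


@dataclass
class IntensionalEnumeration:
    """Membership is defined by a rule, not by an exhaustive list."""
    definition: str                   # human-readable definition
    rule: Callable[[str], bool]       # machine-readable definition
    defined_values: Dict[str, str] = field(default_factory=dict)  # optional

    def permits(self, value: str) -> bool:
        # defined_values only clarify meaning; the rule decides membership.
        return self.rule(value)


# Invented example: any identifier with the prefix "HP:" is permitted.
phenotype = IntensionalEnumeration(
    definition="Any identifier with the prefix HP:",
    rule=lambda v: v.startswith("HP:"),
)
```

Here `phenotype.permits("HP:0000118")` is true even though no value was listed, which is the behaviour an extensional enumeration cannot express.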

@cmungall
Member

cmungall commented Oct 22, 2021

Need to define a standard way of defining an intensional Enum. This could involve things such as:

@joeflack4
Contributor

joeflack4 commented Oct 22, 2021

@cmungall Just wanted to give you more information based on what you were asking me about TCCM (Terminology Core Common Model).

I dc'ed for about 10 minutes there so I didn't hear the initial discussion, but I think you were asking if TCCM had any "enum class" that would help with this situation. Again, unfortunately I never got onboarded to TCCM by Harold or Dazhi, and there isn't any proper documentation, just some autogenerated documentation that doesn't really contain any comments or descriptions.

That being said, here's a link to the model definition. It's pretty light-weight:
https://github.com/HOT-Ecosystem/tccm-model/blob/main/tccm_model/model/schema/tccm_model.yaml

It doesn't really say anything about "enums". It defines other classes/slots, and they have the property of being multivalued, e.g.:

slots:
  code:
    range: string
    required: true
    description: |-
      The official code of this entry
    slot_uri: skos:notation

  designation:
    description: |-
      The preferred label or text in the context of a particular community or language
    notes:
      - Designation should never be used as an identifier.  They are strictly informative
    range: string
    slot_uri: skos:prefLabel
    multivalued: false   # <---- closest thing in the TCCM model related to "enums"
    required: false

I looked through the codebase otherwise, and there's really nothing in the codebase about "enums" either. Just some imports from linkml.runtime that aren't actually being used:

[Screenshot: Screen Shot 2021-10-22 at 5 03 42 PM]

@cmungall cmungall changed the title from "use defined_value as parent of permissible_value" to "Extend enum model to allow open vs closed, extensional vs intensional" Nov 12, 2021
@cmungall cmungall assigned hsolbrig and unassigned wdduncan Nov 12, 2021
@cmungall cmungall added this to HS Nov 17, 2021
@cmungall cmungall moved this to Todo in HS Nov 17, 2021
@nlharris nlharris added the enhancement New feature or request label Oct 14, 2022
@turbomam
Contributor

There has been a lot of progress on this.

Does the issue need to remain open?

Who should be the assignee?

Projects
Status: Todo
Development

No branches or pull requests

7 participants