Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider alternate mechanisms to define specialized qualifiers in Statement profiles #134

Closed
mbrush opened this issue May 28, 2024 · 29 comments
Assignees

Comments

@mbrush
Copy link
Contributor

mbrush commented May 28, 2024

The current core-im-source draft from Larry and Alex uses a plural label qualifiers but takes a single object as its type:

I assume this plural name is used because of how you implement statement-specific qualifiers in profiles like the Variant Pathogenicity Statement - where the qualifiers field takes a single "untyped" object within which several specific 'qualifying' properties are defined:

I would like to propose alternatives to the naming and nesting of properties in undefined objects as a way to specify profiled qualifier representations.


Alternative 1 (preferred): Directly specialize the core-im Statement.qualifier property in a given Profile schema to define specific qualifier-type properties. So for Variant Pathogenicity, where we want to specify three possible types of qualifiers (penetrance, moi, gene context) - the VariantPathStatement profile would include the following three properties on the VarPAthStatement object that all extend a core-im qualifier property:


VariantPathogenicityStatement:
  maturity: draft
  type: object
  inherits: va.core:Statement
  properties:  
    . . . 
    penetranceQualifier:
      extends: qualifier
      type: string
      enum:
        - high
        - low
        - risk allele
      description: >-
        The extent to which the variant impact is expressed by individuals …

    modeOfInheritanceQualifier:
      extends: qualifier
      type: string
      enum:
        - autosomal dominant
        - autosomal recessive
        - X-linked dominant
        - X-linked recessive
        - mitochondrial
      description: >-
        The pattern of inheritance expected for the pathogenic effects …

    geneContextQualifier:
      extends: qualifier
      $refCurie: gks.genes:Gene
      description: >-
          A gene context that qualifies the Statement.

This is to me a clear, simple, and direct implementation of how I have envisioned qualifier specialization working. At its core is the simple pattern where a profile holds several distinct qualifier properties that are direct properties of the Statement, and not nested in some untyped object. But not sure if there is a technical or application-specific reason this won’t work?

Alternative 2 below was not considered / approved on the 5-29 call, so next step is to wait for Alex' team to draft an implementation of their approach to encoding Alternative 1 in a way the metaschema tooling can handle.

Or, might we consider refining the metaschema code to allow for multiple specializations of a single property - so we could directly adopt Alternative 1 above? If it is just a technical impediment that is preventing this useful feature, perhaps we could change it?


Alternative 2: Another approach that is closer to the current way things are specified, but explicit declares object types, would involve a VariantPathogenicityQualifierSet class in the VarPath profile schema. In this class, we could define the three qualifier properties defined in the current schema within a nested untyped object. This class would extend a generic QualifierSet class we would want to add to the core-im. Then the existing VarPathStatement.qualifiers` property would then simply reference this class as its type, e.g.:

In the core-im, we add:

"Statement":
  heritableProperties:
    "qualifierSet": 
       "type": "#/$defs/ QualifierSet"

"QualifierSet":
   "heritableProperties":
      "qualifiers":
         "type": "array"
         "items": Qualifier

TO DO: finish getting this right 

In the VarPathStatement class:

  qualifierSet: 
      $ref: "#/$defs/ VariantPathogenicityQualifierSet"         

Definition of the VariantPathogenicityQualifierSet class in the varpath profile:

VariantPathogenicityQualifierSet:
  maturity: draft
  type: object
  inherits: va.core:QualifierSet
  description: >-
    Additional, optional properties that qualify a VariantPathogenicity Statement.
  properties:
    penetranceQualifier:
        extends: qualifier
        type: string
        enum:
           - high
           - low
           - risk allele
        description: >-
           The extent to which the variant impact is expressed by individuals …
 
    modeOfInheritanceQualifier:
        extends: qualifier
        type: string
        enum:
           - autosomal dominant
           - autosomal recessive
           - X-linked dominant
           - X-linked recessive
           - mitochondrial
        description: >-
           The pattern of inheritance expected for the pathogenic effects …
 
    geneContextQualifier:
        extends: qualifier
          $refCurie: gks.genes:Gene
          description: >-
              A gene context that qualifies the Statement.

While this is not as simple and clean as the first alternative, it is at least IMO a more clear and explicit way to implement the current approach – in that avoids it what I find to be a confusing use of nested properties and untyped objects. However, it does require defining and profiling an additional core im class (QualifierSet) in order to define a VarPathStatement profile. But again, if we go with Alternative 1 above, which I prefer – we avoid all of this.

@mbrush mbrush changed the title core-im and profiles: Consider alternate mechanisms to specify qualifiers in Statement profiles Consider alternate mechanisms to define specialized qualifiers in Statement profiles May 29, 2024
@larrybabb
Copy link
Contributor

I'm in favor of moving forward with Alternative 2 in lieu of the effort and time needed for Alternative 1. Alternative 2 resolves the main concerns of being able to both tag which qualifiers are required vs optional as well as being able to explicitly define the types and subtypes of qualifiers.

@mbrush
Copy link
Contributor Author

mbrush commented Jun 12, 2024

Looking closer at the Alt 2 proposal - I actually think we run into the same problem here as for Alternative 1 - as we would need to extend a QualifierSet.qualifier core IM property three different times to define the three VarPath qualifiers needed in that profile.

We can see this clearly in the example above - where penetranceQualifier, modeOfInheritanceQualifier, and geneContextQualifier all extend the same QualifierSet.qualifiers property. (Recall, profiling does not allow for defining abritrary new named properties in a profiled class - all named properties in a profile need to be inherited, or extend a core-im property.)

Given this, perhaps we wait for Alex to draft his proposal for implementing Alternative 1 and see what this looks like. He liked the spirit of that alternative, and said that it could be done using existing metaschama functionality.

@mbrush
Copy link
Contributor Author

mbrush commented Jun 12, 2024

I know I am jumping the gun here given that we haven't seen Alex' propsoal for implementing Alternative 1 - but is it crazy to consider extending the metashchema functionality to allow for what we need to implement Alternative 1 directly? i.e. the ability to specialize one core im property into three different profile properties? Just like a class or a property in an ontology can have multiple subtypes? This would directly support implementing Alternative 1 as above, and seems like generally useful functionality for metaschema tooling to support.

I am not really qualified to propose such things, but naively it seems like this would just require a new keyword to use instead of extends that would support this functionality - maybe call it multiplies? The metaschema code could be updated to know that when it sees this multiplies keyword, it will be deriving multiple new properties in a profile that all inherit the characteristics of their parent core-im property, but overwrite these with any new constraints defined in the 'multiplied' property.

A profile schema based on this approach would be very clean and clear and easy for developers to create. For defining the three qualifiers in the VariantPathogenicityStatement, it would look something like this:

VariantPathogenicityStatement:
  maturity: draft
  type: object
  inherits: va.core:Statement
  properties:  
    subject: ...
    predicate: ...
    object: ...
    . . . 
    penetranceQualifier:
      multiplies: qualifier
      type: string
      enum:
        - high
        - low
        - risk allele
      description: >-
        The extent to which the variant impact is expressed by individuals …

    modeOfInheritanceQualifier:
      multiplies: qualifier
      type: string
      enum:
        - autosomal dominant
        - autosomal recessive
        - X-linked dominant
        - X-linked recessive
        - mitochondrial
      description: >-
        The pattern of inheritance expected for the pathogenic effects …

    geneContextQualifier:
      multiplies: qualifier
      $refCurie: gks.genes:Gene
      description: >-
          A gene context that qualifies the Statement.

Of course, this 'multiplies' functionality could also be used to implement Alterntive2 if we like that approach better. As noted above, this alternative would also need to specialize one core im property into three profile properties.

@mbrush
Copy link
Contributor Author

mbrush commented Jun 20, 2024

UPDATES:

5-29-24 call:

  • Alex and Javi approved of the simplicity/directness of Alternative 1 - including the idea that all qualifier properties would be captured at the same level as the core s-p-o properties that together express the Statements core proposition.
    • But Alex clarified that the metaschama 'extends' functionality cannot implement Alternative 1 exactly as shown above - as it cannot 'override' a single core-im property (e.g. qualifier) multiple times to create several derived/specialized qualifier properties in a Profile (e.g. penetranceQualifier, geneContextQualifier, etc).
    • However, he did indicate that they could implement the spirit of this proposal in a different way. His team will explore / demonstrate how this could work.
  • Matt pointed out that another consideration that favors Alternative 1 is that there are some qualifiers that are required and critical components of a given statement type (e.g. the disease qualifier for a therapeutic-response statement), and others that are not required but can optionally provide additional detail or context to a statement (e.g. the alleleOrigin qualifier for a therapeutic-response statement).
    • Alex would like these essential qualifiers to be emphasized by being shown at the same level as the core s-p-o properties (which is achieved by Alternative 1), and not nested down a level in a Qualifiers object (which would happen in Alternative 2).

6-12-24 call:

  • Larry seemed to support Alternative 1 as the best long term solution, but pushed for short term implementation of Alternative 2 as a concrete improvement that would require minimal change/effort - given that Alternative 1 may require some developer time.
  • Matt still factors Alternative 1, but agrees Alternative 2 is a simple short term improvement if that is all that we can do right now.
    • He reiterated the fact that defining qualifier specializations in yaml files is something that community modelers will need to be able to understand and do for themselves. So it needs to remain clear and simple to understand and implement.
    • This conceptual clarity and ease of implementation was one of the main drivers for Alternative 1 proposed above. As was the clarity of the connection between the profiled qualifiers and the core-im qualifier property.
    • Whatever solution Alex's team comes up with to implement this approach should be clear and simple to implement, and not obscure the connection between the core-im and profiled qualifiers.

@larrybabb
Copy link
Contributor

I have a new proposal.

Why do we even need to put the qualifiers attribute in the Statement yaml at all? I'm not saying that it isn't important, but the reality is that if there isn't a qualifier needed on a given Statement profile then we wouldn't even want the attribute to begin with. And, when there are qualifier(s) needed they are specialized to be their own properties. Thus, trying to define some generic placeholder seems like a fools errand as it has no value. The real value is finding a way to convey to Profilers and Abstractionists that the concept is valuable and important to the SPOQ design and while the SPO elements are fundamental the Q elements may or may not be given a specific profile.

Can we remove qualifiers as a formal attribute from Statement and find a way to show a Concept Attribute called Qualifier in its place. This would allow us to be super transparent and clear that the concept is fundamental yet too abstract to define until Profiling takes place.

This would allow us to avoid future-proofing abstract classes and instead directing folks on the standard way to qualify statements.

@mbrush
Copy link
Contributor Author

mbrush commented Jun 21, 2024

Larry - can you explain further what you mean by “ find a way to show a Concept Attribute called Qualifier in its place. ”?

@larrybabb
Copy link
Contributor

After more discussion we are going to make changes to the metaschema processor to support approach 1.

@ahwagner
Copy link
Member

@larrybabb and @mbrush is there a recording that documents why we will be investing the effort to make this change in the near-term?

@mbrush
Copy link
Contributor Author

mbrush commented Jun 26, 2024

@ahwagner I don't think the discussion was recorded, but I will summarize here. To be clear, the solution that Larry and I decided we prefer was implementing Alternative 1 by extending the metaschema code with a new keyword and functionality that allows for specializing a core-im property into multiple sub-properties when profiling. Details and benefits are described in the comment above. If I recall you liked the spirit of this proposal, and main objection was that the current metaschema processor doesn't have a function to support specializing one core property into several sub-properties in a profile. Larry and I say lets just add this (seemingly straightforward) functionality, rather than try to define work arounds.

Our rationale:

  • we generally agree that this new functionality would it make qualifier specialization clean, clear, and easy for profile developers to define in the yaml (see yaml example above), then if we can add it without too much effort - so lets just do it now rather than spending time defining and implementing a short term fix that is not ideal.
  • We noted that the utility of this functionality is not limited Statement qualifiers, as there are analogous use cases for wanting to specialize a single core-im property into many sub-properties (e.g. the need to multiply the core-im `StudyResult.dataItems' property into several data type specific properties in the caf profile here). This too becomes very concise, clear, and easy to define in the yaml with a 1:many specialization capability.
  • One of the challenges to selling our approach will be ensuring profile developers and users can understand and easily create yaml profile definitions - so anything we can do to make metaschema-based specification of these more straightforward is a win.

Of course this is all dependent on your approval willingness to devote developer time to making this enhancement. Larry estimated adding code to handle 1:m specialization would be ~a days work, but obviously you would know better here.

@ahwagner
Copy link
Member

Sorry about the delay in a response here, I'm really bogged down in grant submissions and travel at the moment.

FWIW, the limitation here isn't technical; it would be easy enough to implement. The limitation is about breaking conventions. Extends has roughly meant "replaces the parent property with an extended version". It is not specific to VA-spec or the VA-spec core IM, it is a generic operation that works across products. This operation is different than "inherits from an abstract property", which is the behavior being described in this thread. I'm opposed to changing the definition and behavior of extends (or any keyword) to suit a specific use case, and not sufficiently motivated by the argument that it is easier for implementers to reuse extends in the same way as done for the subject and object properties of some core IM classes.

What I was going to propose was simply creating a Qualifier class and using the JSON Schema allOf keyword, that is used to address this specific type of situation, e.g.:

VariantPathogenicityStatement:
  maturity: draft
  type: object
  inherits: va.core:Statement
  properties:
    penetrance:
      description: >-
        The extent to which the variant impact is expressed by individuals …
      allOf:
       - $refCurie: va.core:Qualifier
       - type: string
          enum:
            - high
            - low
            - risk allele
    . . .

An alternative approach is to add functionality for another another keyword (@mbrush suggested multiples), and I think inherits (at the property level) would make sense here. But my preference would be to use the JSON Schema allOf approach first, since it would require no further development work and leverage standard patterns in JSON Schema.

@mbrush
Copy link
Contributor Author

mbrush commented Jun 27, 2024

@ahwagner - no worries, I know you are busy! To be clear, we are not proposing to change the definition or behavior of the extends keyword - per the first part of your response. 'Extends' stays as is, and is used for 1:1 property extension/replacement.

Our proposal is what you reference briefly at the very end of your response - to create an new keyword that is used specifically when a profiler wants to do 1:m extension/replacement of a property. For example, specialize the Statement.qualifiers property into modeOfInheritance and geneContext. Or specialize the StudyResult.dataItem property into focusAlleleCount and locusAlleleCount. Glad to hear that this approach makes good sense to you, and would be technically simple to implement!

I get the rationale behind your proposal as well - but was hoping you could flesh it out a bit by illustrating what this 'Qualifier' class-based proposal looks like in both the core-im yaml as well as a derived profile? I tried working this up myself but wasn't sure what you had in mind.


I also wanted to note one potential issue with your proposal concerning its creation of a new de novo property in the VarPath profile (penetrance) that does 'extend' an existing core-im property. This violates our established Profiling rules - which require any 'new' property added to a profile to specialize/extend an existing core-im property (if it does not, it needs to be created using the Extension mechanism). Of course, in your example you could declare penetrance to 'extend' the core-im Statement.qyualifier property - but there are other qualifier properties in this profile that would also need to extend qualifier - which as you say is not allowed. This is of course where the 'multiplies' keyword would help.


Finally, I wanted to make sure it was clear that the qualifier example is not the only use case for wanting to perform 1:m property specialization/extension. For example, I think it would also come into play to support specific data type properties created in StudyResult profiles (e.g. focusAlleleCount and locusAlleleCount in the CAF profile). So there is more general utility to adopting a 'multiplies' like functionality.

@ahwagner
Copy link
Member

Our proposal is what you reference briefly at the very end of your response - to create an new keyword that is used specifically when a profiler wants to do 1:m extension/replacement of a property.

Hey @mbrush; just to clarify, this is not what I meant. Yes, a new keyword is possible (though again, I would prefer to use existing JSON Schema conventions); and no, this is not a proposal for a 1:m replacement of a property. To my knowledge, 1:m property replacement is not a pattern that is used in JSON Schema, pydantic, Active Model, or any other framework we use for modeling data in VICC resources. It might help me understand the importance of this pattern to see it applied in other data modeling languages. I am not aware of a 1:m property/slot replacement mechanism in LinkML, either; from what I understand (having spent very little time with this particular language), what I am proposing is most similar to the LinkML slot_URI property.

@larrybabb
Copy link
Contributor

@mbrush I'd like to review the following statement with you...

I also wanted to note one potential issue with your proposal concerning its creation of a new de novo property in the VarPath profile (penetrance) that does 'extend' an existing core-im property. This violates our established Profiling rules - which require any 'new' property added to a profile to specialize/extend an existing core-im property (if it does not, it needs to be created using the Extension mechanism).

With @ahwagner's solution we would only be adding new attributes to the Standard Profiles not implementation specific profiles. I think we may be raising the bar much too high to say that no attributes beyond what are in the core-im classes can be added. I think by allowing the standard profiles to add the Qualifiers and DataItems needed for specific standard profile types we are finding the appropriate compromise and not contriving a special mechanism.

Frankly, if implementers choose to add attributes we cannot stop them anyway. But, with this compromise we can still relegate the standard qualifier attributes and dataitems for statements and studyresults as needed. If implementers go off course, that is the choice they have to make. We will clarify the risks for doing so, but it is not our job to be overly restrictive. We want to allow implementers to experiment and succeed and fail at their own behest.

@larrybabb
Copy link
Contributor

@mbrush I'd like to propose we try @ahwagner's approach so we can see it in action. I will apply it to the CohortAlleleFrequency and VariantPathogenicity standard and implementer subprofiles. We can always improve upon it later. This way we can all clearly see the impact before we try to make the final-final decision.

I will proceed with this and continue watching for any comments from you.

@mbrush
Copy link
Contributor Author

mbrush commented Jun 29, 2024

More to say on this next week - but for now @ahwagner can you add to your the partial example above what the Qualifier class would look like in the core-im yaml - to help me appreciate how the profiled qualifiers in your example above relate to the core-im model? The example above only shows what your proposal looks like in a Profile.

Also, I didn’t follow how what you are proposing is "most similar to the LinkML slot_URI property". If you think it is important that I do - can you explain in more detail? Thanks!

@larrybabb
Copy link
Contributor

@mbrush again deferring to @ahwagner for the final word to your request immediately above..

My best guess is that the Qualifier class would resemble the DataItem class. You had pointed out the similar pattern between needing to treat DataItems off of StudyResults the same way Qualifiers should be treated off of Statements. Ideally we have a similar solution and design pattern.

@larrybabb
Copy link
Contributor

@mbrush and @ahwagner ... I played around with various options for several hours/days to find a good jumping off point so that we can move forward. Here's where I currently have landed (and all of this is in the newly refactored schemas as of Jul 2, 2024).

  • We should NOT be creating a qualifiers array property at the abstract Statement level since we do not want a qualifiers array in the profile subclasses.
  • Since we DO want explicit xxxQualifier properties on profiles, we should simply define these explicitly in the standard profiles themselves. (Yes, Matt, I think it is necessary to be able to add new attributes off of core-im classes, especially those that are abstract. The standard concrete profiles is the place where everything comes together.)
  • I went back and forth creating a Qualifier class in the va.core namespace and decided it created unnecessary complexity with little to no benefit. (I left a commented out version of this class in va.core's core-im-source.yaml at the top).
  • I landed on simplicity and simply named the qualifier properties on the various standard profiles using the convention xxxQualifier where xxx was the original name we proposed for each qualifier property. I do not think we need to by typed or made into objects unless needed by the type of the data itself. (result: simple data type qualifiers like integer qualifiers can be very straightforward).

I hope this is a good next step an allows us to move forward with more implementation work.
Please review, comment and close this out if possible.

@larrybabb larrybabb removed their assignment Jul 3, 2024
@larrybabb larrybabb moved this from Todo to In Progress in Preparing VA-Spec for Plenary 2024 Jul 3, 2024
@larrybabb larrybabb moved this from In Progress to Done in Preparing VA-Spec for Plenary 2024 Jul 3, 2024
@larrybabb larrybabb moved this from Done to In Progress in Preparing VA-Spec for Plenary 2024 Jul 3, 2024
@larrybabb larrybabb moved this from In Progress to Review in Preparing VA-Spec for Plenary 2024 Jul 3, 2024
@larrybabb larrybabb reopened this Jul 3, 2024
@larrybabb
Copy link
Contributor

accidentally closed this. sorry

@mbrush
Copy link
Contributor Author

mbrush commented Jul 4, 2024

@larrybabb I agree with your assessment here. It aligns with where I ended up on this question, and proposes the same short term solution I had in mind.

The only point I don’t agree about is that qualifiers in profiles (e.g. modeOfInheritanceQualifier, geneContexQualifier) are truly 'new' properties. Conceptually/semantically, these qualifiers specialize the Statement.qualifier property - similar to how other profiled properties extend core im properties (e.g.VarPathStatement.subjectVariant, or VarPathStatement.predicate). Given this - your plea above to allow for adding 'new' properties in Standard Profiles is moot - because the properties you are wanting to allow for are not considered 'new' - and thus do not violate current Profiling rules.

Finally, I do view this proposal as a step on the path toward us being able to explicitly indicate in the yaml files with some kind of flag or keyword that 1:m property profiling is conceptually happening in the yaml/metaschema-based profiling process. The processer doesn't have to do anything formal with this keyword in executing the transform to json schema - it is just there to indicate that property specialization is happening, and keep profilers honest. This is really the only other thing I would want to add at some point to your proposal Larry . . . more on this in my next comment.


UPDATE: I noticed that you removed the Statement.qualifiers property from the core-im. I created PR #160 to add this property back, as I think defining qualifier in the core-im is important for two reasons: (1) it is useful to explain what the notion of a ‘qualifier’ means and what it is used for in the modeling framework; and (2) it is necessary to achieve the conceptual specialization I describe above, when a specific qualifier property is defined in a Statement profile. Again, this is central to the idea/rules around profiling.

In reinstating the qualifier property, I provided a more informative and accurate description of its meaning and utility - which included removing the phrase suggesting that qualifiers are always optional properties - as this is not true of profiled qualifiers in many profiles.

I also do think that the proposed 'multiplies' keyword does need to have a simple metaschema function associated with it - namely to remove the 'multiplied' core-im qualifier property (as it gets replaced by the specific qualifier properties defined in a profile).

@mbrush
Copy link
Contributor Author

mbrush commented Jul 4, 2024

To follow up on something Alex said earlier:

"To my knowledge, 1:m property replacement is not a pattern that is used in JSON Schema, pydantic, Active Model, or any other framework we use for modeling data in VICC resources. . . . I am not aware of a 1:m property/slot replacement mechanism in LinkML, either"

I don't think of what is happening with qualifiers in terms of a formal 1:m property "replacement" pattern. Conceptually, I just want to be able to indicate that we are defining multiple sub-properties of the core-im Statement.qualifier property in a Statement Profile - and using these sub-properties to collect different types of qualifying information. It is this type of 1:m property specialization that I am saying is common practice in many modeling frameworks (e.g. you can find many examples of a property having >1 subproperty in LinkML models such as Biolink, and in OBO ontologies used to generate Linked Open Data).

If JSON schema language or tools don’t formally support the idea of 1:m property specialization, that is fine. This parent/child property link doesn’t need to be formally specified/implemented in the final json schema profiles that are generated by metaschema. I just want to be able to see specialized properties in a profile and not see the generic qualifier property from the core-im - and indicate in the yaml-based specification (and the derived documentation) that this property specialization relationship exists so it is clear in the profiling process that modelers are not creating a truly 'new' property when they define specific qualifiers. Having a metaschema keyword to explicitly indicate this will ensure that modelers consider the "no new properties" rule, and that we can validate that this rule is followed (by checking that all properties in a profile have an extends or multiplies keyword).

IMO the proposed multiplies keyword lets us do this in the most clear and direct way - as a simple flag in the yaml to say "property specialization is happening here, we are following the rules!". I don't think we need to worry about the simple metaprocessor functionality associated with this keyword (removing the 'multiplied' core-im qualifier property) conforming to json schema norms - as it is being executed outside this context, and the final schema produced is entirely compliant with the json schema language.

mbrush added a commit that referenced this issue Jul 4, 2024
Here I update StudyResult.dataItem property to align with analogous paradigm for qualifier specialization


In #134 @larrybabb  proposed that profiles can define specific qualifiers as ‘new’ properties as long as they are named as qualifiers.  @mbrush clarified that these are not truly ‘new’ – as conceptually they specialize the core-im `Statement.qualifier` property. 

The use of `StudyResult.dataItem` property in StudyResult profiles is quite analogous, in that profiling often requires defining >1 specialization of this property for different specific data types. 

In this PR, we update the definition of StudyResult.dataItem in the core-im to support this paradigm and align with how `Statement.qualifier` is defined in PR #___

Specifically: 
- removed the existing `dataItems` property
- replaced it with a ‘dataItem’ property (singular, consistent with it not taking an array in the core-im)
- made it take an ‘object’  (which by my understanding covers strings)
- gave it a clear and informative free-text description
- added a ‘comment’ that explains how this property can get profiled into >1 more specific properties. 

In the future we might consider some type of flag/keyword on this core-im property to explicitly mark it as amenable to 1:m specialization in profiles – if this helps with conceptual clarity or computational validation.
@ahwagner
Copy link
Member

ahwagner commented Jul 9, 2024

I don't think of what is happening with qualifiers in terms of a formal 1:m property "replacement" pattern. Conceptually, I just want to be able to indicate that we are defining multiple sub-properties of the core-im Statement.qualifier property in a Statement Profile - and using these sub-properties to collect different types of qualifying information. It is this type of 1:m property specialization that I am saying is common practice in many modeling frameworks (e.g. you can find many examples of a property having >1 subproperty in LinkML models such as Biolink, and in OBO ontologies used to generate Linked Open Data).

Would you kindly link some of these examples here, so I can gain some context on what patterns from these other frameworks you are looking to replicate? I was unable to find an example of a multiples or analogous pattern in LinkML when I looked, so could use your help to find that and anchor this discussion.

@mbrush
Copy link
Contributor Author

mbrush commented Jul 9, 2024

Hi Alex. Apologies if the text above wasn't clear about this - I am not saying that the linkML or OBO ontology frameworks come with code that supports a 'multiplies'-like function for transforming models when defining profiles (although linkML is interested in developing direct support for a profiling paradigm like the one we are implementing in VA, which will likely include such a function).

I am saying that the underlying notion of a property having >1 sub-property ('1:m property specialization') is a pattern found in these and other modeling paradigms. e.g. we see cases in the Biolink Model (the founding LinkML data model) where a slots (aka property) has > 1 sub-property specializations (see slot inheritance hierarchies for the slots here and here).

The ability to specify 1:m property specialization is IMO a foundational requirement for the VA/SEPIO profiling approach. I would like our framework and tooling to explicitly support this idea as a part of the profiling process - where a given property (e.g. 'qualifier' in the core-im) can have >1 specialized sub-properties defined in a profile that replace the abstract parent. The most simple and direct way I envision doing this is a flag or keyword we can use in the yaml to indicate where this type of property specialization happens (and to trigger code that simply removes the abstract parent property from profiles (e.g. remove 'qualifier' in profiles where specializations of this property are defined). Analogous to what you have already implemented with 'extends', but for 1:m specialization instead of 1:1 extension.

Hopefully that helps, and the rest of my rationale for this above is clear.

@ahwagner
Copy link
Member

ahwagner commented Jul 9, 2024

Okay, I'm getting it. The LinkML specializations linked above are property-level inheritance, implemented with is_a; this is a different pattern than what is being proposed with multiplies, but with these links I understand what you are looking to emulate. I think the most productive next step would be to draft up a schema that uses existing patterns in JSON Schema and Metaschema processor that meets the goals of the profiling approach.

@ahwagner
Copy link
Member

In thinking about this more, we should step back use of extends to replace property names; it is also an anti-pattern and I think this conversation has illuminated how extension of this principle starts breaking traditional modeling paradigms in bad ways.

Speaking with my implementer's hat on, I do not see a need to overcomplicate this with a bunch of new patterns for defining profiles. I disagree that a syntax and translation function for 1:m specialization is a foundational requirement for VA-spec. What I want from VA-Spec is the evidence->evidence line->assertion model; if I can realize this without such a function, how is that a foundational requirement?

I think the idea is that profiling involves selecting a set of valid properties that may be used; but in JSON Schema, you can easily pare down / profile properties using allOf, as we did in the gnomAD CAF implementation profile. So I'm not seeing why we need to invent new patterns or tools to do this.

Following on @larrybabb's proposal, here is a very simple schema that is easy to implement and ensures that all properties adhere to the naming conventions @larrybabb has proposed. To be clear, I'm not a fan of these naming conventions; they are atypical and AFAIK are not driven by any implementation requirements. However, we could simply implement this in the abstract Statement class and the semantics are implied for all downstream classes. Quick and easy.

A more complicated approach does what @larrybabb described as more work for little additional gain: inherit semantics from a parent core-IM class for S, P, O, or Q and implement them in profiles. Here is a simplified example of what that might look like. One thing that I like about this approach is that since the SPOQ relations are explicit, it removes any need for these property types to be encoded in the property names.

@mbrush
Copy link
Contributor Author

mbrush commented Jul 10, 2024

@ahwagner what would help me to understand and assess your proposal for qualifier specification would be an end-to-end view of what the various modeling artifacts would look like (the core-im yaml, a derived VarPathStatement profile yaml, and final the json-schema generated by the processer). This will let us fully appreciate/assess how it addresses criteria that are important to us (adherence to VA profiling rules, adherence to traditional modeling paradigms, clarity/ease of profile authoring for modelers, etc).

re:

I disagree that a syntax and translation function for 1:m specialization is a foundational requirement for VA-spec. What I want from VA-Spec is the evidence->evidence line->assertion model; if I can realize this without such a function, how is that a foundational requirement?

1:m property specialization is core to how the SPOQ model in the core-im is specialized into the SPOQ model for a profile like the VariantPathogenicityStatement profile - which has 3 qualifier properties. Specialization of the core-im qualifier property into three subproperties (geneContextQualifier, modeOfInheritanceQualifier, and penetranceQualifier) happens in this profiling process.

I don't know that we definitively need a formal "syntax and translation function" - but IMO a foundational requirement of the support we provide for profile modelers is the ability to declare in their yaml that this kind of specialization is happening - to facilitate adherence to Profiling Rules that say any 'new' properties defined in a standard profile must be specializations/extensions of existing properties in the core-im.

If there are ways to achieve this using 'traditional' modeling paradigms', that also make it easy for modelers to understand and draft yaml-based profile documents, I would be happy to go this route. Just need to understand how this would look in practice.

@larrybabb
Copy link
Contributor

larrybabb commented Jul 10, 2024

@mbrush if i may interject. I get that geneContextQualifier, modeOfInheritanceQualifier and penetranceQualifier are all subtypes of qualifier. I don't think that is so much of an issue (but I could be wrong). The real challenge is when a property is assigned to an abstract class only that property may be specialized.

When I look at some of the LinkML examples of property inheritance like association_slot (see here) it has many concrete subtypes. But it's only the subtypes that get applied to a given class. I don't see anywhere in the LinkML Association domain where the association_slot abstract property gets placed as a property within another class (abstract or not).

So, if this is what is needed we would need to find a different mechanism in our approach and tooling to building profiles which didn't create a scenarios that had us place an abstract property like qualifier in an abstract class like Statement. For example, a qualifier_slot which is just an abstract property that belongs to the Statement domain.

I'm not sure if we should explore defining a way to define abstract property types (slots?) and stick with the approach that all (including LinkML) use to only apply the concrete property types to the appropriate class that will designate that that property will always be available in that class and all of it's descendants.

I don't think we can support property inheritance to a property that has already been assigned to a class. In those situations we can simply extend and constrain it further but not expand or change it fundamentally. FHIR is a good example of this, although I don't think FHIR has the notion of property inheritance (but I haven't reviewed it lately).

@ahwagner
Copy link
Member

@ahwagner what would help me to understand and assess your proposal for qualifier specification would be an end-to-end view of what the various modeling artifacts would look like (the core-im yaml, a derived VarPathStatement profile yaml, and final the json-schema generated by the processer). This will let us fully appreciate/assess how it addresses criteria that are important to us (adherence to VA profiling rules, adherence to traditional modeling paradigms, clarity/ease of profile authoring for modelers, etc).

Understood. I put together a draft PR (#171) to illustrate what I think captures what you are looking for while preserving typical inheritance patterns. It injects slot definitions and the is_a form in a format compatible with LinkML, which will make that transition easier if/when we pursue that.

@larrybabb
Copy link
Contributor

@ahwagner very creative and a great way to provide a pathway between the linkML (and ontological) property inheritance capabilities and still works with json and jsonschema. And bonus points for not having to craft additional tooling.

@ahwagner
Copy link
Member

ahwagner commented Aug 8, 2024

Per discussions held on 8/6 and 8/7 with @mbrush and @larrybabb, we will not be incorporating LinkML features (e.g. slot inheritance) at this time, as this represents a substantial risk to target timelines for VA-Spec deliverables. We will incorporate SPOQ semantics as part of variable names as a compromise, in recognition of the SEPIO conceptual model. Continued work on the SEPIO conceptual design and domain-agnostic framework will be disentangled from VA Spec and maintained in a separate repository by the SEPIO developers.

Closing this issue for now. @mbrush or @larrybabb please reopen if this is still required for VA Spec 1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

3 participants