Consider alternate mechanisms to define specialized qualifiers in Statement profiles #134
Comments
I'm in favor of moving forward with Alternative 2 in lieu of the effort and time needed for Alternative 1. Alternative 2 resolves the main concerns of being able to both tag which qualifiers are required vs optional as well as being able to explicitly define the types and subtypes of qualifiers. |
Looking closer at the Alt 2 proposal - I actually think we run into the same problem here as for Alternative 1 - as we would need to extend a We can see this clearly in the example above - where Given this, perhaps we wait for Alex to draft his proposal for implementing Alternative 1 and see what this looks like. He liked the spirit of that alternative, and said that it could be done using existing metaschema functionality. |
I know I am jumping the gun here given that we haven't seen Alex's proposal for implementing Alternative 1 - but is it crazy to consider extending the metaschema functionality to allow for what we need to implement Alternative 1 directly? i.e. the ability to specialize one core-im property into three different profile properties? Just like a class or a property in an ontology can have multiple subtypes? This would directly support implementing Alternative 1 as above, and seems like generally useful functionality for metaschema tooling to support. I am not really qualified to propose such things, but naively it seems like this would just require a new keyword to use instead of A profile schema based on this approach would be very clean and clear and easy for developers to create. For defining the three qualifiers in the VariantPathogenicityStatement, it would look something like this:
Of course, this 'multiplies' functionality could also be used to implement Alternative 2 if we like that approach better. As noted above, this alternative would also need to specialize one core-im property into three profile properties. |
UPDATES: 5-29-24 call:
6-12-24 call:
|
I have a new proposal. Why do we even need to put the Can we remove This would allow us to avoid future-proofing abstract classes and instead directing folks on the standard way to qualify statements. |
Larry - can you explain further what you mean by “ find a way to show a Concept Attribute called Qualifier in its place. ”? |
After more discussion we are going to make changes to the metaschema processor to support approach 1. |
@larrybabb and @mbrush is there a recording that documents why we will be investing the effort to make this change in the near-term? |
@ahwagner I don't think the discussion was recorded, but I will summarize here. To be clear, the solution that Larry and I decided we prefer was implementing Alternative 1 by extending the metaschema code with a new keyword and functionality that allows for specializing a core-im property into multiple sub-properties when profiling. Details and benefits are described in the comment above. If I recall, you liked the spirit of this proposal, and your main objection was that the current metaschema processor doesn't have a function to support specializing one core property into several sub-properties in a profile. Larry and I say let's just add this (seemingly straightforward) functionality, rather than try to define workarounds. Our rationale:
Of course this is all dependent on your approval and willingness to devote developer time to making this enhancement. Larry estimated that adding code to handle 1:m specialization would be about a day's work, but obviously you would know better here. |
Sorry about the delay in a response here, I'm really bogged down in grant submissions and travel at the moment. FWIW, the limitation here isn't technical; it would be easy enough to implement. The limitation is about breaking conventions. What I was going to propose was simply creating a
An alternative approach is to add functionality for another keyword (@mbrush suggested |
@ahwagner - no worries, I know you are busy! To be clear, we are not proposing to change the definition or behavior of the Our proposal is what you reference briefly at the very end of your response - to create a new keyword that is used specifically when a profiler wants to do 1:m extension/replacement of a property. For example, specialize the I get the rationale behind your proposal as well - but was hoping you could flesh it out a bit by illustrating what this 'Qualifier' class-based proposal looks like in both the core-im yaml as well as a derived profile? I tried working this up myself but wasn't sure what you had in mind. I also wanted to note one potential issue with your proposal concerning its creation of a new de novo property in the VarPath profile ( Finally, I wanted to make sure it was clear that the qualifier example is not the only use case for wanting to perform 1:m property specialization/extension. For example, I think it would also come into play to support specific data type properties created in StudyResult profiles (e.g. |
Hey @mbrush; just to clarify, this is not what I meant. Yes, a new keyword is possible (though again, I would prefer to use existing JSON Schema conventions); and no, this is not a proposal for a 1:m replacement of a property. To my knowledge, 1:m property replacement is not a pattern that is used in JSON Schema, pydantic, Active Model, or any other framework we use for modeling data in VICC resources. It might help me understand the importance of this pattern to see it applied in other data modeling languages. I am not aware of a 1:m property/slot replacement mechanism in LinkML, either; from what I understand (having spent very little time with this particular language), what I am proposing is most similar to the LinkML |
@mbrush I'd like to review the following statement with you...
With @ahwagner's solution we would only be adding new attributes to the Standard Profiles not implementation specific profiles. I think we may be raising the bar much too high to say that no attributes beyond what are in the core-im classes can be added. I think by allowing the standard profiles to add the Qualifiers and DataItems needed for specific standard profile types we are finding the appropriate compromise and not contriving a special mechanism. Frankly, if implementers choose to add attributes we cannot stop them anyway. But, with this compromise we can still relegate the standard qualifier attributes and dataitems for statements and studyresults as needed. If implementers go off course, that is the choice they have to make. We will clarify the risks for doing so, but it is not our job to be overly restrictive. We want to allow implementers to experiment and succeed and fail at their own behest. |
@mbrush I'd like to propose we try @ahwagner's approach so we can see it in action. I will apply it to the CohortAlleleFrequency and VariantPathogenicity standard and implementer subprofiles. We can always improve upon it later. This way we can all clearly see the impact before we try to make the final-final decision. I will proceed with this and continue watching for any comments from you. |
More to say on this next week - but for now @ahwagner can you add to your partial example above what the Also, I didn’t follow how what you are proposing is "most similar to the LinkML slot_URI property". If you think it is important that I do - can you explain in more detail? Thanks! |
@mbrush again deferring to @ahwagner for the final word to your request immediately above.. My best guess is that the Qualifier class would resemble the DataItem class. You had pointed out the similar pattern between needing to treat DataItems off of StudyResults the same way Qualifiers should be treated off of Statements. Ideally we have a similar solution and design pattern. |
@mbrush and @ahwagner ... I played around with various options for several hours/days to find a good jumping off point so that we can move forward. Here's where I currently have landed (and all of this is in the newly refactored schemas as of Jul 2, 2024).
I hope this is a good next step and allows us to move forward with more implementation work. |
accidentally closed this. sorry |
@larrybabb I agree with your assessment here. It aligns with where I ended up on this question, and proposes the same short-term solution I had in mind. The only point I don’t agree about is that qualifiers in profiles (e.g. Finally, I do view this proposal as a step on the path toward us being able to explicitly indicate in the yaml files with some kind of flag or keyword that 1:m property profiling is conceptually happening in the yaml/metaschema-based profiling process. The processor doesn't have to do anything formal with this keyword in executing the transform to json schema - it is just there to indicate that property specialization is happening, and keep profilers honest. This is really the only other thing I would want to add at some point to your proposal Larry . . . more on this in my next comment. UPDATE: I noticed that you removed the In reinstating the qualifier property, I provided a more informative and accurate description of its meaning and utility - which included removing the phrase suggesting that qualifiers are always optional properties - as this is not true of profiled qualifiers in many profiles. I also do think that the proposed 'multiplies' keyword does need to have a simple metaschema function associated with it - namely to remove the 'multiplied' core-im |
To follow up on something Alex said earlier:
I don't think of what is happening with qualifiers in terms of a formal 1:m property "replacement" pattern. Conceptually, I just want to be able to indicate that we are defining multiple sub-properties of the core-im If JSON schema language or tools don’t formally support the idea of 1:m property specialization, that is fine. This parent/child property link doesn’t need to be formally specified/implemented in the final json schema profiles that are generated by metaschema. I just want to be able to see specialized properties in a profile and not see the generic IMO the proposed |
Here I update StudyResult.dataItem property to align with analogous paradigm for qualifier specialization In #134 @larrybabb proposed that profiles can define specific qualifiers as ‘new’ properties as long as they are named as qualifiers. @mbrush clarified that these are not truly ‘new’ – as conceptually they specialize the core-im `Statement.qualifier` property. The use of `StudyResult.dataItem` property in StudyResult profiles is quite analogous, in that profiling often requires defining >1 specialization of this property for different specific data types. In this PR, we update the definition of StudyResult.dataItem in the core-im to support this paradigm and align with how `Statement.qualifier` is defined in PR #___ Specifically: - removed the existing `dataItems` property - replaced it with a ‘dataItem’ property (singular, consistent with it not taking an array in the core-im) - made it take an ‘object’ (which by my understanding covers strings) - gave it a clear and informative free-text description - added a ‘comment’ that explains how this property can get profiled into >1 more specific properties. In the future we might consider some type of flag/keyword on this core-im property to explicitly mark it as amenable to 1:m specialization in profiles – if this helps with conceptual clarity or computational validation.
Would you kindly link some of these examples here, so I can gain some context on what patterns from these other frameworks you are looking to replicate? I was unable to find an example of a |
Hi Alex. Apologies if the text above wasn't clear about this - I am not saying that the LinkML or OBO ontology frameworks come with code that supports a 'multiplies'-like function for transforming models when defining profiles (although LinkML is interested in developing direct support for a profiling paradigm like the one we are implementing in VA, which will likely include such a function). I am saying that the underlying notion of a property having >1 sub-property ('1:m property specialization') is a pattern found in these and other modeling paradigms. e.g. we see cases in the Biolink Model (the founding LinkML data model) where a slot (aka property) has >1 sub-property specializations (see slot inheritance hierarchies for the slots here and here). The ability to specify 1:m property specialization is IMO a foundational requirement for the VA/SEPIO profiling approach. I would like our framework and tooling to explicitly support this idea as a part of the profiling process - where a given property (e.g. 'qualifier' in the core-im) can have >1 specialized sub-properties defined in a profile that replace the abstract parent. The simplest and most direct way I envision doing this is a flag or keyword we can use in the yaml to indicate where this type of property specialization happens, and to trigger code that simply removes the abstract parent property from profiles (e.g. remove 'qualifier' in profiles where specializations of this property are defined). This is analogous to what you have already implemented with 'extends', but for 1:m specialization instead of 1:1 extension. Hopefully that helps, and the rest of my rationale for this above is clear. |
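To make the request above concrete, here is a minimal Python sketch of what a 'multiplies'-style step might do. It is not the metaschema processor's actual code: the function name `apply_multiplies`, the dict layout, and the `"type": "object"` placeholders are all assumptions for illustration. Only the `multiplies` keyword name and the three qualifier property names come from this thread.

```python
from copy import deepcopy


def apply_multiplies(core_properties, profile_properties):
    """Return the property map of a profiled class.

    Hypothetical behavior: each profile property that declares
    `multiplies: <parent>` is added to the result, and the abstract
    parent it specializes is removed.
    """
    merged = deepcopy(core_properties)
    for name, definition in profile_properties.items():
        definition = deepcopy(definition)
        # Strip the keyword so it does not leak into the generated schema.
        parent = definition.pop("multiplies", None)
        if parent is not None:
            if parent not in core_properties:
                raise KeyError(f"{name} multiplies unknown core property {parent!r}")
            # Several children may name the same parent, so removal is idempotent.
            merged.pop(parent, None)
        merged[name] = definition
    return merged


# Placeholder core-im Statement properties (types are assumptions).
core_statement = {
    "subject": {"type": "object"},
    "predicate": {"type": "string"},
    "object": {"type": "object"},
    "qualifier": {"type": "object"},
}

# Placeholder VarPath profile: three specializations of `qualifier`.
var_path_profile = {
    "geneContextQualifier": {"multiplies": "qualifier", "type": "object"},
    "modeOfInheritanceQualifier": {"multiplies": "qualifier", "type": "object"},
    "penetranceQualifier": {"multiplies": "qualifier", "type": "object"},
}

profiled = apply_multiplies(core_statement, var_path_profile)
print(sorted(profiled))
```

In this sketch the parent/child link exists only in the source definitions. The output carries the three specialized properties and no generic `qualifier`, which matches the intent that the link need not be formalized in the generated JSON Schema.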
Okay, I'm getting it. The LinkML specializations linked above are property-level inheritance, implemented with |
In thinking about this more, we should step back use of Speaking with my implementer's hat on, I do not see a need to overcomplicate this with a bunch of new patterns for defining profiles. I disagree that a syntax and translation function for 1:m specialization is a foundational requirement for VA-spec. What I want from VA-Spec is the evidence->evidence line->assertion model; if I can realize this without such a function, how is that a foundational requirement? I think the idea is that profiling involves selecting a set of valid properties that may be used; but in JSON Schema, you can easily pare down / profile properties using Following on @larrybabb's proposal, here is a very simple schema that is easy to implement and ensures that all properties adhere to the naming conventions @larrybabb has proposed. To be clear, I'm not a fan of these naming conventions; they are atypical and AFAIK are not driven by any implementation requirements. However, we could simply implement this in the abstract A more complicated approach does what @larrybabb described as more work for little additional gain: inherit semantics from a parent core-IM class for S, P, O, or Q and implement them in profiles. Here is a simplified example of what that might look like. One thing that I like about this approach is that since the SPOQ relations are explicit, it removes any need for these property types to be encoded in the property names. |
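As a rough illustration of the naming-convention idea in this comment (not the schema that was actually posted, which is not reproduced here), the sketch below uses the standard JSON Schema keywords `properties`, `patternProperties`, and `additionalProperties` to admit only core properties plus names ending in `Qualifier`. The regular expression, the `statement_schema` contents, and the helper `rejected_property_names` are assumptions. The helper mirrors only the property-name admission rule using the standard library and is not a full validator.

```python
import re

# Hypothetical abstract Statement schema. Core SPOQ slots are declared
# explicitly; any other property must match the assumed naming pattern.
statement_schema = {
    "type": "object",
    "properties": {
        "subject": {},
        "predicate": {},
        "object": {},
    },
    "patternProperties": {
        "^[a-z][A-Za-z0-9]*Qualifier$": {"type": "object"},
    },
    "additionalProperties": False,
}


def rejected_property_names(instance, schema):
    """List property names that `additionalProperties: false` would reject.

    A name is admitted if it is declared under `properties` or matches any
    `patternProperties` regex (JSON Schema patterns are unanchored searches).
    """
    declared = schema.get("properties", {})
    patterns = schema.get("patternProperties", {})
    return sorted(
        name
        for name in instance
        if name not in declared
        and not any(re.search(pattern, name) for pattern in patterns)
    )


candidate = {
    "subject": {},
    "predicate": "placeholder",
    "object": {},
    "penetranceQualifier": {},
    "penetrance": {},  # lacks the assumed suffix, so it is rejected
}
print(rejected_property_names(candidate, statement_schema))
```

A check like this enforces the naming convention only. It says nothing about whether a property is conceptually a specialization of the core-im qualifier, which is the distinction debated in this thread.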
@ahwagner what would help me to understand and assess your proposal for qualifier specification would be an end-to-end view of what the various modeling artifacts would look like (the core-im yaml, a derived VarPathStatement profile yaml, and finally the JSON Schema generated by the processor). This will let us fully appreciate/assess how it addresses criteria that are important to us (adherence to VA profiling rules, adherence to traditional modeling paradigms, clarity/ease of profile authoring for modelers, etc). re:
1:m property specialization is core to how the SPOQ model in the core-im is specialized into the SPOQ model for a profile like the VariantPathogenicityStatement profile - which has 3 qualifier properties. Specialization of the core-im I don't know that we definitively need a formal "syntax and translation function" - but IMO a foundational requirement of the support we provide for profile modelers is the ability to declare in their yaml that this kind of specialization is happening - to facilitate adherence to Profiling Rules that say any 'new' properties defined in a standard profile must be specializations/extensions of existing properties in the core-im. If there are ways to achieve this using 'traditional' modeling paradigms that also make it easy for modelers to understand and draft yaml-based profile documents, I would be happy to go this route. I just need to understand how this would look in practice. |
@mbrush if I may interject. I get that geneContextQualifier, modeOfInheritanceQualifier and penetranceQualifier are all subtypes of When I look at some of the LinkML examples of property inheritance like So, if this is what is needed we would need to find a different mechanism in our approach and tooling for building profiles that doesn't create scenarios that have us place an abstract property like I'm not sure if we should explore defining a way to define abstract property types (slots?) and stick with the approach that all (including LinkML) use to only apply the concrete property types to the appropriate class that will designate that that property will always be available in that class and all of its descendants. I don't think we can support property inheritance to a property that has already been assigned to a class. In those situations we can simply extend and constrain it further but not expand or change it fundamentally. FHIR is a good example of this, although I don't think FHIR has the notion of property inheritance (but I haven't reviewed it lately). |
Understood. I put together a draft PR (#171) to illustrate what I think captures what you are looking for while preserving typical inheritance patterns. It injects slot definitions and the |
@ahwagner very creative and a great way to provide a pathway between the linkML (and ontological) property inheritance capabilities and still works with json and jsonschema. And bonus points for not having to craft additional tooling. |
Per discussions held on 8/6 and 8/7 with @mbrush and @larrybabb, we will not be incorporating LinkML features (e.g. slot inheritance) at this time, as this represents a substantial risk to target timelines for VA-Spec deliverables. We will incorporate SPOQ semantics as part of variable names as a compromise, in recognition of the SEPIO conceptual model. Continued work on the SEPIO conceptual design and domain-agnostic framework will be disentangled from VA Spec and maintained in a separate repository by the SEPIO developers. Closing this issue for now. @mbrush or @larrybabb please reopen if this is still required for VA Spec 1.0. |
The current core-im-source draft from Larry and Alex uses a plural label `qualifiers` but takes a single object as its type:
I assume this plural name is used because of how you implement statement-specific qualifiers in profiles like the Variant Pathogenicity Statement - where the `qualifiers` field takes a single "untyped" object within which several specific 'qualifying' properties are defined:
I would like to propose alternatives to the naming and nesting of properties in undefined objects as a way to specify profiled qualifier representations.
Alternative 1 (preferred): Directly specialize the core-im `Statement.qualifier` property in a given Profile schema to define specific qualifier-type properties. So for Variant Pathogenicity, where we want to specify three possible types of qualifiers (penetrance, moi, gene context) - the VariantPathStatement profile would include the following three properties on the VarPathStatement object that all extend a core-im `qualifier` property:
This is to me a clear, simple, and direct implementation of how I have envisioned qualifier specialization working. At its core is the simple pattern where a profile holds several distinct qualifier properties that are direct properties of the Statement, and not nested in some untyped object. But I am not sure if there is a technical or application-specific reason this won’t work?
Alternative 2 below was not considered / approved on the 5-29 call, so next step is to wait for Alex' team to draft an implementation of their approach to encoding Alternative 1 in a way the metaschema tooling can handle.
Or, might we consider refining the metaschema code to allow for multiple specializations of a single property - so we could directly adopt Alternative 1 above? If it is just a technical impediment that is preventing this useful feature, perhaps we could change it?
Alternative 2: Another approach that is closer to the current way things are specified, but explicitly declares object types, would involve a `VariantPathogenicityQualifierSet` class in the VarPath profile schema. In this class, we could define the three qualifier properties that the current schema defines within a nested untyped object. This class would extend a generic `QualifierSet` class we would want to add to the core-im. The existing `VarPathStatement.qualifiers` property would then simply reference this class as its type, e.g.:
In the core-im, we add:
In the `VarPathStatement` class:
Definition of the `VariantPathogenicityQualifierSet` class in the varpath profile:
While this is not as simple and clean as the first alternative, it is at least IMO a clearer and more explicit way to implement the current approach, in that it avoids what I find to be a confusing use of nested properties and untyped objects. However, it does require defining and profiling an additional core-im class (`QualifierSet`) in order to define a VarPathStatement profile. But again, if we go with Alternative 1 above, which I prefer, we avoid all of this.
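
For readers who think in code rather than schema, the structural difference between the two alternatives can be sketched with Python dataclasses. This is only an illustration of the shapes involved, not the schema definitions from the issue. The class names `VarPathStatementAlt1` and `VarPathStatementAlt2`, the untyped `Any` fields, and the property names inside the qualifier set are assumptions, and the values are placeholders.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


# Alternative 1: qualifiers are direct properties of the statement.
@dataclass
class VarPathStatementAlt1:
    geneContextQualifier: Optional[Any] = None
    modeOfInheritanceQualifier: Optional[Any] = None
    penetranceQualifier: Optional[Any] = None


# Alternative 2: a generic core-im class that profiles extend.
@dataclass
class QualifierSet:
    """Hypothetical generic container for profile-specific qualifiers."""


@dataclass
class VariantPathogenicityQualifierSet(QualifierSet):
    geneContextQualifier: Optional[Any] = None
    modeOfInheritanceQualifier: Optional[Any] = None
    penetranceQualifier: Optional[Any] = None


@dataclass
class VarPathStatementAlt2:
    # One typed property that nests the three qualifiers.
    qualifiers: Optional[VariantPathogenicityQualifierSet] = None


flat = VarPathStatementAlt1(penetranceQualifier="placeholder-penetrance")
nested = VarPathStatementAlt2(
    qualifiers=VariantPathogenicityQualifierSet(
        penetranceQualifier="placeholder-penetrance"
    )
)

print(asdict(flat))
print(asdict(nested))
```

The flat form keeps each qualifier one level below the statement. The nested form adds an extra typed layer, which is the additional core-im class that Alternative 2 requires.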