Advanced attribute qualifiers: passthru and transient attributes #86
Comments
I think I agree with most of this. The way you explain it makes it even clearer to me that these are two different qualifiers and should not be conflated into one. I like the name "transient"; I think you nailed that one, especially since the schema describes the level 2 representation, which is the representation that is transient (while the level 1 digest lingers on). Passthru is also well named, as the level 2 representation is passed through to level 1. The only thing is the spelling. Wouldn't "passthrough" or "pass-through" be more standard? I wasn't even aware you could spell it "passthru", but in any case it seems to be a US-specific spelling. Anyway, I'm just raising it and leaving the discussion to native English speakers. As to where the qualifiers should be defined, I think a bit differently: I think the "passthru" […] The "transient" qualifier is more dependent on the implementation. If an implementation wants to allow the retrieval of level 2 values for this attribute, that doesn't change anything essential about the attribute. So I think this should be at the level of "required".
On naming: "thru" is a relatively common abbreviation of "through", I guess. I used […] To me, the fact that passthru only applies to the level 1 vs level 2 representation means it's not essential to the definition of the attribute per se, but only relevant in the context of the object. The object is what defines whether an attribute is required or not, and the object defines how it's going to treat attributes at level 1 vs level 2.
Not sure I understand your point here. To me, the logic of the schema is that it mainly describes the contents of the level 2 representation, e.g. the type of "sequences" is an array of strings. The default algorithm is then that the level 2 representation is canonicalized and digested. For the passthru attributes, the schema still describes the level 2 representation, but passthru denotes that the default digest algorithm is not used. To me, the definition of an attribute should include the level 2 representation AND the algorithm to get to level 1; both are essential to the definition of an attribute. However, there is no formal way of specifying the digest algorithm in the schema. One could, for instance, imagine a flag specifying that the "sorted" attributes have a digest algorithm that is a bit more complicated than the default. […] Perhaps what we really should have is a 'digest_algorithm' qualifier using a controlled vocabulary, with the current options being e.g. "seqcol_digest", "sorted_seqcol_digest" and "passthru"?
I would rather not make it more complicated; I think we're already overengineering this. To me, the schema describes the level 2 representation. It need not be aware of the level 1 representation; that is outside the schema. The schema is more general, and it is usable for other purposes outside seqcol. For 'passthru', I'm thinking from the perspective of attribute re-use. I thought qualifiers that would be carried over in case of re-use (in an external schema) should be local, whereas those that wouldn't should be global. If I wanted to build a schema that imported these attributes as external definitions, I think: […]
So, from that perspective, I think passthru is a global qualifier. I guess the main difference is that you're seeing the schema as more tightly coupled to the seqcol protocol, whereas I'm seeing the schema a little more broadly.
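A minimal Python sketch of this re-use argument, assuming schemas are plain dicts in the JSON-schema style discussed here. import_attribute and OBJECT_LEVEL_QUALIFIERS are hypothetical names, not part of any seqcol implementation: the property definition (with local qualifiers such as "collated") is copied, while object-level qualifiers such as passthru stay with the source object unless the importing schema opts in.

```python
import copy
from typing import Any

# Hypothetical: the object-level ("global") qualifiers discussed in this issue.
OBJECT_LEVEL_QUALIFIERS = ("required", "inherent", "passthru", "transient")


def import_attribute(
    source: dict[str, Any],
    target: dict[str, Any],
    name: str,
    *,
    qualifiers: tuple[str, ...] = (),
) -> None:
    """Copy one attribute definition from `source` into `target`.

    Local qualifiers (keys inside the property, e.g. "collated") travel with
    the definition. Object-level qualifiers do not; the importing schema must
    opt in explicitly via `qualifiers`.
    """
    target.setdefault("properties", {})[name] = copy.deepcopy(source["properties"][name])
    for key in qualifiers:
        if key not in OBJECT_LEVEL_QUALIFIERS:
            raise ValueError(f"unknown object-level qualifier: {key!r}")
        listed = target.setdefault(key, [])
        if name not in listed:
            listed.append(name)


seqcol = {
    "properties": {
        "names": {"type": "array", "collated": True},
        "author": {"type": "string"},
    },
    "passthru": ["author"],
}
mine: dict[str, Any] = {}
import_attribute(seqcol, mine, "names")   # "collated" comes along
import_attribute(seqcol, mine, "author")  # "passthru" does not
```

Under this reading, an importing schema that wants "author" to be passthru has to say so itself, which is what makes passthru a global qualifier.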
Right. Now I follow you. Yes, I think that makes sense. I agree that we shouldn't overengineer things more, and I'm fine with your suggestions. My main point was that the standard shouldn't allow people to redefine a passthru attribute in a particular implementation as a digested attribute, breaking interoperability at level 1. So as long as there is a mechanism for this, I am fine with your suggestion. What is important now is to get this beast out of the door.
Yes, I agree with this in principle. But then again, is it so bad if two implementations differ on that? Well:
I don't see this as an important issue, I guess. But it could also be considered an argument in favor of bringing 'passthru' along with the attribute as a local qualifier, as you originally suggested. On the other hand, if there were a situation where one implementation wanted to call it passthru and the other didn't, then maybe there's a reason for that, and then it wouldn't make sense to have it as a local qualifier.
Not a big issue, I agree. Passthru as a global qualifier is fine with me.
Thank you both for the detailed description of these two attributes. The one point I'm still unclear on is the expected behaviour of the […] However, not all […] Thoughts?
EDIT: see next comment
The issue I had is that there is really nothing in the schema that specifies that a […] Regarding filtering the list endpoint on […]
Now you confused me: wasn't the whole point of having two qualifiers that […] I still don't see why you would want to make […]
I think I'm with Tim here. I think both attributes are implementation-specific. Rationale is from my comment above, here:
I think that's one of the reasons to declare the variable as passthru in the schema. To your comment:
Why do you say that? I disagree in several ways:
So my conclusion is: passthru is implementation-specific, there's nothing we can do about it anyway, and it's not really a problem for anything I can think of.
Hmm, I see what you mean. That's a tough one. I was thinking, most of these would probably be transient attributes (not passthru attributes). It's not a problem for transient attributes that are not passthru, those would be digested and thus easily filterable. But for attributes that are passthru, I can imagine that some would be filterable, and others would not, and that's the issue.
I think this could work, but it's also a bit unsatisfying... I guess this is causing me to rethink some of my above response to @sveinugu. Is this the reason you were suggesting that passthru attributes have to be strings? Could we just say that all passthru attributes have to be strings? But even if they are strings, it still might not make sense to allow them to be filtered...
No, a
I think there is a misunderstanding here. I am talking about I don't believe it makes sense to define a. Pass the array of strings at level 2 on to level 1 Neither of these make much sense to me if we retain the
That is also my question. I am open to that being the case, but it would be easier to discuss with an example. My main point in the above was that
I am not proposing any such thing in general. But I do think we should define […] My point is to define this in our canonical list of attribute definitions to support interoperability. The comparison with the […] I now do agree with you that […] (We should btw respond to Alex on the other issue!) Edit: I misremembered. It is the "inherent" attribute we are standardizing, hence my last comment does not make sense...
OK, I think we're on the same page. Would this be an acceptable rule for the question of filtering on passthru attributes:
That's all we say. Basically, according to the spec, passthru attributes need not support filtering, but transient attributes must. Does this simple rule solve the problem? Of course, an implementation is free to implement filtering for passthru attributes that are strings, but we don't need to mandate that.
Yes, I think this makes sense and is a simple solution, at least for now.
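A minimal sketch of that rule applied to a /list-style filter over stored level 1 representations. list_collections and its parameters are hypothetical, and the collection and attribute digests in the usage are placeholders. Filters compare level 1 values, so digested transient attributes are always filterable, while filtering on a passthru attribute is refused unless an implementation opts in.

```python
from typing import Any


def list_collections(
    level1_by_digest: dict[str, dict[str, Any]],
    filters: dict[str, Any],
    passthru: set[str],
    allow_passthru_filters: bool = False,
) -> list[str]:
    """Return digests of collections whose level 1 values match every filter.

    Transient (digested) attributes must be filterable; passthru attributes
    need not be, so they are rejected unless the implementation opts in.
    """
    for name in filters:
        if name in passthru and not allow_passthru_filters:
            raise ValueError(f"filtering on passthru attribute {name!r} is not supported")
    return sorted(
        digest
        for digest, level1 in level1_by_digest.items()
        if all(level1.get(key) == value for key, value in filters.items())
    )


stored = {
    "coll-1": {"names": "digest-n1", "sorted_sequences": "digest-s", "author": "A"},
    "coll-2": {"names": "digest-n2", "sorted_sequences": "digest-s", "author": "B"},
}
print(list_collections(stored, {"sorted_sequences": "digest-s"}, passthru={"author"}))
```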
I think I agree with the consensus now, and I realised that I was slightly confused before about when we would be using […]
Transient attributes
A transient attribute is an attribute that only has a level 1 representation stored in the server.
Construction
A […] The flexibility of the construction is meant to allow level 1 digests that differ from the standard one, but in practice the attribute is expected to be a typed array that contains values […]
List endpoint
A […]
Collection endpoint
At level 1: The level 1 representation of a sequence collection should list the […]
Comparison endpoint
The comparison representation of a sequence collection should list the […]
Attribute endpoint
Passthru attributes
A […]
Construction
A […]
List endpoint
A […]
Collection endpoint
At level 1: The level 1 representation of a sequence collection should list the […]
Comparison endpoint
The comparison representation of a sequence collection should list the […]
Attribute endpoint
I've read through and I think everything looks right -- I'm still thinking about the collection endpoint behavior, I hadn't thought about that yet. But 1 thing I noticed is this:
Edit: cross out what was wrong
Thanks @tcezard for this overview. It's great that you reviewed all the endpoints in detail, which was lacking. I think I agree with most of the TBC points, except perhaps the wording (e.g. […]). Three things I am unsure about:
I've updated the PR with the latest changes; please have a look.
We've come to a consensus on these points, and this is now explained in PR #87.
In our last meeting, we discussed two additional modifiers.
This is my attempt to document the rationale behind these two attribute modifiers.
Advanced attribute qualifiers: passthru and transient attributes
For the basic seqcol attributes (names, lengths, sequences), the general algorithm and basic qualifiers (required, inherent) suffice to describe the representation. But some more nuanced attributes require additional qualifiers to describe their intention and how the server should behave. For example, sorted_name_length_pairs and sorted_sequences are intended to provide alternative tailored identifiers and comparisons, and are not necessarily useful for independent attribute lookup. Similarly, custom extra attributes, like author or alias, may be simple appendages that don't need the complex digesting procedure. In order to flag such attributes in a way that can govern slightly different server expectations, we need a couple of additional advanced attribute qualifiers. For this purpose, we introduce the passthru and transient qualifiers:
Passthru attributes
Most attributes of the canonical (level 2) seqcol representation are digested to create the level 1 representation. But sometimes, we have an attribute for which digesting makes little sense. These attributes are passed through the transformation, so they show up on the level 1 representation in the same form as the level 2 representation. Thus, we refer to them as passthru attributes.
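A minimal Python sketch of the level 2 to level 1 transformation with passthru attributes, assuming the default digest is the refget-style sha512t24u of the canonicalized JSON value. Here canonical_json only approximates RFC 8785 canonicalization (it is adequate for arrays of strings and integers), and level2_to_level1 and the "author" value are illustrative, not taken from any implementation.

```python
import base64
import hashlib
import json
from typing import Any


def canonical_json(value: Any) -> bytes:
    # Approximates RFC 8785 (JCS): sorted keys, no insignificant whitespace.
    # Adequate for arrays of strings and integers; not a full JCS implementation.
    return json.dumps(
        value, separators=(",", ":"), sort_keys=True, ensure_ascii=False
    ).encode("utf-8")


def sha512t24u(data: bytes) -> str:
    # SHA-512, truncated to the first 24 bytes, then base64url-encoded.
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def level2_to_level1(level2: dict[str, Any], passthru: set[str]) -> dict[str, Any]:
    """Digest every attribute except the passthru ones, which are copied unchanged."""
    level1: dict[str, Any] = {}
    for name, value in level2.items():
        if name in passthru:
            level1[name] = value
        else:
            level1[name] = sha512t24u(canonical_json(value))
    return level1


collection = {
    "names": ["chr1", "chr2"],
    "lengths": [248956422, 242193529],
    "author": "Jane Doe",  # hypothetical custom attribute
}
level1 = level2_to_level1(collection, passthru={"author"})
print(level1["author"])  # Jane Doe
```

Because passthru values are copied verbatim, two servers only produce identical level 1 representations if they agree on which attributes are passthru.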
Transient attributes
Most attributes of the sequence collection can be retrieved through the /attribute endpoint. However, some attributes may not be retrievable. For example, this could happen for an attribute that we intend to be used primarily as an identifier. In this case, we don't necessarily want to store the original content that went into the digest in the database, because it might be redundant. We really just want the final attribute. These attributes are called transient because the content of the attribute is no longer stored and is therefore no longer retrievable.
The interaction between passthru and transient attributes
By definition, passthru attributes are also transient, because it makes little sense to retrieve a level 2 representation from a level 1 value if no digest/transformation occurred (you would just be retrieving the same value you used in the request). But it is possible to have transient attributes that are not passthru: these are attributes you do want to digest before adding them to the level 1 representation, because they are either large or intended to be used as an identifier, but that you don't need to be able to retrieve in their original (level 2) form.
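A minimal sketch of how a server might honour the two qualifiers when storing and serving attributes. AttributeQualifiers, AttributeStore and get_attribute are hypothetical names, and the level 1 digests in the usage are placeholders. The only rules encoded are the ones stated above: transient attributes keep only their level 1 value, and passthru implies transient.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AttributeQualifiers:
    passthru: set[str] = field(default_factory=set)
    transient: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # By definition, every passthru attribute is also transient.
        self.transient = self.transient | self.passthru


class AttributeStore:
    """Hypothetical backing store for an /attribute-style lookup."""

    def __init__(self, qualifiers: AttributeQualifiers) -> None:
        self.qualifiers = qualifiers
        # (attribute name, level 1 digest) -> stored level 2 value
        self._level2: dict[tuple[str, str], Any] = {}

    def store(self, level1: dict[str, Any], level2: dict[str, Any]) -> None:
        for name, value in level1.items():
            if name in self.qualifiers.transient:
                continue  # keep only the level 1 value; level 2 content is dropped
            self._level2[(name, value)] = level2[name]

    def get_attribute(self, name: str, digest: str) -> Any:
        if name in self.qualifiers.transient:
            raise LookupError(f"{name!r} is transient; its level 2 value is not stored")
        return self._level2[(name, digest)]


quals = AttributeQualifiers(passthru={"author"}, transient={"sorted_sequences"})
store = AttributeStore(quals)
store.store(
    level1={"names": "digest-n", "sorted_sequences": "digest-s", "author": "Jane Doe"},
    level2={"names": ["chr1", "chr2"], "sorted_sequences": ["seq-a", "seq-b"], "author": "Jane Doe"},
)
print(store.get_attribute("names", "digest-n"))  # ['chr1', 'chr2']
```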
How to use them
Should they be specified as a local qualifier, using a key under a property (like we did with collated), or as an object-level qualifier, specified with a keyed list of properties up one level (like required, and what we did with inherent)? (Further rationale in decision record: 2023-08-22 - Seqcol schemas MUST specify collated attributes with a local qualifier.)
I think passthru should be an object-level qualifier, since it's really only relevant for the transition stage of the object from level 2 to level 1.
I think transient could go either way. It feels a bit more local to a specific attribute, so maybe it is best as a local qualifier. But I also see some value in keeping them both in the same place, since they have some similar properties. So I'm not sure right now where I would put them.
Edit: I think both passthru and transient are really details of a particular implementation, and not inherent to an attribute itself; as such, I think both of them belong as object-level qualifiers.
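A sketch of what the object-level option could look like, using a hypothetical schema written as a Python dict: collated stays a local qualifier inside each property, while required, inherent, transient and passthru are keyed lists one level up. The attribute names come from this issue, but the exact schema layout and the helper functions object_qualifiers and collated_attributes are assumptions for illustration.

```python
from typing import Any

# Hypothetical schema fragment. "collated" is a local qualifier (a key inside
# each property); the other qualifiers are object-level keyed lists.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "names": {"type": "array", "collated": True, "items": {"type": "string"}},
        "lengths": {"type": "array", "collated": True, "items": {"type": "integer"}},
        "sequences": {"type": "array", "collated": True, "items": {"type": "string"}},
        "sorted_sequences": {"type": "array", "collated": False, "items": {"type": "string"}},
        "author": {"type": "string"},
    },
    "required": ["names", "lengths", "sequences"],
    "inherent": ["names", "sequences"],
    "transient": ["sorted_sequences"],
    "passthru": ["author"],
}

OBJECT_LEVEL = ("required", "inherent", "transient", "passthru")


def object_qualifiers(schema: dict[str, Any]) -> dict[str, set[str]]:
    """Collect the object-level qualifier lists, checking they name real properties."""
    props = set(schema["properties"])
    result: dict[str, set[str]] = {}
    for key in OBJECT_LEVEL:
        names = set(schema.get(key, []))
        unknown = names - props
        if unknown:
            raise ValueError(f"{key} lists undefined attributes: {sorted(unknown)}")
        result[key] = names
    # Passthru implies transient, so passthru attributes are folded into "transient".
    result["transient"] |= result["passthru"]
    return result


def collated_attributes(schema: dict[str, Any]) -> set[str]:
    """Read the local "collated" qualifier from each property definition."""
    return {n for n, d in schema["properties"].items() if d.get("collated", False)}
```

Keeping passthru and transient next to required means that an attribute definition re-used elsewhere carries only its local qualifiers, which matches the implementation-specific reading in the edit above.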