Proposal: CUE Querying extension #165
Comments
Isn't this already supported?
Could we generalize the use of any constraint for use as a field? The scope of this request is much larger than supporting JSON Schema, but it would capture the quoted use case and would allow for something like the following:
Obviously implementing a runtime that supports this is no easy feat, but it's arguably the biggest missing construct needed to convey a value lattice.
This would also eliminate the need for additional notation described in the following:
The above example could be written:
Given The Array section already touches on this when it describes the NOTE: there is something missing though as the notion of an array is ambiguous in this example and
then
then it follows that
Where similar to the brackets in
similar to the idea that
More examples from the rest of the doc
Would this example break the property that expressions can be evaluated in any order and yield the same result? Since the following would yield
@rudolph9
@rudolph9 Allowing any type as key: this needs thorough investigation. It would be interesting to see how YAML resolves it and what kind of issues they run into. The idea to use this instead of value pointcuts is interesting. I will need to think about that a bit more. Associative lists: it is indeed true that lists cannot evaluate until all information for an expression is selected (and the type is clear), but it should not affect evaluation order other than that. This is not far off from how evaluation works in CUE anyway. If it does break commutativity, then this is broken. It is key for CUE to stay commutative.
@mpvl Ultimately my concern is around adding functionality that doesn't have a basis in the foundational set-theory aspects of cuelang. Associative lists could be defined within those set-theory bounds if we were to treat arrays as syntactic sugar for something like the following:
By default the above would yield the current functionality and only by setting the optional field One big missing feature I've found in cuelang is the ability to add to a disjunction associated with a field (hence my
@rudolph9 I'm not sure what you mean that you cannot currently associate a default with a field. You can... But yes, in order for associative lists to work, lists would be syntactic sugar for maps with implied keys, either being an integer index derived from its position or a key derived from the value (for associative lists). So lists would be maps with the additional constraint that the number of elements is equal to the largest index. In the current implementation it is very close to this interpretation already.
@rudolph9 Regarding your example to eliminate value constraints, I assume you meant to use square brackets (as parentheses would require the value to be concrete). I'm unclear as to how you would distinguish between keys and filters. For instance, how would you filter on a string value versus selecting a set of keys? The two kinds of filters would need to be clearly distinguished. What I could imagine is allowing key and value filters within a single square bracket separated by a colon:
It is a bit weird, though, as the interpretation of key may be a bit unclear. Also it gets awkward for filtering when the key constraint is a string, ( An alternative to the proposed syntax I had in mind myself was to write value filters as:
using
@mpvl here is a related article on Graph-Relational Object Queries that might offer some inspiration.
This now also allows any of the non-JSON keywords to be used as references. Previously, these were already supported as field names. Issue #339 Issue #165 Change-Id: I721d054c8220ba3536f680fe2e3e502a62f99b6b Reviewed-on: https://cue-review.googlesource.com/c/cue/+/5683 Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
@rudolph9 : we have looked a bit more into constraining using unification or subsumption, instead of boolean expressions, for instance:
I'm using [] here instead of () in your example, as that is the syntax we use for pattern matching. But the idea is the same. Note that The problem with this approach, though, is that it really doesn't make sense to use unification here. Subsumption makes more sense, but computation of subsumption is hard to do precisely, and probably not tractable in all cases, even for resolved values. It also seems unreasonable to require fields to be concrete (maybe resolved, but not concrete). For instance, for k8s, many fields of a service will probably remain undefined. The great advantage of using a boolean expression is that all these issues go away. Note that GROQ also uses boolean expressions, as do all other query languages. We do want to reserve the ability to come up with a way to do subsumption reasonably. One way to keep this open is to piggyback on issue #575, and only allow queries of the form That said, I'm not saying it is impossible to use subsumption. We can possibly get there by adding some restrictions to struct concreteness, implementing precise simplifications of ranges, and verifying the feasibility of some regular expression math. CUE only allows RE2 regexps, which means it is theoretically possible to determine the intersection of them in O(n) time, which is one thing that would be needed. Theoretically, only the latter is needed for unification (although it resolves itself when concrete values are involved).
List operators are confusing. For instance, is the result of [a, ...] + [b, ...] open or closed? What about [a] + [b, ...]? And so on. With the current semantics of comprehensions, they are also unnecessary. The above can be written as [ for x in a {x}, for x in b {x} ]. Though more verbose, it is very clear here that regardless of whether a and b are open or closed the result is always closed. With the query extension, this can also be written more succinctly, like `[ a.[_], b.[_] ]` or perhaps `[a.*, b.*]`; either is clearer than using list operators. See Issue #165. In order to move to being able to give backwards compatibility guarantees, we propose getting rid of list addition and multiplication. An automatic rewriter could rewrite the old use using `list.Repeat` and `list.Concat`, the latter of which could refer to the spec to indicate its equivalence to using comprehensions or, later, queries. Change-Id: I374bfd59775d66d3da9feb28e1940f8bd3c255e8 Reviewed-on: https://cue-review.googlesource.com/c/cue/+/8063 Reviewed-by: CUE cueckoo <cueckoo@gmail.com> Reviewed-by: Paul Jolly <paul@myitcv.org.uk> Reviewed-by: Jonathan Amsterdam <jba@google.com>
Why wouldn't
In the I'm guessing that
Replying to self following discussion with @mpvl. That should actually be written as:
@extemporalgenome this is indeed a similar concept. The main difference is that it is defined to apply to all matching fields, also existing ones, and not just additional ones. This makes encoding JSON Schema in CUE, well, annoying. We thought about changing the semantics and transitioning CUE to be more in line with JSON Schema, but decided against that. It turns out that the CUE semantics are actually used more and are quite convenient. Instead, we think that
@rudolph9 regarding using subsumption, we think it may be possible if we the use of
These rules are hopefully easy enough to explain. The implementation of this would also be straightforward. It would be quite neat to be able to query
@jlongtine I recall you asked about recursive queries. We have looked at some algorithmic improvements for CUE, and one interesting structure-sharing approach may make this possible. Translated from YAML, it would allow any input to be processed in O(n). So that means validating the YAML example in the https://en.wikipedia.org/wiki/Billion_laughs_attack (fully expanded Given a well-behaved CUE program (like no comprehensions, discriminated disjunctions), this means that you could still validate such an exploding YAML file in bounded time. We can use the same principle to search for values in a tree in bounded time. That makes no sense when capturing in a list, or when the query has a relation with parent nodes, but at least it would be possible to do static analysis and optimize cases where this is not necessary. Anyway, if this is functionality people are interested in, it seems a possibility.
@mpvl Definitely interested in this tool. I have some places in https://github.com/cue-sh/cfn-cue that could be helped by recursive queries. Happy to lay out the use case for you soon.
This issue has been migrated to cue-lang/cue#165. For more details about CUE's migration to a new home, please see cue-lang/cue#1078.
We propose an extension to the query capabilities of CUE. Querying, in this context, is more than just retrieving data. It also means the selection of nodes to which to apply constraints and, as we will explain, interpretation of lists as association lists.
This proposal is meant to give a high-level design for these capabilities, with more detailed design to be fleshed out later. The main point of it at this stage is to show the coherency of the syntax and semantics of these new features.
Objective
Add minimal syntax and semantics to CUE to allow:
have constraints applied,
Allowing closer control over pushing out of constraints is needed to support JSON Schema compatibility down the line.
A guideline for CUE is to keep the spec considerably smaller than that of YAML. Ideally, these changes should be added with little or no growth to the spec. We achieve this by 1) exploiting the commonalities between these constructs, and 2) removing constructs that are no longer necessary.
Background
We assume the merits of a query language are clear. What is probably less clear is how a query language relates to two key aspects of CUE’s view on proper configuration and schema definition. In this section, we describe the various aspects of CUE related to querying and how it's possible to design a query language that fits well with the other aspects of the CUE language.
The aspect-oriented nature of CUE
CUE can be seen as an aspect-oriented language in addition to being a constraint-based language. Just as a JSON configuration can be modeled as a collection of path-leaf value pairs, a CUE configuration can be seen as a collection of path-constraint pairs. So instead of specifying a single concrete value for one path, CUE defines a constraint for many points in a tree at once.
To draw the parallels with aspect-oriented programming, path selections correspond to pointcuts and constraints correspond to advice.
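As a rough illustration of the pointcut/advice analogy, the following Python sketch models a configuration as path-value pairs and a CUE-like program as a list of pattern-constraint pairs. In each pattern, a `*` path segment matches any label. This is not how the CUE evaluator works; the wildcard convention and all names are hypothetical.

```python
from typing import Any, Callable

Path = tuple[str, ...]
# A "pointcut" is a path pattern in which "*" matches any single label;
# the "advice" is a predicate that every matching leaf value must satisfy.
Rule = tuple[Path, Callable[[Any], bool]]


def leaves(tree: Any, prefix: Path = ()) -> list[tuple[Path, Any]]:
    """Flatten a nested dict into (path, leaf value) pairs."""
    if isinstance(tree, dict):
        out: list[tuple[Path, Any]] = []
        for label, value in tree.items():
            out.extend(leaves(value, prefix + (label,)))
        return out
    return [(prefix, tree)]


def matches(pattern: Path, path: Path) -> bool:
    """True if the pattern has the same length and every segment matches."""
    return len(pattern) == len(path) and all(
        p == "*" or p == label for p, label in zip(pattern, path)
    )


def check(tree: Any, rules: list[Rule]) -> list[Path]:
    """Return the paths whose values violate at least one matching rule."""
    return [
        path
        for path, value in leaves(tree)
        if any(matches(pat, path) and not ok(value) for pat, ok in rules)
    ]


config = {
    "services": {
        "web": {"port": 8080, "replicas": 3},
        "db": {"port": 70000, "replicas": 1},
    }
}
rules: list[Rule] = [
    # One rule constrains the port of every service at once.
    (("services", "*", "port"), lambda v: isinstance(v, int) and 0 < v < 65536),
    (("services", "*", "replicas"), lambda v: isinstance(v, int) and v >= 1),
]
print(check(config, rules))  # [('services', 'db', 'port')]
```

A single rule covers every service, which is the "many points in a tree at once" property described above.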
Current limitations
As an aspect-oriented language, CUE's path selection, or pointcut, abilities are currently somewhat lacking.
CUE’s schema model is quite close to OpenAPI and JSON Schema, which makes CUE a good fit for processing those standards. However, CUE currently cannot represent certain JSON Schema constraints, most notably `patternProperties`. This is a result of CUE's limited pointcut abilities.
Queries to the rescue
A query language for CUE, or any JSON-based query language, selects data from the JSON data tree. Existing languages have become quite powerful at that. Even though CUE doesn't support querying other than in the form of comprehensions, it already possesses many of the constructs needed to define a query language.
It turns out that the parts where CUE lacks the constructs needed for a JSON-style query language closely correspond to the parts where CUE falls short in supporting JSON Schema. Keeping the syntax for these queries close to the syntax for selection in label expressions will both aid the user and keep the spec small.
Shadowing
Command-line tools like `jq` are more useful if queries can be specified on a single line. The same would hold for CUE. A problem in CUE that thwarts the ability to make single-line queries is field shadowing. The current alias construct provides a way around this, but requires multiple lines to write.
A good querying language will benefit from being able to avoid shadowing.
Abstraction issues and associative lists
CUE solves a longstanding problem in configuration design that plagues most configuration languages: whether to use abstractions or not. The idea behind using an abstraction layer is that it can 1) hide complexity from the user and 2) protect the user against misuse. But an abstraction layer is prone to "drift". The abstraction layer starts lagging as new features are introduced. This poses a maintenance issue. This problem is often so severe that abstraction layers are a bad idea. But not using an abstraction layer and configuring an API directly means exposing the user to potential mistakes.
CUE solves this issue in two ways. CUE allows defining constraints directly on an API, making it unnecessary to introduce abstraction layers in the first place. But where abstraction layers are used, CUE's composition model allows mixing it with direct-API use without foregoing the protections of the abstraction layer. It allows combining the best of both worlds.
For this to work well, though, it is important that defining CUE directly on an API is possible in the first place. This is currently not the case. The problem is lists. In many configurations, lists are interpreted with set-like properties. For instance, a list of strings is interpreted as a set of strings, whereby each element must be unique, such as in:
Or the elements of lists are structs of a certain type, whereby a struct with the same name may appear only once. For instance, merging these two lists
is expected to result in
To deal with this effectively in CUE, one currently has to convert such lists to structs and back, violating the assumption that CUE can be applied to the native API.
Proposal
We address all of the issues above by introducing changes to the language in several steps. These steps are grouped by functionality and are chronological.
Changes to field syntax
The first set of changes relates to changing the syntax for defining fields. We assume the adoption of colons (`:`) instead of spaces separating fields in the shorthand mode (more on this in the discussion). The semantics of optional and required fields with respect to CUE's value model remain unchanged.
Optional fields
CUE currently allows specifying an optional field using the following notation:
We introduce the ability to specify optional fields in bulk using the notation `[expr]: value`, defining an optional field `x: value` for any label `x` that unifies (matches) with `expr`.
The existing notation for optional fields, `foo?: bar`, remains as syntactic sugar for `["foo"]: bar`. The meaning of optional fields remains the same otherwise. As with normal fields, constraints for optional fields are additive.
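To make these semantics concrete, here is a Python sketch that assumes a label expression can be modeled as a predicate on labels. Every present field whose label matches gets the constraint, no field is required to exist, and all matching constraints apply, which makes them additive. Following the desugaring above, `foo?: bar` becomes a pattern that matches only the label `"foo"`. The names are illustrative only.

```python
from typing import Any, Callable

LabelPred = Callable[[str], bool]
ValuePred = Callable[[Any], bool]


def optional(label: str) -> LabelPred:
    """`foo?: bar` is sugar for `["foo"]: bar`: a pattern matching one label."""
    return lambda l: l == label


def validate(struct: dict[str, Any], patterns: list[tuple[LabelPred, ValuePred]]) -> list[str]:
    """Return the labels whose values violate some matching pattern constraint.

    Patterns never add or require fields; they only constrain fields that
    are present. Every matching pattern applies, so constraints are additive.
    """
    errors = []
    for label, value in struct.items():
        for label_ok, value_ok in patterns:
            if label_ok(label) and not value_ok(value):
                errors.append(label)
                break  # one violation is enough to report this label
    return errors


patterns = [
    # Roughly `[string]: int`: every field must be an int.
    (lambda l: True, lambda v: isinstance(v, int)),
    # Roughly `[=~"^max"]: <100`: an extra constraint for some labels.
    (lambda l: l.startswith("max"), lambda v: v < 100),
    # `limit?: >0`.
    (optional("limit"), lambda v: v > 0),
]
print(validate({"maxItems": 500, "limit": 0, "name": "x"}, patterns))
# ['maxItems', 'limit', 'name']
```

An empty struct passes, because optional fields and pattern constraints never require a field to be present.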
Note that even though it is not presented as such in the spec, CUE currently already allows specifying bulk optional fields using the `<Name>: value` notation. This would become `[string]: value` in this proposal (modulo the `Name` alias, more on that later).
Required fields
CUE currently allows specifying required fields using the following syntax:
We introduce the ability to specify a single required field using the notation `(expr): value`, with a label resulting from evaluating `expr` and value `value`. In this case `foo: bar` is syntactic sugar for `("foo"): bar`.
NOTE: we may or may not actually introduce this notation; the `"\(x)": bar` notation gives most of the benefits and may be clearer. However, this notation helps to understand the consistency of the resulting syntax and where the language may develop.
Aliases
We introduce the ability to alias labels and field values. The notation `X=foo: bar` binds `X` to the same value as `foo`, namely `bar`, and is scoped using the same rules as for `foo`.
An alias used within parentheses or square brackets (`(Y=expr): bar` or `[Y=expr]: bar`) binds to the label value and is visible within the value of the field.
Example:
Elimination of quoted identifiers
CUE currently allows backticks to create identifiers that are otherwise not referenceable. With the proposed aliasing construction this is no longer necessary. Back-quoted identifiers were necessary to allow referring to fields with keyword names or to declare definitions with names that are not valid identifiers (`a[x]` cannot be used to look up definitions).
With aliases, instead of referencing a field with an invalid identifier as its name through a quoted identifier, one can just alias this field:
This does not solve the case of accessing a field with an invalid identifier name within another reference. Currently, these can be referenced using quoted identifiers:
The aliasing technique doesn’t apply here (aliases are local only and are not accessible from outside the scope in which they are declared). A solution to this is to allow strings as selectors. In this case, the field called `foo-bar` could be referenced as `x."foo-bar"`.
Unlike using `x["foo-bar"]`, this would also work for definitions.
To be consistent, we should at this point also allow `x.0` to select the first element in a list. This proposal makes almost any expression possible as a selector, and it would be strange to exclude integers.
Also, if we allow constraints to be applied to list indices on the LHS, it is consistent to allow selection using such indices as well. To make things even more consistent with the LHS, allowing `foo.(x+y).bar` would make selection almost fully symmetric with the LHS.
Examples
A map type from strings to integers is written as
Assuming a definition for `JobID` that defines valid IDs for jobs, and a definition `Job` for valid jobs, a map of jobs can be defined as:
The old bind or template syntax (`<Name>: { name: Name }`) will be phased out. It can be replaced with either of the following:
A field with a label that is not a valid identifier, or with just a really long label, can be aliased using a more convenient identifier:
where the `X` here refers to `value`.
This example also shows why an alias for a field value is at the start of a field and not at the value:
If the notation had been `"foo-bar": FB=settings: { name: FB.name }`, it would have appeared that `FB` was solely bound to the `settings` section, and not the whole struct. It would also suggest that `FB` only binds to that value, and not to the result of the entire field, which may be a more restricted definition as the result of other declarations for the same field.
The common notation for writing single-value string interpolations for generated fields
can now be written as
Field aliases can also be used to unshadow a reference.
or, alternatively, noting that labels with string literals are not referenceable:
The proposal also allows selectively applying constraints based on the value of a label. For instance, to apply a constraint only to fields that end with “Test”, one could write:
A mapping from JSON Schema to CUE would require supporting such a capability, which is cumbersome to implement in CUE with the current capabilities.
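For a JSON Schema to CUE mapping, `patternProperties` could translate almost one-to-one into bracketed label constraints. The following Python sketch assumes the label expression is written with CUE's unary regular-expression operator, as in `[=~"Test$"]: T`. It also assumes each property schema has already been reduced to a CUE type name. The function name and that simplification are hypothetical. JSON Schema patterns are not implicitly anchored, which matches the unanchored behavior of CUE's `=~`.

```python
import json


def pattern_properties_to_cue(pattern_properties: dict[str, str]) -> str:
    """Render a simplified JSON Schema `patternProperties` map as CUE text.

    Each key is a regular expression over property names; each value is
    assumed (for this sketch only) to already be a CUE type such as "int".
    """
    lines = ["{"]
    for regexp, cue_type in pattern_properties.items():
        # json.dumps yields a double-quoted, escaped string literal, which
        # is also a valid CUE string for simple inputs like these.
        lines.append(f"\t[=~{json.dumps(regexp)}]: {cue_type}")
    lines.append("}")
    return "\n".join(lines)


print(pattern_properties_to_cue({"Test$": "bool", "^x-": "string"}))
```

For the input shown, this prints a struct with the two constraints `[=~"Test$"]: bool` and `[=~"^x-"]: string`. The first one is the "ends with Test" case from the text above.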
Unifying the concepts of lists and structs
In CUE, lists and structs are strikingly similar at the implementation level. CUE stores lists as integer maps, for which there must be a field corresponding to each element in the list up to the largest defined index. This may not be the most efficient representation, but it greatly simplifies many aspects.
However, there remain significant differences in syntax and functionality. It is worth considering the benefits of unifying lists and structs at the syntactic level as well.
Types for all elements versus remaining elements
In CUE, `[X, Y, ...T]` defines a list with two elements that also allows any number of additional elements, which must be of type `T`. However, `X` and `Y` themselves need not be of type `T`. This definition corresponds to the `additionalItems` keyword in JSON Schema.
There is no equivalent for the corresponding `additionalProperties` keyword, which applies to structs. As structs already sport the `...` notation for open-endedness, it is clear, syntactically at least, what that analogue would be.
Conversely, in structs we allow defining a type that must apply to all elements by means of the generalized optional field syntax: `[expr]: T` (or previously `<Name>: T`). There is, however, no equivalent for this in lists. As CUE already treats lists and structs equivalently semantically, such a construct for lists can analogously be written as:
Integers in label expressions
Viewing lists as integer maps would extend the same power and flexibility of selecting labels and applying constraints to lists.
For instance:
Access to optional elements
CUE currently doesn't allow access to optional values. For instance, this results in an error:
We may consider allowing this in the future, either by not making it an error or by requiring an added question mark, as in `foo.baz?`. For structs this would be an easy addition: `foo[expr]` is guaranteed to only give access to regular required fields, while `foo.label` gives access to those same fields, to definitions, and then possibly to optional values.
For lists, currently only indexing is allowed. Interpreting lists as integer maps, or even allowing integer maps altogether, would make it reasonable to also allow integers in selector positions:
This would also be more consistent with allowing integer values for labels.
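To illustrate how viewing lists as integer maps lets list constraints mirror struct constraints, here is a Python sketch. The representation and names are hypothetical. It contrasts two kinds of constraint. The existing `[X, Y, ...T]` form constrains only the remaining elements with `T`. The other kind applies to every index whose integer label matches a predicate, the list analogue of `[expr]: T`.

```python
from typing import Any, Callable, Optional, Sequence

Pred = Callable[[Any], bool]
# (index predicate, value predicate): a constraint on all matching indices.
IndexPattern = tuple[Callable[[int], bool], Pred]


def check_list(
    xs: list[Any],
    prefix: Sequence[Pred],
    rest: Optional[Pred] = None,
    index_patterns: Sequence[IndexPattern] = (),
) -> bool:
    """Validate a list by viewing it as a map from integer index to element.

    prefix: one predicate per leading element, as in [X, Y, ...].
    rest: constraint for elements after the prefix (the `...T` part);
        None means the list is closed after the prefix.
    index_patterns: constraints on every element whose index matches,
        including elements in the prefix.
    """
    as_map = dict(enumerate(xs))  # {0: xs[0], 1: xs[1], ...}
    if len(as_map) < len(prefix):
        return False  # every prefix element is required
    if rest is None and len(as_map) > len(prefix):
        return False  # closed list: no additional elements allowed
    for i, value in as_map.items():
        own = prefix[i] if i < len(prefix) else rest
        if not own(value):
            return False
        for index_ok, value_ok in index_patterns:
            if index_ok(i) and not value_ok(value):
                return False
    return True


def is_str(v: Any) -> bool:
    return isinstance(v, str)


def is_int(v: Any) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


# [string, string, ...int]: the first two elements need not be ints.
print(check_list(["a", "b", 1, 2], [is_str, is_str], rest=is_int))  # True
# [...int] plus an index-range constraint on indices 0 and 1 only.
print(check_list([5, 1], [], rest=is_int,
                 index_patterns=[(lambda i: i < 2, lambda v: v < 3)]))  # False
```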
Integer maps
Some of CUE's integrations, most notably YAML and Protocol Buffers, support maps with integral keys. It is not clear whether CUE should go there, but it is worth keeping this option open.
Value filters
With the notation for optional fields above, we provided a way to apply constraints to a subset of fields based on the field label. One may also want to apply constraints based on the value of a field.
This can currently be done with comprehensions. Comprehensions can be a bit clunky, however, and cannot be used as a query shorthand. We would like a notation that is consistent and convenient for LHS selection and querying.
Consider this definition.
We now want to further constrain that any value with a port in the range `50000` to `50100` must have a name starting with `"home-"`. For this we borrow the `[?expr]` notation from JMESPath and JSONPath to filter a value based on a boolean expression:
The `?` notation indicates that we are filtering a value. The `@` notation is a special identifier that refers to the "current" value under consideration (the value that will land after the colon). Like JSONPath and unlike JMESPath, the use of `@` is required in a CUE program to access fields in the RHS value. (This requirement might be dropped for command-line queries.)
Label expressions and value filters can be combined, allowing one of each:
Examples:
In this example we use a value filter to add some qualifiers to existing data.
If a bag has more than 50 apples, we consider it heavy. And if an apple is large, we reduce the capacity of the bag. As with comprehensions, the constraints are unified after evaluating the value and checking all conditions.
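Today the same effect needs a comprehension. The Python sketch below shows the intended semantics of combining a label expression with a value filter. The constraint is checked only on fields whose label matches and whose value passes the filter; the filter receives the field's value, playing the role of `@`. It uses the port example from above and treats the range as inclusive, which is an assumption. The data and helper names are made up.

```python
from typing import Any, Callable


def apply_filtered(
    struct: dict[str, Any],
    label_ok: Callable[[str], bool],
    value_filter: Callable[[Any], bool],
    constraint: Callable[[Any], bool],
) -> list[str]:
    """Check `constraint` on every field selected by label and value filter.

    `value_filter` receives the field's value, like `@` in the proposal.
    Returns the labels of selected fields that violate the constraint.
    """
    return [
        label
        for label, value in struct.items()
        if label_ok(label) and value_filter(value) and not constraint(value)
    ]


services = {
    "a": {"name": "home-a", "port": 50010},
    "b": {"name": "office-b", "port": 50050},
    "c": {"name": "office-c", "port": 8080},
}


def in_home_range(svc: dict) -> bool:
    return 50000 <= svc["port"] <= 50100


def named_home(svc: dict) -> bool:
    return svc["name"].startswith("home-")


print(apply_filtered(services, lambda l: True, in_home_range, named_home))  # ['b']
```

Service `c` is not selected by the filter, so its name remains unconstrained.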
Associative lists
An associative list is a list that defines a key for each of its values, effectively turning it into a map. In CUE terms, elements with the same key are unified and collapsed onto a single element as if it were a struct.
We can introduce associative lists by generalizing the concepts introduced in the previous sections. Most importantly, we allow the `@` notation referring to the right-hand side to also be used in a label expression, allowing it to specify how to derive a key from its elements.
Suppose we have the following two lists:
What we would like these two lists to merge into:
We can achieve this by defining `a` as an associative list:
This would tell CUE to interpret the lists as maps keyed by the `name` field of their struct elements. Theoretically, it could also be used as a validation check on map keys. We intend to initially disallow a mixture of structs and lists to define a value and will only allow this notation for lists.
Associative lists can also be used to define sets. For instance, a set of strings can be defined as
such that
evaluates to
(consistent with how structs work, it would not be an error).
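Here is a Python sketch of associative-list merging. It assumes elements are keyed by a caller-supplied function: the `name` field, or the element itself for sets. Unification of two elements is approximated by a recursive merge in which differing scalar values are a conflict. This is only a rough stand-in for CUE unification. Output order here is simply the order of first appearance, which is a simplification.

```python
from typing import Any, Callable


class Conflict(Exception):
    """Raised when two values cannot be unified."""


def unify(a: Any, b: Any) -> Any:
    """A very rough stand-in for CUE unification of JSON-like values."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for k, v in b.items():
            out[k] = unify(out[k], v) if k in out else v
        return out
    if a == b:
        return a
    raise Conflict(f"{a!r} != {b!r}")


def merge_assoc(key: Callable[[Any], Any], *lists: list[Any]) -> list[Any]:
    """Merge lists as maps keyed by key(elem); same-key elements unify."""
    merged: dict[Any, Any] = {}
    for xs in lists:
        for elem in xs:
            k = key(elem)
            merged[k] = unify(merged[k], elem) if k in merged else elem
    return list(merged.values())


a1 = [{"name": "web", "image": "nginx"}, {"name": "db", "image": "postgres"}]
a2 = [{"name": "web", "replicas": 2}]
print(merge_assoc(lambda e: e["name"], a1, a2))
# [{'name': 'web', 'image': 'nginx', 'replicas': 2}, {'name': 'db', 'image': 'postgres'}]

# With the element itself as key, an associative list behaves like a set;
# duplicates collapse instead of being an error.
print(merge_assoc(lambda e: e, ["a", "b"], ["b", "c"]))  # ['a', 'b', 'c']
```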
A slightly more complex example involving environment variables:
yields
Indexing into associative lists would only be possible by means of the key value (`cmd1.args.FOO`); the integer index would be meaningless. Defining an associative list indicates that the value should be output as a list. An open question is whether one should also allow keys to be specified explicitly, such as `{ FOO: "FOO=3" }` or perhaps `[ FOO: "FOO=3" ]`. This is not planned for an initial implementation.
In some APIs the order of elements in lists is important even if it is interpreted as an associative list. Such semantics would be incompatible with CUE's value model. It is, however, in the realm of possibilities to give some guarantees on topological sorting on a best-effort basis outside of the usual value model. The above example of the command-line arguments applied such an algorithm; in that case there was only one possible order.
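The following Python sketch shows one way to realize such best-effort ordering, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Each input list contributes "comes before" edges between its consecutive keys, and the union of those edges is sorted topologically. When the inputs admit exactly one order, that order is the result. Contradictory inputs form a cycle and raise `graphlib.CycleError`. This is one possible algorithm, not the one the proposal specifies.

```python
from graphlib import CycleError, TopologicalSorter


def merged_order(*orders: list[str]) -> list[str]:
    """Combine per-list element orders into one consistent order.

    Each list contributes edges between consecutive keys. If the inputs
    admit exactly one order, static_order() returns it; contradictory
    inputs raise graphlib.CycleError.
    """
    ts: TopologicalSorter = TopologicalSorter()
    for order in orders:
        for key in order:
            ts.add(key)  # register keys that have no ordering constraints
        for before, after in zip(order, order[1:]):
            ts.add(after, before)  # `after` depends on `before`
    return list(ts.static_order())


print(merged_order(["FOO", "BAR"], ["BAR", "BAZ"]))  # ['FOO', 'BAR', 'BAZ']
```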
Querying
In this proposal we have suggested increasing the symmetry between the selector operator and what can be specified as a label. For instance:
We propose that the LHS square bracket notation extends analogously to become a query operator.
In this proposal, any selector with an expression containing a square bracket is a query. The result will be a possibly empty list containing all matching values. Any subsequent selector operating on the result applies to each individual result. This is called projection in JMESPath. If the expression is followed by anything other than a dot, the sequence terminates and a list results. The result is always a list, even if it yields a single value.
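Here is a Python sketch of this projection rule under a simplified model, in which a query step keeps every entry whose label and value satisfy given predicates. Once a query step occurs, the result is a list, and each subsequent plain selector applies to every element. The proposal does not say what happens to elements that lack the selected field; this sketch drops them, as JMESPath projections do. The step encoding and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

_MISSING = object()


@dataclass
class Query:
    """A `.[...]` step: keep every entry whose label and value both match."""
    label_ok: Callable[[Any], bool] = lambda label: True
    value_ok: Callable[[Any], bool] = lambda value: True


def entries(node: Any) -> list[tuple[Any, Any]]:
    """Lists are treated as integer maps, so both kinds can be queried."""
    if isinstance(node, dict):
        return list(node.items())
    if isinstance(node, list):
        return list(enumerate(node))
    return []


def run_query(root: Any, steps: list[Any]) -> Any:
    """Evaluate a path made of plain labels and Query steps.

    Before the first Query, a plain label selects a single value. From the
    first Query on, the result is a list, and each plain label is applied to
    every element (projection); elements without that label are dropped.
    """
    results = [root]
    projecting = False
    for step in steps:
        if isinstance(step, Query):
            projecting = True
            results = [
                value
                for node in results
                for label, value in entries(node)
                if step.label_ok(label) and step.value_ok(value)
            ]
        else:
            selected = (dict(entries(node)).get(step, _MISSING) for node in results)
            results = [v for v in selected if v is not _MISSING]
    if projecting:
        return results  # always a list, even for a single match
    if not results:
        raise LookupError(f"no value at {steps!r}")
    return results[0]


data = {"services": {"web": {"port": 80}, "api": {"port": 8080}, "db": {"port": 5432}}}
# Roughly `services.[_].port` in the proposed syntax.
print(run_query(data, ["services", Query(), "port"]))  # [80, 8080, 5432]
# Roughly `services.[?@.port > 1000].port`.
print(run_query(data, ["services", Query(value_ok=lambda s: s["port"] > 1000), "port"]))  # [8080, 5432]
```

A plain path such as `services.web.port` still yields a single value. Selecting only `web` through a query yields the one-element list `[80]`.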
A query has one of the following forms:
This makes projections (selector expressions that select zero or more entries into a list) easy to recognize: they all start with `.[` or include a `[?`, or both. For clarity, we could disallow the third case. Having a clear syntactic distinction between starting and ending a projection avoids the ambiguities one can find in JMESPath, where `foo[:5][:3]` and `(foo[:5])[:3]` mean different things.
Examples:
Data:
Queries:
Flattening
JMESPath has a flattening operator, denoted `foo[]`, which flattens out one layer of lists. For instance, `[1, 2, [3, 4]][]` could mean `[1, 2, 3, 4]`. It may be worth considering providing the same. Unlike with JMESPath, such an operator would not start a projection, but rather terminate one. A subsequent query operator can be used to restart projection if needed.
Initially, the implementation can just use the `list.FlattenN` builtin.
Recursive descent
The proposal currently does not address a recursive descent operator, as available in JSONPath and `jq`. This may be considered as a future option, but performance considerations and semantic consequences have to be carefully weighed.
Construction
JMESPath and `jq` allow constructing new values from collected results. The square bracket operator is too overloaded in the proposed query syntax to be useful for construction, and it is desirable not to break the simple rule that `.[` always starts a projection.
A possibility in CUE is to piggyback on the emit value syntax. This would allow `.{expr}` to be used for generating values. The obvious use for this is to construct structs, but other values, like lists or strings, can be constructed using the emit value syntax.
Examples:
A big advantage of this approach is that it also allows for the construction of strings and other values, not just lists or structs. It also clearly distinguishes the `.[]` notation for projection, resulting in a list, the `.x` notation for selecting a single value, and `.{}` for the creation of values.
Open Question: Aliases in queries
If we allow construction, it becomes useful to allow aliases in queries as well. The proposed syntax looks a bit awkward with the dot notation. For instance, in `foo.X=bar.baz`, it appears as if `X` is bound to `bar.baz`. Allowing aliases only in front of brackets, such as in `foo.X=[].baz` or `foo.X=("bar").baz`, may mitigate this issue, but it still does not have the clarity of the `:` notation.
It may be sufficient to say that for these cases one would need to use comprehensions. Note that neither JMESPath nor `jq` allows this kind of flexibility in construction.
Slice
JMESPath, JSONPath and `jq` all implement a variant of the slice operator. For consistency, it may be worth reintroducing this operator in CUE. It could be introduced as a normal operator as well as a selector operation allowing it in a `.[]`. Note that the latter is somewhat redundant, as `.[]` already allows selection on index values. Adopting slices in that situation may make more sense when adopting Python-style semantics (allowing negative indices and steps, just like JMESPath).
Evaluation order
Although many of the proposed constructs already have an equivalent in existing constructs, it is good to consider once more how all of this is evaluated.
Evaluation of a node starts by evaluating the nodes on the path leading to the respective node and collecting the results in a single expression. Note that CUE evaluates lazily, which is necessary to implement the constraints implied by the value lattice.
The evaluation of such an expression commences as follows:
To preserve commutativity, constraints applied at Step 4 only consider the value obtained at Step 3. A validator will either pass the current value or return bottom (a failure). Running validators at the end of evaluation allows validators to be non-monotonic, giving CUE some leeway in implementing non-monotonic validation such as the cardinality of a set or a `not` operator or builtin.
Note that although CUE only evaluates one node at a time, it defines a struct to fail if any of its children fails. So it still needs to descend down all nodes to validate that a configuration is valid for structs.
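The following Python sketch illustrates why validators run last. Values are modeled as integer intervals, a deliberate simplification, and all names are hypothetical. Monotonic constraints only narrow the interval, so they give the same result in any order. A non-monotonic check, in the spirit of the `not` example, is only sound once every constraint has been applied.

```python
from itertools import permutations
from typing import Callable, Optional

Interval = tuple[int, int]  # inclusive bounds; lo > hi means bottom (failure)


def meet(a: Interval, b: Interval) -> Interval:
    """Unify two ranges by intersecting them (commutative and associative)."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def evaluate(constraints: list[Interval],
             validators: list[Callable[[Interval], bool]]) -> Optional[Interval]:
    value: Interval = (-(10**9), 10**9)  # stand-in for "top"
    for c in constraints:
        value = meet(value, c)
    if value[0] > value[1]:
        return None  # bottom
    # Validators only inspect the final value and either accept it or fail.
    if all(v(value) for v in validators):
        return value
    return None


def not_in(excluded: Interval) -> Callable[[Interval], bool]:
    """A non-monotonic check: the value must not overlap `excluded`."""
    def check(value: Interval) -> bool:
        return value[1] < excluded[0] or value[0] > excluded[1]
    return check


constraints = [(0, 100), (10, 1000), (-5, 50)]
validators = [not_in((60, 70))]
results = {evaluate(list(p), validators) for p in permutations(constraints)}
print(results)  # {(10, 50)}
```

Run too early, `not_in((60, 70))` would reject the intermediate value `(0, 100)`, even though the final value `(10, 50)` passes. Running validators only at the end keeps the result order-independent.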
Discussion
Label expression notation
The notation for referring to a label value on the right-hand side has become slightly more verbose. Previously it was
now it is
It is also a little bit awkward.
In practice, however, the need for this construct seems less common than originally anticipated. Moreover, where it is used, it seems to be the result of fitting a native API onto a different CUE mapping when the native API is not convenient in CUE. With associative lists, the need for this construct should lessen further.
On the other hand, the need to constrain keys to a certain value, or to associate a set of constraints only with a certain subset of keys, seems much more common than originally anticipated. The `[expr]: foo` notation works much better for this use case than the `<labelIdent>: foo` notation, which did not allow for either case.
Another issue with the angular bracket notation is its parseability. With the introduction of unary comparators, most notably `<` and `>`, parsing these labels became complicated. With the introduction of `:` separators, the use of angular brackets also seems to have become aesthetically less pleasing.
The square bracket notation has precedent in other languages. A map definition will look quite similar to one in TypeScript. The square bracket notation is also used in JSONPath, JMESPath, and `jq` to indicate field selection, very similar to the way fields are selected in CUE.
The `[expr]: foo` notation is somewhat unfortunate for people coming from Jsonnet, where it means the equivalent of the `(expr): foo` proposed here. However, the `(expr): foo` notation is used in `jq` and seems to more intuitively fit with the interpolation syntax and CUE as a whole.
The aliasing construct is very powerful and solves very common shadowing issues, for instance a field called `bytes` of type `bytes`. Reserved identifiers starting with `__` and backtick identifiers have proven insufficient to resolve even common cases.
Consolidation of list and struct concepts
The syntactic regularity that comes from consolidating the list and struct types may be pleasing, but it may also make it harder to differentiate between the two cases. The question is how much this matters. For instance, it is already not possible to see whether `src` is a list or a struct in `src[x]`, or even in `for x, y in src`.
The big advantage of consolidating the types lies in querying. When using projection, that is, continuing a selection on a collection of values, it is useful if it is crystal clear where a projection starts and ends. Having the same syntax for lists and structs means less to learn in this case.
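The observation that indexing and iteration already look the same for lists and structs can be checked against CUE as it exists today. A minimal sketch; the field names `srcList`, `srcStruct` and the result fields are hypothetical, and the comments show the values `cue eval` is expected to produce:

```cue
// Hypothetical inputs: one list and one struct.
srcList: ["a", "b"]
srcStruct: {x: "a", y: "b"}

// Indexing uses the same bracket syntax for both.
firstOfList: srcList[0]     // "a"
xOfStruct:   srcStruct["x"] // "a"

// So does iteration: k is an int index for the list
// and a string label for the struct.
listKeys: [for k, v in srcList {k}]     // [0, 1]
structKeys: [for k, v in srcStruct {k}] // ["x", "y"]
```

Nothing in `src[x]` or the `for` clause tells the reader which kind of collection is involved; only the values of `k` differ.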
In many cases, though, the type will be clear from the context, as in `foo.[>"X"]` and `foo.[>10]`. Where this is not the case, users could always clarify whether they are operating on a list or a struct by explicitly mentioning the key type, as in `foo.[string]` or `foo.[int]`.
Overall, the regularity of the syntax seems to be desirable here.
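The selector `foo.[>"X"]` mirrors the pattern-constraint form `[expr]: value` on the left-hand side of a field. The sketch below uses only the pattern-constraint syntax that CUE later shipped, not the proposed `.[...]` query selector, and assumes that a string bound is accepted as a pattern; the labels are illustrative:

```cue
foo: {
	// Applies only to fields whose label sorts after "X".
	[>"X"]: int

	A: "text" // "A" < "X": the constraint does not apply
	Y: 3      // "Y" > "X": must be an int
}
```

Changing `Y` to a string would produce a conflict, while `A` stays unconstrained, which is the key-subset behavior the query selector is meant to mirror.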
Comparison to other query languages
Although we've taken elements from `jq`, JMESPath, and JSONPath in the design of the query language, none was quite suitable for use in CUE.
JMESPath seemed the best candidate, especially because it is well defined. Syntactical elements such as using `|` for pipe, and backticks or single quotes for literal JSON, make it incompatible with CUE. JMESPath also allows identifiers to resolve to top-level fields in the "current object". This is possible as JMESPath has no other context. In CUE this is not possible, so CUE adopts the JSONPath convention of always needing to refer to the current object explicitly.
Projection also has some unexpected properties. For instance, in JMESPath
is not the same as
Although this outcome is well defined and understandable, it is not a desirable property. A similar effect exists with any of the operations allowed as a projection. This is exacerbated by the fact that it may not be clear from the immediate context whether an operator acts on a projection or not. For instance, the outcome of `expr[0]` differs depending on whether `expr` is a projection or not. In the proposed query syntax, a projection can only be continued using a selector, without exception. As such, the dot in a query corresponds strictly to the colon on the left-hand side.
In the current proposal this effect still exists, but is limited to the selector operator, where such semantics may be expected. We could eliminate the distinction between `(s.[x]).[y]` and `s.[x].[y]` by adopting JSONPath semantics, requiring a projection to be "captured" by enclosing brackets. This would make the `(s.[x]).[y]` form illegal; it would have to be written as `[s.[x] ].[y]`. This approach would allow an alternative syntax for appending lists, namely `[ x.[_], y.[_] ]` instead of `x + y`. Given that lists can be open or closed, the latter is a bit unclear: do the constraints on the remaining elements of `x` get applied to `y` or not? What about vice versa? Is the result open or closed? The `[ x.[_], y.[_] ]` notation strongly implies that the resulting list is closed and that the constraints for the remaining elements of `x` and `y` do not get applied. To open the list, one would write `[ x.[_], y.[_], ... ]`. Using `+` for list concatenation could then be deprecated.
There are more reasons to deviate from JMESPath, though. Although the semantics of `||` and `&&` seem fine for a query language, they seem dangerous for a language like CUE. Furthermore, JMESPath provides an expression type, denoted `&expr`, that can be used to pass an unevaluated expression to be used as, say, a sort key for a sorting function. The use of `&` may be confusing in CUE, where `&` denotes unification. Using backticks might be a possibility.
Interaction with comprehensions
There is a lot of overlap between comprehensions and the querying syntax. The semantics should consequently be kept in sync, and implementations should implement one in terms of the other. It may be worth considering getting rid of comprehensions, although this seems unfeasible for various reasons.
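To make the overlap concrete, here is a comprehension in current CUE that performs what a value filter over a struct would do. A minimal sketch with hypothetical field names; the proposed query form itself is not shown because it was never part of the language:

```cue
src: {a: 1, b: 20, c: 30}

// Keep the fields whose value exceeds 10, roughly what a value
// filter in the proposed query syntax would select.
big: {for k, v in src if v > 10 {(k): v}}
// big evaluates to {b: 20, c: 30}
```

A query implementation could desugar to this kind of comprehension, which is one way of keeping the two semantics in sync.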
Value filters
We used boolean expressions here to be consistent with existing query languages. In CUE, however, it would be more natural in this case to say something like
The boolean expression, however, is more general, and a first goal was to stay familiar to users of existing query languages. Also, the desired semantics using unification is not immediately clear. For instance, in the above example, plain unification would match any value that could still accommodate a `port` field, so it matches values that do not have `port` defined.
An alternative is to use builtins together with boolean expressions, such as
This would allow for a builtin with a more expected instance relation and would allow `[?is_a(v1.Service)]` to have the expected meaning.
Implementation-wise, boolean expressions are also similar to comprehensions, so focusing on boolean expressions simplifies the implementation initially and allows sharing the spec between them.
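The concern about unification-based matching can be reproduced with plain CUE. A minimal sketch with hypothetical values, one of which never defines `port`:

```cue
web: {port: 8080}
other: {name: "x"}

want: {port: 8080}

// Unification succeeds for both, because an open struct like `other`
// can always be extended with the missing port field.
m1: want & web   // {port: 8080}
m2: want & other // a struct with both port: 8080 and name: "x"
```

Used as a filter, unification would therefore accept `other`, whereas a boolean test on a concrete `port` value would not.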
Construction syntax
In queries one will often reconstruct a similar value. It is common to write a field like this: `{ foo: foo }`. In CUE this results in an evaluation cycle, as the `foo` on the right-hand side refers to the field itself. As string labels are not referenceable, a trivial workaround is to write `{ "foo": foo }`. As this is a common case, though, this gets tedious quickly. There should probably be a shorthand for this at some point. As `{ foo }` already means embedding in CUE, a possibility is `{ :foo }`.
Using colons `:` versus dots versus spaces for paths
A common complaint about CUE was that using spaces as separators for labels, as a shorthand syntax for nested structs, was confusing. The most common suggestion to fix it was to use dots. This indeed seems like an obvious choice, and the proposed query syntax points in this direction even more. Given that a CUE program is structured as a collection of `<node selection> ':' <constraint>` declarations, the query syntax fits in nicely.
However, the dot syntax has its own issues. Firstly, CUE allows the RHS to reference the LHS. For instance,
is allowed. This is analogous to Go allowing a struct to have fields of its own type, as long as they are pointers. Given the dot syntax, however, it seems non-obvious that the `b` after the colon in `a.b.c: b.d` refers to the `b` in the left-hand side.
Also, the colon in CUE signifies a definition. With the new proposal, a map of maps of integers is written intuitively as
Compare this to using the dot syntax, where this would become
which is arguably even more awkward than using spaces as a separator.
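In CUE as it later shipped, the chained-colon form does work for nested pattern constraints. A minimal sketch with made-up labels:

```cue
// A map of maps of integers, declared with chained colons.
counts: [string]: [string]: int

counts: apples: monday: 3
counts: pears: tuesday: 5
```

Each colon introduces one level of nesting, so the declaration reads left to right the same way the values are later filled in.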
Another big advantage of using colons is the ability to alternate regular values and definitions within a single chain. CUE uses both `::` and `:` for defining values, and a definition must always be indicated with a double colon. Using colons as separators allows one to write
which would not be possible, or at least would be ambiguous, using spaces or dots. The current implementation of CUE disallows such chaining for definitions.
Compared to spaces, using colons (or dots) allows one to break a long chain of declarations across multiple lines. For instance
The comma elision rules prevent this when spaces are used as separators.
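A minimal sketch of a chain broken across lines, assuming current CUE comma elision; the labels are hypothetical. A comma is inserted automatically only after tokens such as identifiers, literals, and closing brackets, not after `:`, so the two lines below should parse as one declaration:

```cue
// Equivalent to: service: frontend: spec: replicas: 3
service: frontend:
	spec: replicas: 3
```

With space-separated labels, a line break after the identifier `frontend` would instead insert a comma and end the declaration.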
Transition
This proposal is large and needs to be implemented in several stages. Many of the concepts map to existing ones, making the implementation not too hard, but each step will still require a thorough review of the proposed syntax and semantics.
Phase 1: Implement backwards incompatible changes.
The first step is characterized by making minimal changes for things we are sure of and getting the backwards incompatible changes out of the way.
Initially, we intend to only support the square bracket notation and not the parenthesis notation. String interpolations provide most of the functionality.
In the very early stages of the transition, we may only support `string` as a valid value. This allows us to start phasing out the current templating notation (`<Name>: value`) early. This means initially only adding the `[expr]: value` notation.
The aliasing construct makes the recently introduced back-quoted identifiers superfluous. We suggest removing them from the language. Back-quoted identifiers are semantically simpler, but syntactically relatively verbose to describe.
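For the templating change in Phase 1, the square-bracket form combined with a label alias covers what the `<Name>: value` template did. A minimal sketch in CUE as it later shipped; `services` and `web` are hypothetical names, and the old form appears only in a comment because it is no longer valid CUE:

```cue
// Old template form, for comparison only:
//   services: { <Name>: {name: Name} }

// Square-bracket form with a label alias bound to the matched label.
services: [Name=string]: {name: Name}

services: web: {}
// services.web evaluates to {name: "web"}
```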
Phase 2: Optional field selection and associative lists
The ability to apply constraints selectively based on the label value is often requested and moreover is somewhat of a blocker in supporting JSON Schema compatibility. This phase may skip selection on values.
Associative lists also fall in this category of making CUE more useful for its original intent and purposes.
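As an illustration of applying constraints selectively based on the label, the regular-expression form of `[expr]: value` that CUE later shipped can express something close to JSON Schema's `patternProperties`. A minimal sketch; the header names are hypothetical:

```cue
headers: {
	// Only labels starting with "x-" must be strings.
	[=~"^x-"]: string

	"x-trace-id": "abc123"
	retries:      3 // does not match the pattern, so it is unconstrained
}
```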
Phase 3: Querying
The query language itself can be implemented in several phases. The first step would be to apply key and value filters to lists and structs. This would also be a good point to add value filters to optional fields on the left-hand side.
This may also include allowing selection based on any kind of literal.
Phase 4: Deprecating old constructs
This includes:
- the templating notation (`<Name>: value`)
Phase 5: Consider optional parts of this spec
This includes:
- the `(expr): value` notation,
- `...T`, needed for JSON Schema compatibility,
- `$` or `root()`, or an identifier for the current struct, allowing aliasing at the top level.