Add design doc for processing query language. #4444

Merged
merged 4 commits into main
Jan 27, 2022

Conversation

anuraaga
Contributor

This captures ongoing discussions about a revamp of the processing pipeline.

@bogdandrutu @punya

@anuraaga anuraaga requested review from a team and codeboten November 17, 2021 02:39
@anuraaga
Contributor Author

As a reminder, here are notes from an initial review - I tried to capture most of them, but if I missed anything feel free to point it out!

• type = "span" -> from
• calling out labels to understand output more clearly
• no type = resource
• ilm
• angularjs scope
• basic queries hundreds of nanoseconds
• transform
• processing
• point out it's being implemented in contrib
• delta / cumulative - processing
• processing can be embedded
• can have field name conflicts with OTLP
• always prefix with scope
• e.g., type = span and type = metric
• pod.cpu.usage - multiple within same metric
• drop attribute, then aggregate
• collection of aligned points
• array-like field
• type
• exists attribute
• dan observiq logs + stanza
• order of execution
• losing optimizations because of side effects
• FROM resource prevents the resource-splitting case
• Clarify that operations are not independent
• Keep cross-signal for farther down the line
• keep as parameter to create_histogram
• batchbyresource

@codecov

codecov bot commented Nov 17, 2021

Codecov Report

Merging #4444 (9d42af4) into main (db4aa87) will increase coverage by 0.03%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #4444      +/-   ##
==========================================
+ Coverage   90.69%   90.72%   +0.03%     
==========================================
  Files         178      179       +1     
  Lines       10356    10690     +334     
==========================================
+ Hits         9392     9699     +307     
- Misses        746      770      +24     
- Partials      218      221       +3     
Impacted Files Coverage Δ
exporter/loggingexporter/known_sync_error.go 0.00% <0.00%> (-80.00%) ⬇️
service/command.go 82.35% <0.00%> (-17.65%) ⬇️
config/configtelemetry/configtelemetry.go 89.74% <0.00%> (-10.26%) ⬇️
cmd/builder/internal/builder/main.go 59.74% <0.00%> (-7.29%) ⬇️
service/flags.go 86.36% <0.00%> (-6.82%) ⬇️
exporter/loggingexporter/logging_exporter.go 88.04% <0.00%> (-5.44%) ⬇️
internal/cgroups/cgroups.go 87.67% <0.00%> (-4.00%) ⬇️
processor/memorylimiterprocessor/memorylimiter.go 85.32% <0.00%> (-3.27%) ⬇️
service/internal/telemetrylogs/logger.go 85.71% <0.00%> (-1.79%) ⬇️
config/configgrpc/configgrpc.go 92.57% <0.00%> (-1.35%) ⬇️
... and 66 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@alolita
Member

alolita commented Nov 17, 2021

Thanks @anuraaga

Adding to Collector SIG meeting agenda for discussion.

Member

@bogdandrutu bogdandrutu left a comment

After reading the examples again, I fear that we are trying too hard to reduce the number of statements/words, and because of that we need to make some "assumptions"/"tradeoffs". I would suggest avoiding any kind of optimizations like that in the first version and asking users to always fully specify (sometimes duplicate) the statements. Then we can start adding "optimizations" later, like having a statement without a `from` clause apply to all signals, etc.

mechanism could be used for selecting metrics for temporality conversion as other cases, but it is expected that in
practice configuration will be limited.

The processors implementing this use case are `cumulativetodeltaprocessor` and `deltatorateprocessor`.
Member

deltatorateprocessor is not temporality related :D The name is a bit misleading in including "delta" - it is just a rate calculation, value/time_interval. This is more meaningful for delta, I agree, but the operation can be done on cumulative as well.

Comment on lines 130 to 132
Remove a forbidden attribute such as `http.request.header.authorization` from all telemetry.

`delete(attributes["http.request.header.authorization"])`
Member

I think this is nice, but a bit confusing. Does this remove the attribute from the "span.events" as well? What about the "resource.attributes"?

Contributor Author

For now, as on line 119, I defined all field accesses as fully specified, so I don't think it is ambiguous. Let me know if I should phrase it differently.


Remove all attributes except for some

`keep(attributes, "http.method", "http.status_code") from metric`
Member

We refer to the "signals" as [traces/metrics/logs]; we may want to be consistent here and at least use the plural, and probably use "traces" instead of "spans"?

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 26, 2021
@anuraaga
Contributor Author

anuraaga commented Nov 29, 2021

Thanks @bogdandrutu - sorry for the delay. I descoped things like "from all telemetry" since, as you mentioned before, the collector has no support for this at all anyway. And the paths are absolute paths as defined here - hopefully that's the reduced scope you're talking about.

Also have started on a very hacky prototype of the AST processing

anuraaga/opentelemetry-collector-contrib#1

If anyone has suggestions on test cases to solve / benchmark in the first versions, I can list them in the doc.

@github-actions github-actions bot removed the Stale label Nov 30, 2021
@tigrannajaryan
Member

@anuraaga this is very interesting. I have been thinking about a query language in the background for a while; it would be good to discuss this. Before you go ahead with the current design let's discuss the alternatives and possible amendments. A couple of things I would be interested to explore:

SQL or No

Is an SQL-like language preferable, or can we go with an imperative style (e.g. more Python-like)? It doesn't seem like we are going to support complicated SQL-like facilities (e.g. joins), and SQL's verbosity seems superfluous.

Compare SQL-like vs Python-like below.

Example 1:
`delete(attributes["http.request.header.authorization"]) from traces where name = "GET /cats"`
with:
`if name = "GET /cats": delete(attributes["http.request.header.authorization"])`

Example 2:
`set(attributes["k8s_pod"], resource.attributes["k8s.pod.name"]) from metrics`
with:
`attributes["k8s_pod"] = resource.attributes["k8s.pod.name"]`

The `from` clause seems unnecessary, since the input to the processor is a known data type.

Special syntax/lexical rules

Can we define the syntax such that it makes usage with OTel data specifically convenient? You mentioned a shorthand for accessing scoped data. This is the right idea, and I think we can spill/populate all "attributes" as top-level variables. Maybe we can also combine this with tailored lexical rules for identifiers that allow the dot as a valid identifier character, and then we can do this instead of the above:

Example 1:
`if name = "GET /cats": delete(http.request.header.authorization)`
Example 2:
`k8s_pod = resource.attributes["k8s.pod.name"]`

The idea being that the most common cases are much more concise and pleasant to write. The downside is that you lose the dot as an operator, but maybe we don't need it and brackets are enough for traversal of objects.

Comment on lines 167 to 168
create_histogram("duration", end_time_nanos - start_time_nanos) from traces
keep(attributes, "http.method") from metrics where descriptor.metric_name = "duration"
Member

@tigrannajaryan tigrannajaryan Dec 1, 2021

Can you clarify what this does? Does this read a span and produce a metric? How is this possible with the current processor interface, where the processor's input/output data type is the same?

Contributor Author

Sorry, I had meant to remove these cases but forgot this one. When originally writing the doc I forgot about the one-signal-per-pipeline limitation.

In the future, I think shared pipelines could be cool but not for now, especially because collector core has no way of supporting that

Create utilization metric from base metrics. Because navigation expressions only operate on a single piece of telemetry,
helper functions for reading values from other metrics need to be provided.

`create_gauge("pod.cpu.utilized", read_gauge("pod.cpu.usage") / read_gauge("node.cpu.limit")) from metrics`
Member

To make this possible the processor will need to store the metric data points referenced by read_gauge, right? Would that be the responsibility of the read_gauge implementation?

Contributor Author

Yeah, that's the idea, though I need to confirm it with code.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Dec 1, 2021
This builds on top of open-telemetry#4444
but uses Python-like syntax instead of SQL-like.
@tigrannajaryan
Member

For easier comparison I posted the alternate with Python-like syntax here: #4499

Contributor Author

@anuraaga anuraaga left a comment

Thanks for the comments @tigrannajaryan!

One of the main motivations of SQL-like was to take advantage of existing tooling and muscle memory. SQL itself has a grammar that is pretty general; I think the proposed syntax is actually a true subset of it. We probably don't want to provide a real programming language in these statements (or at least not yet :) ), so I'm worried the Python-derived syntax may end up too flexible. We also need to make sure to keep the door open for query engine optimizations, e.g. combining transformations with the same where clause. Of course, if it's just a difference in formatting, then `if:` and `where` could be implemented similarly, but the former seems like it could open a larger can of worms.

The `from` clause mostly comes from my original idea of having a pipeline with all the signals, forgetting this isn't possible. However, I think having it even from the beginning may be worth it to allow for this in the future. Even in an initial rendition, the UX can be simpler by configuring the processor only once and applying it to each signal pipeline - the processor would smartly use only the rules that actually make sense, as sketched below.
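(A rough sketch of that idea - all names here are invented for illustration and are not from the design doc or the prototype:)

```go
package transform

// Statement is a hypothetical parsed form of one query-language statement.
type Statement struct {
	From string // "traces", "metrics", "logs", or "" meaning "all telemetry"
	// ... the parsed invocation and where clause would live here ...
}

// statementsForSignal keeps only the statements that apply to the signal of
// the pipeline this processor instance is attached to.
func statementsForSignal(all []Statement, signal string) []Statement {
	var matched []Statement
	for _, s := range all {
		if s.From == "" || s.From == signal {
			matched = append(matched, s)
		}
	}
	return matched
}
```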

Let me know what you think. /cc @punya also, who I believe has been a proponent of SQL-like syntax.


@tigrannajaryan
Member

@anuraaga I am not opposed to SQL-like, but I think it may be worth spending a bit more time exploring what gives us the best usability. I am not completely sure SQL is the most familiar language for our target audience.

What do you think about putting together a side-by-side comparison of the examples that you have, plus some other interesting use cases, and seeing what it looks like? I had also looked into SPL (Splunk's query language) as a source of inspiration, so it may be useful to see if we can borrow any useful ideas from there.

If we are totally brave it may even make sense to compare 3 possibilities side-by-side: SQL-like, Python-like and homegrown.

I am happy to help/work on this together if you want.

@anuraaga
Contributor Author

anuraaga commented Dec 3, 2021

@tigrannajaryan That sounds like a good idea, let me try to merge the snippets from your PR to let us see things side-by-side

@anuraaga
Contributor Author

anuraaga commented Dec 9, 2021

@tigrannajaryan Sorry for the delay - I added snippets following your form as well; how does it look? Also feel free to make edits directly to the branch if there are any easy improvements.

@tigrannajaryan
Member

@tigrannajaryan Sorry for the delay - I added snippets following your form as well; how does it look? Also feel free to make edits directly to the branch if there are any easy improvements.

Thanks @anuraaga. I think we need to compare and see what we gain by using one or the other approach.

Some arguments in favour of SQL-like:

  • Because you can have a `from` clause, you can use the same expression in a processor attached to pipelines of multiple data types. I am not entirely sure this is a benefit though. I could argue the other way and say that if you attach it to a pipeline of the wrong type, it may be confusing why the expression has no effect.

Arguments in favour of Python-like:

  • Slightly more concise (fewer characters).

To comment on some of the arguments you brought up earlier:

One of the main motivations of SQL-like was to take advantage of existing tooling or muscle memory.

I think it is arguable whether SQL or Python has more tooling or muscle memory. Besides, it is not clear what tooling we can reuse. Editors with syntax highlighting perhaps, provided that the expression is in a separate file (not possible today, but maybe in the future)?

We probably don't want to provide a real programming language in these statements (or at least not yet :) ), so worried if the Python-derived syntax may end up too flexible.

I agree that we need to be careful with this. However, given that with both SQL-like and Python-like we are only providing a severely limited subset of the language, I think we are going to be exposing roughly the same level of complexity, so it is a wash here.

We also need to make sure to keep open query engine optimizations, e.g. combining transformations with the same where clause.

I would like to explore this more. Do we believe that SQL syntax is inherently easier to optimize for our use cases? I am not sure why that would be so. The optimizations SQL engines do are typically all about execution plans. However, in our case we have a stream of data, not data at rest, which inherently limits what sort of variations you may have in the "execution plan". I doubt that a lot of optimizations are possible; in the majority of cases we are going to be literally executing the statements as is, while iterating over the data records. I would like to see a specific example where we think a particular SQL syntax is easier to optimize than the equivalent Python syntax. I am having a bit of a hard time coming up with an example myself.

This is probably not very helpful. I feel like there are not enough arguments in favour of or against one or the other approach yet. Let me think a bit to see if I can come up with some additional arguments, and please feel free to add some yourself.

Some other topics to try to explore:

  • Is there perhaps an extended set of operations which we will want in the future and which are much better/easier to express in one syntax but not the other?
  • How does end-user troubleshooting compare for the 2 syntaxes? Can we emit much better error messages for one syntax but not the other?
  • Can we provide good tooling in the future? We could potentially implement Language Server Protocol and Debug Adapter Protocol. Is one syntax or the other more favourable from this perspective?

@anuraaga
Contributor Author

Thanks @tigrannajaryan - I agree that the differences aren't that large and there isn't any clear winner.

I could argue the other way and say that if you attach to a pipeline of a wrong type it may be confusing why the expression has no effect.

I guess I am thinking in terms of forwards compatibility, which may be too much early optimization anyway. I do expect a way to process multiple signals at the same time to be useful in the future, and it would be nice if the syntax supports it. The syntax could change to support it in the future, though.

I am having a bit of a hard time coming up with an example myself.

It's true that this probably ties into the previous section - if the language exposes too much complexity, then it becomes harder to optimize. But if the complexity is kept down even for a Python-like language, then the optimization possibilities should be similar. They would both probably be frontends into the same AST backend that actually runs the commands, something like the sketch below.
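For illustration, such a shared backend representation might look roughly like this (hypothetical types, not taken from the prototype):

```go
package transform

// Query is a hypothetical common AST node that both a SQL-like and a
// Python-like frontend could produce; execution and any optimizations
// (e.g. grouping statements that share a condition) would operate on this.
type Query struct {
	Invocation Invocation // e.g. delete(attributes["http.request.header.authorization"])
	Condition  *Condition // the `where` / `if` clause; nil when unconditional
}

type Invocation struct {
	Function  string
	Arguments []Value
}

type Condition struct {
	Left  Value
	Op    string // "==", "!=", ...
	Right Value
}

// Value is either a literal or a telemetry path such as attributes["http.method"].
type Value struct {
	String *string
	Float  *float64
	Path   []string
}
```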

I guess from reading them, and maybe just from personal preference, the SQL form seems to map more precisely to single statements. `if foo: do` feels like the `do` should be able to be as many statements as needed; on the flip side, I think supporting multiple commands for a single `where` would look awkward. Whether multiple statements for a condition should be supported in the language or not could be a decision point - the simplicity of single statements appeals to me in some sense, but in practice there may be so many situations where multiple commands are useful that they should be modeled into the language.

@tigrannajaryan
Member

I guess from reading them, and maybe just from personal preference, the SQL form seems to map more precisely to single statements. `if foo: do` feels like the `do` should be able to be as many statements as needed; on the flip side, I think supporting multiple commands for a single `where` would look awkward. Whether multiple statements for a condition should be supported in the language or not could be a decision point - the simplicity of single statements appeals to me in some sense, but in practice there may be so many situations where multiple commands are useful that they should be modeled into the language.

You are right - potentially with Python-like syntax we can allow:

if condition:
  statement1
  statement2
  ....

@tigrannajaryan
Member

I guess I am thinking in terms of forwards compatibility, which may be too much early optimization anyway. I do expect a way to process multiple signals at the same time to be useful in the future, and it would be nice if the syntax supports it. The syntax could change to support it in the future, though.

@anuraaga Can you perhaps show an example of what you mean by this? I feel that I may be missing some nice way to support multiple signals that you are envisioning.

@anuraaga
Contributor Author

anuraaga commented Dec 13, 2021

Can you perhaps show an example of what you mean by this? I feel that I may be missing some nice way to support multiple signals that you are envisioning.

@tigrannajaryan "A way to support multiple signals"? Nah I don't have that :P Would be a large rewrite in the pipeline I think. I do have a desire though. I think there are many cross-signal use cases like "Drop all health check telemetry", "Redact auth header on all telemetry", "Reduce cardinality for all telemetry of certain attribute in same way". So I am hoping that at least a future configuration language allows this.

Ah, if you meant syntax-wise, then I was thinking

delete(attributes["http.request.header.authorization"]) from traces where name = "GET /cats"
delete(attributes["http.request.header.authorization"]) where name = "GET /cats"

The former applies only to traces, the latter to all telemetry. Explicitly requiring the signal name in the statement means that in the latter it can be omitted when applying to all telemetry. Even now, the same instantiated transform processor could be applied to all signals.

@tigrannajaryan
Member

The former applies only to traces, the latter to all telemetry. Explicitly requiring the signal name in the statement means that in the latter it can be omitted when applying to all telemetry. Even now, the same instantiated transform processor could be applied to all signals.

I see. You can achieve it today by either including the processor in a particular pipeline or not. This does not seem to enable any new interesting functionality that is not possible with the current philosophy of using a processor in one or more pipelines as applicable.

I was wondering if it could somehow enable more interesting scenarios where the signal type is converted in the middle of the pipeline, so that one pipeline temporarily happens to hold more than one signal type, and on which you execute statements selectively using the "from" statement. This would essentially mean a pipeline can contain data of multiple signals. I can see how in that case the "from" statement would be really necessary. However, I am having a hard time imagining how exactly we could change the Collector's pipelines to work like that.

Anyway, I was just trying to find stronger arguments in favour of having the "from" clause, but I do not seem to be able to :-)

@alolita
Member

alolita commented Dec 16, 2021

@punya can you review the list of examples and add any missing ones?

// Assuming this is not in "core"
processors.register("replace_wildcards", replace_wildcards)

func replace_wildcards(pattern regexp.Regexp, replacement string, path processors.TelemetryPath) processors.Result {
Member

@tigrannajaryan tigrannajaryan Dec 16, 2021

How does the string in the call get converted to a regexp.Regexp here? The example above shows:
`replace_wildcards("/user/*/list/*", "/user/{userId}/list/{listId}", attributes["http.target"])`
Is reflection used for every invocation of replace_wildcards to figure out that "/user/*/list/*" must be compiled into a regexp.Regexp? Or is it the responsibility of the compiler to figure this out?

Related to this: do we adopt weak typing, where you can pass a string when a regexp is expected? Weak typing may be more complicated to optimize (i.e. to precompile regexp patterns once), so it may be worth thinking about this aspect of the language.

Do we intend to make the compiler smart enough to figure out that the regexp compilation can be done ahead of time and not for every invocation on every span? That's a fairly big ask for a simple compiler, but without it the execution can be quite slow.

Contributor Author

@anuraaga anuraaga Dec 16, 2021

Great callout - while writing the POC I realized the functions need to be factories of the actual logic function.

(a bit of shorthand)

func replace_wildcards(pattern string, replacement string) func(Span, Resource) {
  r := regexp.MustCompile(pattern) // compiled once, when the factory is invoked
  return func(span Span, resource Resource) {
    doReplace(r, replacement, span, resource)
  }
}

func doReplace(r *regexp.Regexp, replacement string, span Span, resource Resource) {
  // ... apply the replacement to the matched attribute values ...
}

The UX for defining a function is a bit reduced but seems quite reasonable. And with this it will be possible to use reflection to convert types passed into the factory, so instead of a string it could be defined to accept a regexp. The framework would only need to reflect, convert the regexp, and invoke the factory once, during config parse time.

I'll add a note in the doc about this.
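To make that concrete, a minimal sketch of the config-time wiring (the names and signatures here are illustrative only, not the prototype's actual API):

```go
package transform

import (
	"fmt"
	"reflect"
	"regexp"
)

// buildFunction converts the string arguments from a parsed statement into the
// types the factory expects (compiling regexps once), then calls the factory a
// single time to obtain the closure that runs per piece of telemetry.
func buildFunction(factory reflect.Value, args []string) (reflect.Value, error) {
	ft := factory.Type()
	in := make([]reflect.Value, len(args))
	for i, arg := range args {
		switch ft.In(i) {
		case reflect.TypeOf(&regexp.Regexp{}):
			r, err := regexp.Compile(arg) // compiled once, at config parse time
			if err != nil {
				return reflect.Value{}, err
			}
			in[i] = reflect.ValueOf(r)
		case reflect.TypeOf(""):
			in[i] = reflect.ValueOf(arg)
		default:
			return reflect.Value{}, fmt.Errorf("unsupported parameter type %v", ft.In(i))
		}
	}
	// The factory's single return value is the per-telemetry function.
	return factory.Call(in)[0], nil
}
```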

Contributor Author

BTW, is it the same smartness you're referring to? 😅 I think it achieves some balance by creating the factory / logic-function split to make the optimization easier without making the compiler too complex.

Member

@tigrannajaryan tigrannajaryan Dec 16, 2021

The factory approach requires that some arguments are compile-time constants. We lose the ability to use a value computed at runtime as the regular expression, for example. It also requires us to declare which of the function parameters are compile-time parameters and which are runtime parameters. In this example the first 2 arguments of replace_wildcards are compile-time constants.

What if I want to replace a span attribute with the value of another attribute and not with a constant string? It doesn't seem to be possible. To be honest, I have reservations about the approach you suggest. I think it limits the expressiveness of the language. Implementation-wise this approach deviates from how language compilers and VMs are typically implemented, and it may be difficult to fix without major rewrites in the future.

An alternative approach is to have a Regexp type as a first-class citizen in the language. So you can do this instead:
`replace_wildcards(regexp("/user/*/list/*"), "/user/{userId}/list/{listId}", attributes["http.target"])`, where regexp is a function that takes a string and returns a Regexp. Then a sufficiently smart compiler can perform the Regexp value computation at compile time and avoid calling the regexp() function at runtime if the argument to regexp() is a compile-time constant. See how our current expr evaluator added support for that: https://github.com/antonmedv/expr/pull/96/files
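A minimal sketch of that constant-folding idea, using hypothetical AST types rather than expr's actual internals:

```go
package transform

import "regexp"

// Node, Literal and Call are hypothetical AST types for this sketch.
type Node interface{}

type Literal struct{ Value interface{} }

type Call struct {
	Name string
	Args []Node
}

// fold evaluates regexp() calls whose argument is a compile-time constant,
// replacing the call node with the precompiled value so the work happens
// once while building the program, not once per span.
func fold(n Node) Node {
	c, ok := n.(*Call)
	if !ok {
		return n
	}
	for i, a := range c.Args {
		c.Args[i] = fold(a)
	}
	if c.Name == "regexp" && len(c.Args) == 1 {
		if lit, ok := c.Args[0].(*Literal); ok {
			if s, ok := lit.Value.(string); ok {
				if r, err := regexp.Compile(s); err == nil {
					return &Literal{Value: r}
				}
			}
		}
	}
	return c
}
```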

There is another unclear point with the 3rd parameter of replace_wildcards. It is not clear how attributes["http.target"] becomes a TelemetryPath (or the Span and Resource pair that is passed to doReplace). Is the semantic of the language that attributes["http.target"] is an lvalue of type TelemetryPath that can either be evaluated to get its value or be passed as a reference to a function to store a value into?

Contributor Author

The factory approach requires that some arguments are compile time constants. We loose the ability to use a value computed at runtime as the regular expression, for example.

Could you describe an example of a regex computed at runtime? As this is a static config (well, possibly dynamically updated via a remote), I figured that by nature the config elements, such as a regex, would be static.

is an lvalue of type TelemetryPath that can be either evaluated to get the value of or can be passed as a reference to a function to store a value into?

Yeah - since it's not a string, it is parsed as a telemetry path. If the function's argument in that position does not accept a telemetry path, it would be a configuration error and cause a warning. It's true that currently only telemetry paths are runtime values - I think this covers all of the use cases for telemetry transform (the only runtime data is the telemetry itself), but let me know what other ones there are that may need something more.

Currently the value type is a literal, a path expression, or an invocation of a function:

https://github.com/anuraaga/opentelemetry-collector-contrib/pull/1/files#diff-3dcf16cd5cabbe637313a07c03600765fd4b043e429d66b9ef024f62f3adbd51R24

It would be relatively easy to expand support for special syntax, e.g. /regex/ or even regexp("foo"), if it's useful I think. That being said, I don't think we'd avoid factories (though they wouldn't necessarily need to be required). We probably can't support all types of config-time resolved values. For `keep`, we need to convert a slice of strings into a map for quick presence tests:

https://github.com/anuraaga/opentelemetry-collector-contrib/pull/1/files#diff-d38eee897f4ad52c000a3df17502753a839f5afb2f7bdf3f573807cea2588561R26

There doesn't seem to be a way to handle this generically, but the factory approach seems to work well for it.
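A small sketch of such a `keep` factory, using a plain map in place of the actual pdata attribute API (the types here are simplified for illustration):

```go
package transform

// keepFactory builds the set of allowed keys once, at config parse time; the
// returned function then does O(1) presence checks for each telemetry item.
func keepFactory(keys []string) func(attrs map[string]interface{}) {
	allowed := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		allowed[k] = struct{}{}
	}
	return func(attrs map[string]interface{}) {
		for k := range attrs {
			if _, ok := allowed[k]; !ok {
				delete(attrs, k)
			}
		}
	}
}
```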

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Member

@bogdandrutu bogdandrutu left a comment

Merging this as a draft. After we progress with the implementation we may update it.

@bogdandrutu bogdandrutu enabled auto-merge (squash) January 25, 2022 19:23
@tigrannajaryan
Member

@anuraaga I found a couple of interesting alternate approaches that we could probably look into:
https://github.com/max-sixty/prql
https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/samples?pivots=azuredataexplorer

I think the syntax of either of these is nicer than SQL. I like that the order of syntax elements matches the order of logical execution.

@bogdandrutu bogdandrutu disabled auto-merge January 25, 2022 19:56
@github-actions github-actions bot removed the Stale label Jan 26, 2022
@anuraaga
Contributor Author

@tigrannajaryan Thanks a lot for the pointers! I will look through those to see what we can take from them. Do you think we should continue to iterate on this PR or follow up with another one?

@tigrannajaryan
Member

Do you think we should continue to iterate on this PR or follow up with another one?

Works either way for me. Whatever you prefer.

@anuraaga
Contributor Author

This has been out for a while, so I think it'll be good to get this in as a draft and iterate on it / the initial implementation together, if that's OK.

@tigrannajaryan tigrannajaryan added the Skip Changelog and discussion-needed labels and removed the discussion-needed label Jan 27, 2022
@tigrannajaryan tigrannajaryan merged commit 32301d9 into open-telemetry:main Jan 27, 2022