[Lens] Change the internal API of operations to support calculations #76828

wylieconlon · 2020-09-04T21:23:22Z

Problem statement

The Lens datasource needs to support calculations: function compositions that combine a query component and a post-processing component in the same editor. The derivative function, for example, is composed of an Elasticsearch query and a post-processing step, so it is a calculation. A derivative can be assigned to a visual dimension, and the visualization will be able to use its values. Some calculations might need additional metadata, but the public API already supports most of our needs. The internal datasource API does not yet support composition.

Break the 1:1 relationship with each operation matching a specific esaggs query.
Allow functions to be composed of other functions, even if the first step is only one level.
Make it easy to extend as we add new types of functions, using reusable components.

Implementation details

Calculations are a new type of operation, which means we have buckets, metrics, and calculations. Calculations are defined as function compositions: they are not atomic. In the short term, we will compose calculations and metrics, but in the future it could go deeper. We should keep separation between the levels of the hierarchy when we can, but sometimes the UX needs will be more important than the separation between levels. I expect that Derivative will be easy to compose from other elements, but Filter Ratio is defined by its special UI which might not be composed the same way. We should support both, but optimize for reuse.
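As a rough illustration of the three operation types, here is a minimal sketch using a discriminated union. All names here (`Operation`, `kind`, `references`, `isComposed`) are hypothetical and not taken from the Lens codebase; the only thing carried over from the proposal is that calculations are composed from other operations while buckets and metrics are atomic.

```typescript
// Hypothetical types for illustration only; not the actual Lens definitions.
interface BaseOperation {
  id: string;
  label: string;
}

interface BucketOperation extends BaseOperation {
  kind: 'bucket';
  sourceField: string;
}

interface MetricOperation extends BaseOperation {
  kind: 'metric';
  sourceField?: string; // some metrics, such as a record count, need no field
}

interface CalculationOperation extends BaseOperation {
  kind: 'calculation';
  // Calculations are not atomic: they point at the operations they are built from.
  references: string[];
}

type Operation = BucketOperation | MetricOperation | CalculationOperation;

// Type guard that separates composed operations from atomic ones.
function isComposed(op: Operation): op is CalculationOperation {
  return op.kind === 'calculation';
}
```

Keeping the reference list on the calculation itself is one way to keep the levels of the hierarchy separate, since the metric does not need to know that it is being used by something else.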

At a conceptual level, we will have "public" and "internal" operations. Public operations are the ones that are used by a visualization. They provide the full editing options. Internal operations are depended on by a calculation, and will have a minimal editor UI that won't offer any output-related features like number formatting or labels. The datasource will help operations manage the internal operations, and will support a user flow like this:

User selects the Sum function, which is a metric
User switches to a calculation like Derivative, and the Sum of Bytes is moved to be an internal operation by reference
User switches back to Sum, and the internal operation is made public again
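The flow above could be sketched roughly like this. The state shape, the `internal` flag, and the function names `wrapInCalculation` and `unwrapCalculation` are assumptions made for this example, not the planned implementation. The one idea it tries to show is that the calculation takes over the public ID, so the visualization keeps pointing at the same key while the metric moves behind a reference.

```typescript
// Hypothetical state shape; `internal` and `references` are assumptions for this sketch.
interface Op {
  fn: string; // e.g. 'sum' or 'derivative'
  field?: string;
  internal: boolean;
  references: string[];
}
type Ops = Record<string, Op>;

// Sum -> Derivative: the metric is moved to a new internal ID and the
// calculation takes its place under the public ID.
function wrapInCalculation(ops: Ops, publicId: string, calcFn: string, newInternalId: string): Ops {
  const metric = ops[publicId];
  if (metric === undefined) return ops;
  return {
    ...ops,
    [newInternalId]: { ...metric, internal: true },
    [publicId]: { fn: calcFn, internal: false, references: [newInternalId] },
  };
}

// Derivative -> Sum: the referenced metric is made public again under the
// original ID and the internal entry is dropped.
function unwrapCalculation(ops: Ops, publicId: string): Ops {
  const calc = ops[publicId];
  if (calc === undefined) return ops;
  const innerId = calc.references[0];
  if (innerId === undefined || ops[innerId] === undefined) return ops;
  const { [innerId]: inner, ...rest } = ops;
  return { ...rest, [publicId]: { ...inner, internal: false } };
}
```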

The first PR will be to write a migration which modifies the state. columns will internally be renamed to operations, and will still contain uuid keys which match the keys that the visualization has access to. The current columnOrder will be renamed to operationOrder. It will contain all operations, including internal ones, and the public API will filter out internal operations. The order of operations is enforced: buckets first, then metrics, then calculations.
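A minimal sketch of what that migration might look like, assuming a simplified layer shape. The `kind` property and the `migrateLayer` name are hypothetical; only the `columns` to `operations` and `columnOrder` to `operationOrder` renames and the buckets, metrics, calculations ordering come from the description above.

```typescript
// Simplified, hypothetical layer shapes for illustration.
type Kind = 'bucket' | 'metric' | 'calculation';
interface Entry {
  kind: Kind;
}
interface OldLayer {
  columns: Record<string, Entry>;
  columnOrder: string[];
}
interface NewLayer {
  operations: Record<string, Entry>;
  operationOrder: string[];
}

// Enforced order: buckets first, then metrics, then calculations.
const rank: Record<Kind, number> = { bucket: 0, metric: 1, calculation: 2 };

function migrateLayer(old: OldLayer): NewLayer {
  // Array.prototype.sort is stable, so entries of the same kind keep their relative order.
  const operationOrder = [...old.columnOrder].sort(
    (a, b) => rank[old.columns[a].kind] - rank[old.columns[b].kind]
  );
  // The uuid keys are kept unchanged so the visualization's references stay valid.
  return { operations: { ...old.columns }, operationOrder };
}
```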

Each layer will add a new property for internalOperations. When a calculation is created, it can also create references to internal operations. The references will either be blank or pre-filled based on the requirements of the datasource, but they are not directly managed by the calculation UI. Calculations will need to provide validation functions so that the datasource can select the right defaults. These same validation functions are part of the requirements for building a text-based editor, but that's a separate project.
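One possible shape for those validation functions, as a sketch only. `ReferenceRequirement`, `requiredReferences`, and `resolveReferences` are hypothetical names: each calculation declares what it needs from a reference, and the datasource uses the validator to pre-fill a default or leave the reference blank.

```typescript
// Hypothetical shapes; the real validation API may look quite different.
interface Candidate {
  id: string;
  kind: 'bucket' | 'metric' | 'calculation';
  dataType: 'number' | 'string' | 'date';
}

interface ReferenceRequirement {
  name: string;
  validate: (candidate: Candidate) => boolean;
}

interface CalculationDefinition {
  type: string;
  requiredReferences: ReferenceRequirement[];
}

// The datasource, not the calculation UI, picks defaults: for each requirement,
// use the first valid candidate, or leave the reference blank (undefined).
function resolveReferences(
  definition: CalculationDefinition,
  candidates: Candidate[]
): Record<string, string | undefined> {
  const resolved: Record<string, string | undefined> = {};
  for (const requirement of definition.requiredReferences) {
    const match = candidates.find((candidate) => requirement.validate(candidate));
    resolved[requirement.name] = match?.id;
  }
  return resolved;
}

// Example definition: a derivative that accepts any numeric metric.
const derivativeDefinition: CalculationDefinition = {
  type: 'derivative',
  requiredReferences: [
    { name: 'metric', validate: (c) => c.kind === 'metric' && c.dataType === 'number' },
  ],
};
```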

The datasource will be responsible for combining outputs into an expression in basically the same way as it currently does: iterating over the operations in order and generating esaggs output, followed by expression functions that run on the results. Calculations will not provide toEsAggConfigs, but instead provide a new function toExpression. The expression context is the table from esaggs, and it's likely that we will need additional table manipulation helpers. For example, the derivative function needs to know which column is the timeseries column to take a derivative of, as well as any grouping columns.
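To show why the derivative needs to know the timeseries column and the grouping columns, here is a small standalone sketch of a derivative over a table. This is not the Kibana expression function; the `Row` type, the argument names, and the choice to emit `null` for the first point of each series are all assumptions for this example.

```typescript
// Hypothetical table row; the real esaggs table has a richer structure.
type Row = Record<string, number | string | null>;

interface DerivativeArgs {
  timeColumn: string; // column that orders each series
  valueColumn: string; // metric column to differentiate
  groupColumns: string[]; // columns that split the table into separate series
  outputColumn: string; // result goes into a new column, the input column is not edited in place
}

function derivative(rows: Row[], args: DerivativeArgs): Row[] {
  const previousByGroup = new Map<string, number>();
  // Sort a copy by time so that differences are taken between consecutive points.
  const sorted = [...rows].sort(
    (a, b) => Number(a[args.timeColumn]) - Number(b[args.timeColumn])
  );
  return sorted.map((row) => {
    // Rows that share all grouping values belong to the same series.
    const groupKey = JSON.stringify(args.groupColumns.map((column) => row[column]));
    const value = row[args.valueColumn];
    const previous = previousByGroup.get(groupKey);
    const diff = typeof value === 'number' && previous !== undefined ? value - previous : null;
    if (typeof value === 'number') previousByGroup.set(groupKey, value);
    return { ...row, [args.outputColumn]: diff };
  });
}
```

Without the grouping columns, the difference would be taken across unrelated series, which is the kind of table manipulation a shared helper could take care of.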

Other options that weren't proposed

This proposal is using expression functions instead of functions that are available in Elasticsearch. This is because Elasticsearch doesn't offer the same flexibility of post-processing as we could offer in Kibana. For example, we want to support math that operates on calculated numbers, but this is difficult in Elasticsearch. Another reason is that this approach gives us fewer dependencies, so it can change in response to user needs.
Instead of adding a reference model, another option was to make all operations atomic and prevent function composition. The way this would work is that the datasource would no longer concatenate all of the data fetching elements, and this responsibility would be pushed into the operations. For example, the Derivative operation would write both the esaggs settings and the post-processing functions, as well as provide all the UI elements for selecting this. There are two main reasons I want to avoid this approach:
It makes the operation switching flow more complicated. We should be able to change functions from public to internal more easily.
As we add more functions and combinations, the logic for managing them all is going to get harder to maintain. It doesn't seem as scalable.


elasticmachine · 2020-09-04T21:23:25Z

Pinging @elastic/kibana-app (Team:KibanaApp)

wylieconlon · 2020-10-19T21:52:07Z

This plan needs to be amended to handle a few cases that have come up in the context of the cumulative sum function. The cases are:

We don't always want users to be aware of references. For example, we could build a UI that is based only on a field selector, where the user doesn't need to know that the function changes from "Count of records" to "Sum of field". The previous assumption was that all reference-based operations would share a UI, and we need to be more flexible here.
We need a way to restrict some reference-based operations from being used in pie charts, even if the same query is valid for other chart types. The main reason is the "metrics at all levels" concept fails on some types of operations. Specifically, Lens should change the default away from "metrics at all levels", and only enable this feature when it's required and safe to do (pie charts).
Reference-based operations are not expected to do in-place editing of column values, so all reference-based operations will have 2 or more columns in the final data table. These intermediate columns need unique IDs.
Suggestion logic needs to work at a slightly higher level. Instead of reading from columns directly, we need to create some simple helper functions that will look at the datasource state and provide a metadata summary of it.
There will be no deduplication of operations vs inner operations: users can have multiple operations that provide the same results.
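As an illustration of the helper mentioned in the suggestion-logic case above, a metadata summary could look something like this. `LayerSummary` and `summarizeLayer` are hypothetical names, and which fields the suggestion logic would actually need is an open question.

```typescript
// Hypothetical shapes; the summary fields are guesses at what suggestions might need.
interface SummaryOp {
  kind: 'bucket' | 'metric' | 'calculation';
  references?: string[];
}

interface LayerSummary {
  bucketCount: number;
  metricCount: number;
  calculationCount: number;
  usesReferences: boolean;
}

// Suggestion code would read this summary instead of walking the columns itself.
function summarizeLayer(operations: Record<string, SummaryOp>): LayerSummary {
  const summary: LayerSummary = {
    bucketCount: 0,
    metricCount: 0,
    calculationCount: 0,
    usesReferences: false,
  };
  for (const op of Object.values(operations)) {
    if (op.kind === 'bucket') summary.bucketCount += 1;
    if (op.kind === 'metric') summary.metricCount += 1;
    if (op.kind === 'calculation') summary.calculationCount += 1;
    if ((op.references ?? []).length > 0) summary.usesReferences = true;
  }
  return summary;
}
```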

wylieconlon · 2020-10-22T19:26:25Z

Update on the "metrics at all levels" parameter: we will stop using it entirely.

wylieconlon · 2020-11-04T19:51:21Z

I think we can simplify the original plan here. Instead of creating a new innerOperations property, we can store all operations in the same object. The "inner operations" will still be managed automatically, they just don't need a special place in the state.
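A rough sketch of how this could work with a single flat object, under the assumption that an operation counts as "inner" whenever another operation references it. `FlatOp`, `getInnerIds`, and `getPublicIds` are hypothetical names for this example.

```typescript
// Hypothetical flat state: every operation lives in the same object.
interface FlatOp {
  fn: string;
  references?: string[];
}

// Assumption for this sketch: an operation is "inner" if any other operation references it.
function getInnerIds(operations: Record<string, FlatOp>): Set<string> {
  const inner = new Set<string>();
  for (const op of Object.values(operations)) {
    for (const ref of op.references ?? []) inner.add(ref);
  }
  return inner;
}

// The public API only exposes operations that nothing else depends on.
function getPublicIds(operations: Record<string, FlatOp>, order: string[]): string[] {
  const inner = getInnerIds(operations);
  return order.filter((id) => id in operations && !inner.has(id));
}
```

With this approach the public/inner distinction is derived from the references rather than stored, so there is no second place in the state that can get out of sync.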

flash1293 · 2020-12-07T09:32:03Z

The internal API is changed; it's just not surfaced in the UI yet.

wylieconlon added the Team:Visualizations and Feature:Lens labels Sep 4, 2020

This was referenced Sep 4, 2020

[Lens] User experience of composing aggregations and calculations #69215

Closed

[Lens] Architectural requirements for calculations #68460

Closed

timroes mentioned this issue Sep 7, 2020

[Meta][Lens] Data Modelling #57708

Closed

timductive mentioned this issue Sep 8, 2020

[Meta][Lens] Lens by Default #74685

Closed

26 tasks

stacey-gammon added the Project:LensDefault label Sep 16, 2020

wylieconlon mentioned this issue Sep 22, 2020

[Lens] Fieldless operations #78080

Merged

1 task

wylieconlon self-assigned this Oct 19, 2020

This was referenced Oct 20, 2020

[Lens] Add cumulative sum aggregation #61776

Closed

[Lens] Add derivative function #61775

Closed

flash1293 closed this as completed Dec 7, 2020
