[Lens] Change the internal API of operations to support calculations #76828

Closed
wylieconlon opened this issue Sep 4, 2020 · 5 comments
Labels
Feature:Lens Project:LensDefault Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@wylieconlon
Contributor

Problem statement

The Lens datasource needs to support calculations, which are defined as function compositions that combine a query component and a post-processing component in the same editor. For example, the derivative function is composed of an Elasticsearch query and a post-processing step, so it's a calculation. A derivative can be assigned to a visual dimension, and the visualization will be able to use its values. Some calculations might need additional metadata, but the public API can already support most of our needs. The internal datasource API, however, does not yet support composition. The goals of this change are to:

  1. Break the 1:1 relationship with each operation matching a specific esaggs query.

  2. Allow functions to be composed of other functions, even if the first iteration only supports one level of composition.

  3. Make it easy to extend as we add new types of functions, using reusable components.

Implementation details

Calculations are a new type of operation, which means we will have buckets, metrics, and calculations. Calculations are defined as function compositions: they are not atomic. In the short term, we will compose calculations from metrics, but in the future the nesting could go deeper. We should keep the levels of the hierarchy separate when we can, but sometimes UX needs will take priority over that separation. I expect that Derivative will be easy to compose from other elements, but Filter Ratio is defined by its special UI, which might not be composable in the same way. We should support both, but optimize for reuse.
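A minimal sketch of the three kinds of operations, assuming hypothetical type names (OperationCategory, AtomicOperation, CalculationOperation) and a hypothetical references array of operation IDs; this is not the actual Lens type definition. It shows that a calculation is not atomic: finding what it depends on means walking its references, which keeps working if composition goes deeper than one level later.

```typescript
// Hypothetical shapes for illustration only; not Kibana's actual Lens types.
type OperationCategory = "bucket" | "metric" | "calculation";

interface BaseOperation {
  operationType: string;
  category: OperationCategory;
}

// Buckets and metrics are atomic: each maps directly to an aggregation on a field.
interface AtomicOperation extends BaseOperation {
  category: "bucket" | "metric";
  sourceField: string;
}

// Calculations are compositions: they reference other operations by ID.
interface CalculationOperation extends BaseOperation {
  category: "calculation";
  references: string[];
}

type Operation = AtomicOperation | CalculationOperation;

// Collect every operation that the given operation depends on, at any depth.
function collectDependencies(
  id: string,
  operations: Record<string, Operation>,
  seen: Set<string> = new Set(),
): Set<string> {
  const op = operations[id];
  if (!op || op.category !== "calculation") return seen;
  for (const ref of op.references) {
    if (!seen.has(ref)) {
      seen.add(ref);
      collectDependencies(ref, operations, seen);
    }
  }
  return seen;
}
```

For example, a cumulative sum of a derivative of a sum resolves to both the derivative and the sum, while an atomic metric has no dependencies.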

At a conceptual level, we will have "public" and "internal" operations. Public operations are the ones used by a visualization, and they provide the full editing options. Internal operations are dependencies of a calculation; they will have a minimal editor UI that won't offer any output-related features like number formatting or labels. The datasource will help operations manage their internal operations and will support a user flow like this:

  1. User selects the Sum function, which is a metric
  2. User switches to a calculation like Derivative, and Sum of Bytes becomes an internal operation that the Derivative references
  3. User switches back to Sum, and the internal operation is made public again
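The switching flow above could be sketched like this, assuming a hypothetical state shape where each operation carries an isInternal flag and calculations hold a references array. The function names switchToCalculation and switchBackToMetric are illustrative only, not part of the Lens API.

```typescript
// Hypothetical state shape for illustration; not the actual Lens datasource state.
interface Op {
  operationType: string;
  isInternal: boolean;
  sourceField?: string;
  references?: string[]; // IDs of operations this calculation depends on
}

type Ops = Record<string, Op>;

// Step 2: the public ID now holds the calculation, and the previous metric is kept
// under a new ID as an internal operation that the calculation references.
function switchToCalculation(ops: Ops, publicId: string, calculationType: string, internalId: string): Ops {
  const metric = ops[publicId];
  return {
    ...ops,
    [internalId]: { ...metric, isInternal: true },
    [publicId]: { operationType: calculationType, isInternal: false, references: [internalId] },
  };
}

// Step 3: switching back promotes the referenced metric to the public ID again
// and drops its internal copy. Assumes the calculation has a single reference.
function switchBackToMetric(ops: Ops, publicId: string): Ops {
  const refId = ops[publicId].references?.[0];
  if (refId === undefined) return ops;
  const { [refId]: inner, ...rest } = ops;
  return { ...rest, [publicId]: { ...inner, isInternal: false } };
}
```

In this sketch the public ID stays stable across both switches, so the visualization keeps pointing at the same dimension while the metric moves in and out of the internal set.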

The first PR will write a migration that modifies the state. Internally, columns will be renamed to operations, which will still be keyed by UUIDs that match the keys the visualization has access to. The current columnOrder will be renamed to operationOrder; it will contain all operations, including internal ones. The public API will filter out internal operations. The order of operations is enforced: buckets first, then metrics, then calculations.
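A rough sketch of that migration, assuming hypothetical legacy and migrated layer shapes (the real persisted Lens state has more fields). It renames columns to operations and columnOrder to operationOrder without changing the UUID keys, enforces buckets, then metrics, then calculations, and filters internal operations out of what the public API returns.

```typescript
// Hypothetical before/after shapes; the real Lens persisted state differs.
interface LegacyLayer {
  columns: Record<string, { operationType: string; isBucketed: boolean }>;
  columnOrder: string[];
}

type Category = "bucket" | "metric" | "calculation";

interface MigratedOperation {
  operationType: string;
  category: Category;
  isInternal: boolean;
}

interface MigratedLayer {
  operations: Record<string, MigratedOperation>;
  operationOrder: string[];
}

const CATEGORY_RANK: Record<Category, number> = { bucket: 0, metric: 1, calculation: 2 };

// Rename columns -> operations and columnOrder -> operationOrder, keeping the same keys.
function migrateLayer(layer: LegacyLayer): MigratedLayer {
  const operations: Record<string, MigratedOperation> = {};
  for (const [id, column] of Object.entries(layer.columns)) {
    operations[id] = {
      operationType: column.operationType,
      // Legacy columns are either buckets or metrics; calculations did not exist yet.
      category: column.isBucketed ? "bucket" : "metric",
      isInternal: false,
    };
  }
  return { operations, operationOrder: sortOperationOrder(layer.columnOrder, operations) };
}

// Buckets first, then metrics, then calculations; Array.prototype.sort is stable,
// so the existing order is preserved within each group.
function sortOperationOrder(order: string[], operations: Record<string, MigratedOperation>): string[] {
  return [...order].sort(
    (a, b) => CATEGORY_RANK[operations[a].category] - CATEGORY_RANK[operations[b].category],
  );
}

// The public API only exposes operations that are not internal.
function getPublicColumnIds(layer: MigratedLayer): string[] {
  return layer.operationOrder.filter((id) => !layer.operations[id].isInternal);
}
```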

Each layer will get a new internalOperations property. When a calculation is created, it can also create references to internal operations. The references will either be blank or pre-filled based on the requirements of the datasource, but they won't be directly managed by the calculation UI. Calculations will need to provide validation functions so that the datasource can select the right defaults. These same validation functions are also part of the requirements for building a text-based editor, but that's a separate project.
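One way validation functions could drive default selection, as a sketch: the Candidate and ReferenceRequirement types and the selectDefaults helper are hypothetical. Each reference is pre-filled with the first candidate that passes the calculation's validation function, or left blank (undefined) if none does.

```typescript
// Hypothetical description of an operation the datasource could offer as a reference.
interface Candidate {
  operationType: string;
  dataType: "number" | "string" | "date";
  isBucketed: boolean;
}

// Hypothetical requirement: a validation function provided by the calculation.
interface ReferenceRequirement {
  isValid: (candidate: Candidate) => boolean;
}

// The datasource pre-fills each reference with the first valid candidate,
// or leaves it blank (undefined) when no candidate passes validation.
function selectDefaults(
  requirements: ReferenceRequirement[],
  candidates: Candidate[],
): Array<Candidate | undefined> {
  return requirements.map((requirement) => candidates.find((c) => requirement.isValid(c)));
}

// Example: a derivative needs a single numeric, non-bucketed input.
const derivativeRequirements: ReferenceRequirement[] = [
  { isValid: (c) => c.dataType === "number" && !c.isBucketed },
];
```

Because the validation logic lives in a plain function rather than in the calculation's UI, the same check could later be reused by a text-based editor.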

The datasource will be responsible for combining outputs into an expression in basically the same way it does today: iterating over the operations in order, generating the esaggs configuration, and then appending expression functions that run on the results. Calculations will not provide toEsAggConfigs; instead, they will provide a new toExpression function. The expression context is the table returned by esaggs, and we will likely need additional table manipulation helpers. For example, the derivative function needs to know which column is the time series column for the derivative, as well as any grouping columns.
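A sketch of how the datasource might assemble the expression: operations that provide toEsAggConfigs contribute aggregations to a single esaggs call, and each calculation appends a step produced by toExpression. The signatures, the simplified esaggs argument syntax, and the "derivative" expression function with its input/output/time/by arguments are all assumptions for illustration, not the real Kibana expression API.

```typescript
// Hypothetical shapes for illustration; not the actual Lens or Kibana expression types.
interface AggConfig {
  id: string;
  type: string;
}

interface LayerOperation {
  operationType: string;
  references?: string[];
}

interface Layer {
  operations: Record<string, LayerOperation>;
  operationOrder: string[]; // buckets first, then metrics, then calculations
}

interface OperationDefinition {
  // Buckets and metrics contribute aggregations to the single esaggs call.
  toEsAggConfigs?: (id: string) => AggConfig[];
  // Calculations contribute a post-processing step that runs on the esaggs table.
  toExpression?: (id: string, layer: Layer) => string;
}

function buildExpression(layer: Layer, definitions: Record<string, OperationDefinition>): string {
  const aggs: AggConfig[] = [];
  const steps: string[] = [];
  for (const id of layer.operationOrder) {
    const definition = definitions[layer.operations[id].operationType];
    if (definition.toEsAggConfigs) {
      aggs.push(...definition.toEsAggConfigs(id));
    } else if (definition.toExpression) {
      steps.push(definition.toExpression(id, layer));
    }
  }
  // Simplified esaggs syntax; the real function takes different arguments.
  return [`esaggs aggs='${JSON.stringify(aggs)}'`, ...steps].join(" | ");
}

// IDs of operations of a given type, in operation order.
function columnsOfType(layer: Layer, operationType: string): string[] {
  return layer.operationOrder.filter((id) => layer.operations[id].operationType === operationType);
}

const definitions: Record<string, OperationDefinition> = {
  terms: { toEsAggConfigs: (id) => [{ id, type: "terms" }] },
  date_histogram: { toEsAggConfigs: (id) => [{ id, type: "date_histogram" }] },
  sum: { toEsAggConfigs: (id) => [{ id, type: "sum" }] },
  // Hypothetical "derivative" expression function: it needs its input column,
  // the time series column, and any grouping columns.
  derivative: {
    toExpression: (id, layer) => {
      const input = layer.operations[id].references?.[0] ?? "";
      const time = columnsOfType(layer, "date_histogram")[0] ?? "";
      const groups = columnsOfType(layer, "terms").join(",");
      return `derivative input="${input}" output="${id}" time="${time}" by="${groups}"`;
    },
  },
};
```

Because operationOrder puts calculations last, every post-processing step runs after esaggs has produced the table it reads from.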

Other options that weren't proposed

  1. This proposal uses expression functions instead of functions that are available in Elasticsearch, because Elasticsearch doesn't offer the same post-processing flexibility that we could offer in Kibana. For example, we want to support math that operates on calculated numbers, which is difficult in Elasticsearch. Another reason is that this approach gives us fewer dependencies, so we can change it in response to user needs.

  2. Instead of adding a reference model, another option was to make all operations atomic and prevent function composition. In this model, the datasource would no longer concatenate all of the data fetching elements; that responsibility would be pushed into the operations. For example, the Derivative operation would write both the esaggs settings and the post-processing functions, and would also provide all of the UI elements for configuring them. There are two main reasons I want to avoid this approach:

      1. It makes the operation switching flow more complicated; we should be able to move functions between public and internal more easily.

      2. As we add more functions and combinations, the logic for managing them all will get harder to maintain. It doesn't seem as scalable.

@wylieconlon wylieconlon added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Sep 4, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@wylieconlon
Contributor Author

This plan needs to be amended to handle a few cases that have come up in the context of the cumulative sum function. The cases are:

  1. We don't always want users to be aware of references. For example, we could build a UI based only on a field selector, where the user doesn't need to know that the function changes from "Count of records" to "Sum of field". The previous assumption was that all reference-based operations would share a UI, and we need to be more flexible here.

  2. We need a way to restrict some reference-based operations from being used in pie charts, even if the same query is valid for other chart types. The main reason is that the "metrics at all levels" concept fails for some types of operations. Specifically, Lens should change the default away from "metrics at all levels" and only enable this feature when it's required and safe to do so (pie charts).

  3. Reference-based operations are not expected to edit column values in place, so every reference-based operation will have 2 or more columns in the final data table. These intermediate columns need unique IDs.

  4. Suggestion logic needs to operate at a slightly higher level: instead of reading from columns directly, we need some simple helper functions that look at the datasource state and provide a metadata summary of it.

  5. There will be no deduplication between operations and inner operations: users can have multiple operations that provide the same results.
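To illustrate item 3, here is a sketch of a cumulative sum that writes its running total to a new column instead of editing the input column in place, plus a helper that picks a non-colliding column ID. The row shape, the function names, and the "X0, X1, ..." ID scheme are hypothetical; the sketch also assumes rows are already sorted by time within each group.

```typescript
// Hypothetical flat table shape, loosely modeled on a tabular esaggs response.
type Row = Record<string, string | number | null>;

// Hypothetical ID scheme: append X0, X1, ... until the ID is unused.
function uniqueColumnId(base: string, existing: Set<string>): string {
  let n = 0;
  while (existing.has(`${base}X${n}`)) n++;
  return `${base}X${n}`;
}

// Running total of inputId per group, written to outputId. The input column is left
// untouched, so the final table has both columns. Assumes rows are sorted by time
// within each group; non-numeric values contribute 0.
function cumulativeSum(rows: Row[], inputId: string, outputId: string, groupByIds: string[]): Row[] {
  const totals = new Map<string, number>();
  return rows.map((row) => {
    const key = JSON.stringify(groupByIds.map((id) => row[id]));
    const value = row[inputId];
    const total = (totals.get(key) ?? 0) + (typeof value === "number" ? value : 0);
    totals.set(key, total);
    return { ...row, [outputId]: total };
  });
}
```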

@wylieconlon
Contributor Author

Update on the "metrics at all levels" parameter: we will stop using it entirely.

@wylieconlon
Contributor Author

I think we can simplify the original plan here. Instead of creating a new innerOperations property, we can store all operations in the same object. The "inner operations" will still be managed automatically, they just don't need a special place in the state.
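With all operations in one object, whether an operation is internal can be derived from the references instead of being stored in a separate property. A minimal sketch, assuming a hypothetical references array on each operation: getInternalIds finds every referenced operation, and removeOperation cleans up internal operations that become unreferenced, including nested ones. Both function names are illustrative only.

```typescript
// Hypothetical stored operation; calculations list the IDs they reference.
interface StoredOperation {
  operationType: string;
  references?: string[];
}

type OperationMap = Record<string, StoredOperation>;

// An operation is internal if any other operation references it.
function getInternalIds(operations: OperationMap): Set<string> {
  const internal = new Set<string>();
  for (const op of Object.values(operations)) {
    for (const ref of op.references ?? []) internal.add(ref);
  }
  return internal;
}

// Remove an operation, then drop internal operations that nothing references any more.
// Repeats until stable so nested chains of references are cleaned up too.
function removeOperation(operations: OperationMap, id: string): OperationMap {
  const originallyInternal = getInternalIds(operations);
  const remaining: OperationMap = { ...operations };
  delete remaining[id];
  let changed = true;
  while (changed) {
    changed = false;
    const referenced = getInternalIds(remaining);
    for (const otherId of Object.keys(remaining)) {
      if (originallyInternal.has(otherId) && !referenced.has(otherId)) {
        delete remaining[otherId];
        changed = true;
      }
    }
  }
  return remaining;
}
```

Public operations are never garbage-collected here, because only operations that were referenced before the removal are candidates for cleanup.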

@flash1293
Contributor

The internal API has been changed; it's just not surfaced in the UI yet.

Projects
None yet
Development

No branches or pull requests

4 participants