Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data.search] Clean up arguments to esaggs. #84973

Merged
merged 17 commits into from
Dec 10, 2020

Conversation

lukeelmers
Copy link
Member

@lukeelmers lukeelmers commented Dec 4, 2020

Closes #61768
Closes #65067
Part of #46902

Summary

This updates the arguments of esaggs a part of the project to clean up semantics across all expression functions by removing blobs of stringified json. The following changes have been made here:

  1. Updates the index arg in esaggs to require an argument of type index_pattern which is only returnable by indexPatternLoad
  2. Updates the aggConfigs arg to be an aggs multiarg, which accepts type agg_type which is only returnable by each of the various agg type functions
  3. Updates downstream usage of esaggs in both Lens & Visualize to ensure the function is called correctly
    • In Lens, updated the toEsAggsConfig datasource method to be toEsAggsFn and return the function AST. This is basically what @wylieconlon did with his exploration in [Lens] Use agg functions instead of string #77565
    • In Visualize, updated build_pipeline as well as each of the vis types that already had their own toExpressionAst defined
  4. Cleans up a few bugs that were uncovered along the way in some of the agg type functions & in aggConfig.toExpressionAst()
  5. Cleans up a bug I found in visualize which appears to be a regression introduced in Vislib visualization renderer #80744 (not yet released).
    • The problem was with the serial diff agg, which is currently broken on master.
  6. Updates the esaggs implementation & tests
  7. Adds unit tests for esaggs & requestHandler.
  8. Adds functional tests to the interpreter functional test suite for server-run expressions.

Testing

This PR introduces no functional changes, however it makes significant changes to the underlying infrastructure for generating aggregation requests that are run for Visualize & Lens. The best things to test are various configurations of agg types in different visualizations to make sure things are behaving as expected. Ultimately no user-facing behavior should change as a result of this PR.

Plugin API Changes

The esaggs expression function, which is the default method of requesting aggregations for visualizations, has had some changes to the arguments it accepts.

// old
esaggs index="logstash-*" aggConfigs="[{ id: 1, enabled: true, type: "count", schema: "metric" }]"

// new
esaggs
  // use indexPatternLoad and pass your ID instead of passing it as a string
  index={indexPatternLoad id="logstash-*"}
  // use aggType functions for each aggregation you need. the aggs argument
  // can be passed multiple times. if you are using AggConfigs you can automatically
  // generate the expression AST for these arguments with `aggConfig.toExpressionAst()`
  aggs={aggCount id=1 enabled=true schema="metric"} 

@lukeelmers lukeelmers added the WIP Work in progress label Dec 4, 2020
schema: 'metric',
params: {
toEsAggsFn: (column, columnId) => {
return buildExpressionFunction<AggFunctionsMapping['aggCardinality']>('aggCardinality', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks great to me, seems like that's just as maintainable as the old version 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like that's just as maintainable as the old version

Yeah after our discussion I realized that, once converting to use the function builder, you gain a lot more type safety than you had previously, so overall I think that will help prevent any future issues 🙂

@lukeelmers lukeelmers force-pushed the feat/esaggs-args branch 3 times, most recently from ec49568 to efb40b0 Compare December 7, 2020 04:26
Comment on lines +712 to +724
it('does not stringify arrays which are not objects', () => {
const ac = new AggConfigs(indexPattern, [], { typesRegistry });
const configStates = {
type: 'percentiles',
params: {
field: 'bytes',
percents: [1, 25, 50, 75, 99],
},
};
const aggConfig = ac.createAggConfig(configStates);
const percents = aggConfig.toExpressionAst()?.chain[0].arguments.percents;
expect(percents).toEqual([1, 25, 50, 75, 99]);
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncovered a bug in toExpressionAst where the percents arg was incorrectly stringified when we should have preserved it, so I updated the logic to handle this case where a param is already an array, but not an object array.

*/
toExpressionAst(): ExpressionAstFunction | undefined {
toExpressionAst(): ExpressionAstExpression | undefined {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discovered that returning an ExpressionAstFunction was not very useful since using this as a subexpression requires it to be ExpressionAstExpression, so I updated accordingly.

import { getMovingAvgMetricAgg } from './metrics/moving_avg';
import { getSerialDiffMetricAgg } from './metrics/serial_diff';
import * as buckets from './buckets';
import * as metrics from './metrics';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes here, this file was just cleanup.

@@ -72,7 +72,7 @@ export const aggHistogram = (): FunctionDefinition => ({
}),
},
interval: {
types: ['string'],
types: ['number', 'string'],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had incorrectly typed this as string only (since you can do auto), but in most cases the interval is actually a number when explicitly provided. Keeping this as a string caused some functional test failures where an incorrect interval got applied when the histogram expression function cast the provided number to a string instead of keeping its type.

Comment on lines +127 to +131
let calculatedInterval = isAuto
? 0
: typeof interval !== 'number'
? parseFloat(interval)
: interval;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to tweak logic here to make TS happy once updating interval to be string or number. Now we need to check for the case where it's already a number.

Technically this doesn't change anything at runtime (since passing a number to parseFloat will still return the same number), but TS expects parseFloat to only accept a number.

const aggConfigsState = JSON.parse(args.aggConfigs);
const indexPattern = await indexPatterns.get(args.index);
const aggConfigs = aggs.createAggConfigs(indexPattern, aggConfigsState);
const indexPattern = await indexPatterns.create(args.index.value, true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used indexPatterns.create here, because manually creating new IndexPattern() still requires you to provide the field formats service, plus some UI settings... which means adding more dependencies to esaggs (currently ui settings is not required).

However we could still refactor to use new IndexPattern() directly if desired; this was just a simpler change.

Comment on lines -91 to -96
const configStr = JSON.stringify(visConfig).replace(/\\/g, `\\\\`).replace(/'/g, `\\'`);
const visTypeXy = buildExpressionFunction<VisTypeVislibExpressionFunctionDefinition>(
vislibVisName,
{
type: vis.type.name,
visConfig: configStr,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a regression here which AFAICT was introduced by #80744.

It surfaced as a bug where creating a vislib vis with a serial diff agg threw an error... Because the custom label that got created had incorrectly double-escaped backslashes, e.g.: "Blah blah serial diff of \\"bytes\\""

This would cause JSON.parse to throw because the double-escaped quotes were accidentally kept in place and interpreted as part of the json, unexpectly ending the line early.

I know this escaping was copied over from build_pipeline and TBH I'm not 100% sure why it isn't needed anymore, but thought I'd provide some background. I only found this bug because I was testing visualizations using every agg type to validate this PR, and ran into trouble with serial diff.

Also worth pointing out that the to_ast for pie charts still has the escaping in place, because i couldn't find a situation where it was broken. Might be worth testing that further to verify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested both and I think that they both work as they should. I tested all the aggs on all vislib visualizations. Can't find any regressions.

@@ -4,7 +4,6 @@
* you may not use this file except in compliance with the Elastic License.
*/

import { ExpressionFunctionAST } from '@kbn/interpreter/common';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside -- At some point y'all should consider removing all of the old @kbn/interpreter imports from Lens since you can access the same types (with slightly different names) from expressions.

I just did so in the files I touched here

field: column.sourceField,
missing: 0,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe missing is a valid param for any of these agg types so I removed it; please LMK if you know otherwise.

maxBars: params.maxBars === AUTO_BARS ? undefined : params.maxBars,
interval: 'auto',
has_extended_bounds: false,
min_doc_count: true,
Copy link
Member Author

@lukeelmers lukeelmers Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Please confirm that the update I made to min_doc_count doesn't have any unexpected effects on Lens, because I think it might actually be changing the behavior...

According to the histogram agg type, min_doc_count should be a boolean (not a number). It writes out the param as follows:

         if (aggConfig.params.min_doc_count) {
            output.params.min_doc_count = 0;
          } else {
            output.params.min_doc_count = 1;
          }

So as you can see, setting it to true will return a value of 0, which I why I changed it to true here as it was previously set to 0.

However, since the check is just confirming whether min_doc_count is truthy, 0 might instead cause it to actually be set to 1 (or rather, the expressions interpreter could be casting it to false). So it's possible that lens was inadvertently relying on this behavior, which is why I'd appreciate a second set of eyes on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, yes we are relying on this behavior, could you flip the boolean?

Otherwise histogram is hard to use on a table:

Screenshot 2020-12-08 at 14 03 28

Screenshot 2020-12-08 at 14 03 34

@lukeelmers lukeelmers added Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) Feature:Search Querying infrastructure in Kibana release_note:plugin_api_changes Contains a Plugin API changes section for the breaking plugin API changes section. Team:AppServices v7.11.0 v8.0.0 labels Dec 7, 2020
@lukeelmers lukeelmers self-assigned this Dec 7, 2020
@lukeelmers lukeelmers added review and removed WIP Work in progress labels Dec 7, 2020
@lukeelmers lukeelmers marked this pull request as ready for review December 7, 2020 04:59
@lukeelmers lukeelmers requested a review from a team December 7, 2020 04:59
@lukeelmers lukeelmers requested a review from a team as a code owner December 7, 2020 04:59
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServices)

@botelastic botelastic bot added the Feature:Embedding Embedding content via iFrame label Dec 8, 2020
@lukeelmers lukeelmers force-pushed the feat/esaggs-args branch 2 times, most recently from a668d50 to bf57656 Compare December 9, 2020 00:00
Comment on lines +50 to +57
const getNewAbortController = (): AbortController => {
try {
return new AbortController();
} catch (error) {
// eslint-disable-next-line @typescript-eslint/no-var-requires
const polyfill = require('abortcontroller-polyfill/dist/cjs-ponyfill');
return new polyfill.AbortController();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug uncovered by functional tests which we hadn't found yet: Since AbortController isn't supported till Node 15 (and we are currently on v14), we need to manually apply a polyfill on the server.

Since this code is in common, I'm hacking around the polyfill with a try/catch.

@@ -254,8 +254,7 @@ export class SearchService implements Plugin<ISearchSetup, ISearchStart> {
const searchSourceDependencies: SearchSourceDependencies = {
getConfig: <T = any>(key: string): T => uiSettingsCache[key],
search: asScoped(request).search,
// onResponse isn't used on the server, so we just return the original value
onResponse: (req, res) => res,
onResponse: (req, res) => shimHitsTotal(res),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered that shimHitsTotal is applied in the low-level search routes, but not in the actual service that's available on the plugin contract, so we need to apply it here for search source.

(This is only necessary on the server, as the client is going through the http routes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @lizozom @lukasolson Whenever shimHitsTotal is updated to be applied at the search strategy level instead of the route level, this change can be reverted.

let expectExpression: ExpectExpression;

const expectClientToMatchServer = async (title: string, expression: string) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a function to compare results from client & server. Longer term we might consider updating helpers.ts to have expectExpressionServer or similar, which would make it even easier to run expressions on the server.


router.post(
{
path: '/api/interpreter_functional/run_expression',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this route does is take an expression & optional input, call expressions.run from the server contract, and return the result.

This way we can gradually build out a suite that tests the same expressions on both client & server.

Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some small things, but overall it looks good

index: [indexPattern.id],
metricsAtAllLevels: [false],
partialRows: [false],
includeFormatHints: [true],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure - did the flag includeFormatHints become unnecessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, includeFormatHints isn't even returned in the esaggs response on master -- the argument itself just never got removed.

x-pack/plugins/lens/server/migrations.test.ts Outdated Show resolved Hide resolved
@lukeelmers lukeelmers requested a review from flash1293 December 9, 2020 16:40
@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
expressions 103 104 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
data 244.3KB 244.3KB -2.0B
lens 1011.8KB 1012.5KB +699.0B
total +697.0B

Distributable file count

id before after diff
default 46984 47744 +760

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
data 985.1KB 1002.4KB +17.3KB
expressions 182.6KB 189.9KB +7.3KB
lens 53.2KB 53.2KB +7.0B
visTypeMetric 25.9KB 26.0KB +163.0B
visTypeTable 18.9KB 19.1KB +163.0B
visTypeTagcloud 18.8KB 19.0KB +163.0B
visTypeVislib 66.0KB 66.1KB +91.0B
visualizations 169.0KB 169.2KB +162.0B
total +25.3KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@flash1293 flash1293 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for making this change and giving us some type safety around agg building :) There are some JSON-serialized parts in there I guess you will revisit them later? Let me know if the Lens team can help here.

I thought a bit more about the histogram min_doc_count thing and I think we have to do some fixes here, but for this PR it makes sense to keep the existing behavior even if it's not perfect.

@flash1293
Copy link
Contributor

I focused on Lens, would appreciate a look at the Visualize bits from @stratoula

Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legacy visualizations code review LGTM, I tested all of them with various aggs and I can't find any regression. Thanx @lukeelmers for fixing the serial Diff bug 🙂

@lukeelmers
Copy link
Member Author

There are some JSON-serialized parts in there I guess you will revisit them later? Let me know if the Lens team can help here.

@flash1293 Yep we have #67908 to track further work in this area. I'm sure the team would love any help you can provide 😄

Each agg param type now has its own toExpressionAst which is called by the top-level aggConfig.toExpressionAst, so the aim is to eventually add expression functions for each of these, and update their toExpressionAst methods accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:Embedding Embedding content via iFrame Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) Feature:Search Querying infrastructure in Kibana release_note:plugin_api_changes Contains a Plugin API changes section for the breaking plugin API changes section. review v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expressions: move esaggs to the server improved semantics for esaggs function
6 participants