Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate "value scripts" on aggregations? #31532

Closed
polyfractal opened this issue Jun 22, 2018 · 8 comments
Closed

Deprecate "value scripts" on aggregations? #31532

polyfractal opened this issue Jun 22, 2018 · 8 comments

Comments

@polyfractal
Copy link
Contributor

Today, there is a (little-documented) feature where specifying a field + script on an aggregation will run the script in "value script" mode. Each value in the field will be passed to the script under the _value variable so that the script can modify it, before letting the aggregation collect it.

This is a very trappy API, because there are no safeguards from the user specifying a "normal" script that tries to access regular fields. It can lead to confusion like #20773 and #11728

In #20733, we decided to add a value_script and deprecate field + script combo. Which is certainly doable, but I'm wondering if we should just deprecate the concept of value scripts entirely? I'm not sure how widespread usage actually is?

Besides being confusing, value scripts adds a fair amount of complexity to the code when trying to resolve the ValuesSource, adds another set of Scorer wrappers (ValuesSource.Numeric.WithScript, ValuesSource.WithScript) and adds a bespoke setNextAggregationValue() in the script implementation solely to satisfy this feature.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@jpountz
Copy link
Contributor

jpountz commented Jun 22, 2018

I'm usually on the side of reducing surface area but I tend to like value scripts due to the fact that they are much simpler and less error-prone if not all documents have a value for the field, or if some have multiple values. However I agree that they lack validation. We should at least make sure that the _value variable is used within a value script, I think this would have caught the two issues that you linked?

@polyfractal
Copy link
Contributor Author

For docs missing a value, I think the missing option is available on all aggs? That should cover the functionality right?

Makes sense for multi-values though, no other way for that functionality I guess. We could extend regular scripts to allow returning arrays of values, which would allow the user to deal with multiple values as desired. Then make field and script mutually exclusive. But that does complicate the script itself a bit, as the user would need to iterate on values inside the script, build an output array, etc.

@rjernst do you know if it's possible to check if a Painless script compiled/used a particular variable? I mean, we could grep for the string but that seems error prone :)

If it's possible, I agree that adding validation for _value and doc usage would make it more user friendly.

@jpountz
Copy link
Contributor

jpountz commented Jun 22, 2018

do you know if it's possible to check if a Painless script compiled/used a particular variable? I mean, we could grep for the string but that seems error prone :)

It is, we do this in order to know whether scripts are using the _score or not (we can run things more efficiently when scoring is not needed, like not loading norms and term frequencies).

@polyfractal
Copy link
Contributor Author

Ah good to know. That seems a viable alternative then.

@rjernst
Copy link
Member

rjernst commented Jun 26, 2018

IMO having 2 ways of doing the same thing just adds confusion. I am in favor of removing value scripts.

I tend to like value scripts due to the fact that they are much simpler and less error-prone if not all documents have a value for the field, or if some have multiple values

I think this has the same problem that the existing doc['fieldname'].value has within scripts, which is that we arbitrary choose a "default" when the field is missing for a document. The user should have to make a conscious choice of what to do when no values exist, which #30975 should help address.

Given that there is no performance difference between the two types of scripts, I favor a single way to do things.

@nik9000
Copy link
Member

nik9000 commented Jun 26, 2018

I believe SQL uses values scripts but I'm not 100% clear on how without looking into it.

@polyfractal
Copy link
Contributor Author

We talked about this in the Search/Aggs meeting and decided that while it would be nice to simplify, it's probably not worth the BWC break and associated hassles.

It's a low maintenance item, we can add validation to make sure value scripts are used correctly, and we can avoid a tricky BWC break (moving to a single script would require the user to change the agg syntax and also rewrite the scripts). Lastly, the code needed to deprecate this feature would itself be tricky to maintain since this touches a not-so-great builder.

I'm going to open an issue to add validation for _value and doc so they aren't misused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants