Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add go-bexpr filters to evals and deployment list endpoints #12034

Merged
merged 5 commits into from
Feb 16, 2022
Merged

Conversation

lgfa29
Copy link
Contributor

@lgfa29 lgfa29 commented Feb 9, 2022

Add support for go-bexpr filters to /v1/evaluations and /v1/deployments.

Detailed documentation and changelog entry will come in follow-up PRs where this pattern will be applied to other endpoints.


Note to reviewers:

A previous version of this work had an attempt to use go-memdb indexes if possible when the filter expression was a simple equality check over namespace or prefix, but I found some problems with that implementation.

For prefix, there are multiple ways to express that with go-bexpr. The most strict way is using matches and regexp: ID matches "^123", and checking for a strict equal value (ID == "...") seems unusual for a list endpoint. Calling /v1/evaluation/:id directly would make more sense.

Extracting namespace out of filters like Namespace = "dev" worked better, but the problem is that when parseNamespace is called at the HTTP layer, the RPC handler is not able to tell if namespace=default means the user didn't set any namespace or if they explicitly set default. Overwriting the namespace could then lead to surprising results when using the default namespace.

Comment on lines +514 to +542
} else if strings.HasSuffix(errMsg, structs.ErrIncompatibleFiltering.Error()) {
errMsg = structs.ErrIncompatibleFiltering.Error()
code = 400
Copy link
Contributor Author

@lgfa29 lgfa29 Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is the right place to handle this, but it would be good to have it in a code path that is common to all endpoints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is absolutely a weird place to jam it and not the API (HasSuffix?!) any of us would prefer but... it's idiomatic for now. 👍

nomad/deployment_endpoint.go Outdated Show resolved Hide resolved
Comment on lines +8 to +16
// Iterator is the interface that must be implemented to use the Paginator.
type Iterator interface {
// Next returns the next element to be considered for pagination.
// The page will end if nil is returned.
Next() interface{}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this while working on a different approach, where the filter would create another iterator, but I kind of like this change regardless?

But I can revert it if we want to keep it like it was.

"github.com/hashicorp/nomad/nomad/structs"
)

func BenchmarkEvalListFilter(b *testing.B) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark results from my machine:

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkEvalListFilter/filter_with_index-16                3700           6832056 ns/op             458 B/op         13 allocs/op
BenchmarkEvalListFilter/filter_with_go-bexpr-16              100         234049608 ns/op        47202395 B/op    4000009 allocs/op
BenchmarkEvalListFilter/paginated_filter_with_index-16           4819960              4628 ns/op            2616 B/op         24 allocs/op
BenchmarkEvalListFilter/paginated_filter_with_go-bexpr-16          59338            409490 ns/op          122438 B/op       8325 allocs/op

Filtered results are an order or two of magnitude slower than using an index so we should document query params that are index based.

I also had a version that used FilterIterator, but the result was the same as filter_with_go-bexpr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 100 eval page in the paginated results are definitely more "realistic" than us trying to iterate over 100k evals at once.

Something else to keep in mind looking at this is that in paginated_filter_with_go-bexpr we're creating the bexpr evaluator every iteration whereas in filter_with_go-bexpr we're creating the bexpr evaluator once outside the b.N loop. Unfortunately a given API call doesn't get to amortize creating the evaluator across multiple pages, so paginated_filter_with_go-bexpr is going to be even worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point, should I move the call to CreateEvaluator so it's within the bechmark for loop?

Another thing that I thought is that there's a diminishing advantage of using pagination when querying later pages since we need to iterate over everything that comes before anyway, even if we don't apply the filter:

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkEvalListFilter/filter_with_index-16                3673           6704301 ns/op
BenchmarkEvalListFilter/filter_with_go-bexpr-16              100         220508127 ns/op
BenchmarkEvalListFilter/paginated_filter_with_index-16           5146934              4724 ns/op
BenchmarkEvalListFilter/paginated_filter_with_go-bexpr-16          55220            480332 ns/op
BenchmarkEvalListFilter/paginated_filter_with_index_last_page-16                    2017          11966223 ns/op
BenchmarkEvalListFilter/paginated_filter_with_go-bexpr_last_page-16                  872          27409941 ns/op

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we combine this with #12054 it seems like we "just" need to take into account the Filter values when picking the index to use, which we basically already do when the requested Namespace is set.

i.e. in this block

Comment on lines 32 to 38
string parameter.
string parameter and is backed by a data store index, so it can be used to
reduce the number of entries processed by `filter`.

- `namespace` `(string: "default")` - Specifies the target
namespace. Specifying `*` will return all evaluations across all
authorized namespaces.
authorized namespaces. This parameter is backed by a data store index, so it
can be used to reduce the number of entries processed by `filter`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these additions are good. Feel free to suggest alternatives 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if here we focus on the gotcha and link to general filter documentation for more information?

This parameter is used before any filter is applied. See the the filter docs for more information.

Then we could add -filter related documentation here: https://www.nomadproject.io/docs/commands#filter This would be the ideal place to call out how the NOMAD_NAMESPACE env var will take precedence over any namespace filtering. Although I suppose api package users will miss that... 🤔

And ?filter=... related documentation here: https://www.nomadproject.io/api-docs#filter

With perhaps more expansive docs on the query language here? https://www.nomadproject.io/docs/internals/filtering

Or perhaps skip the command and api docs and link directly to an internals doc?

Point being while I think noting these performance impacts is important, I think we also need to guide users toward proper usage of env vars vs query params vs filters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm planning on having more comprehensive docs about filtering and pagination in a follow-up PR. Basically copy most of Consul's docs 😅

Having a CLI section also sounds like a good idea.

In this PR I will add just the first sentence, and then add a link to the new sections in the follow-up PR.

Comment on lines 90 to 91
match, err := p.filterEvaluator.Evaluate(raw)
if !match || err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What errors can occur here and should we return them and cease iteration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. An error would happen here if the user filters by an invalid field (second example below). I pushed a commit to improve error handling and also make their location more descriptive.

Some failed request samples:

$ curl --get 'http://localhost:4646/v1/evaluations?per_page=1' --data-urlencode 'filter=InvalidExpression'
failed to create result paginator: failed to read filter expression: 1:18 (17): no match found, expected: "!=", ".", "==", "[", [ \t\r\n] or [a-zA-Z0-9_/]
$ curl --get 'http://localhost:4646/v1/evaluations?per_page=1' --data-urlencode 'filter=InvalidField == ""'
failed to read result page: error finding value in datum: /InvalidField at part 0: couldn't find key: struct field with name "InvalidField"

Comment on lines 32 to 38
string parameter.
string parameter and is backed by a data store index, so it can be used to
reduce the number of entries processed by `filter`.

- `namespace` `(string: "default")` - Specifies the target
namespace. Specifying `*` will return all evaluations across all
authorized namespaces.
authorized namespaces. This parameter is backed by a data store index, so it
can be used to reduce the number of entries processed by `filter`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if here we focus on the gotcha and link to general filter documentation for more information?

This parameter is used before any filter is applied. See the the filter docs for more information.

Then we could add -filter related documentation here: https://www.nomadproject.io/docs/commands#filter This would be the ideal place to call out how the NOMAD_NAMESPACE env var will take precedence over any namespace filtering. Although I suppose api package users will miss that... 🤔

And ?filter=... related documentation here: https://www.nomadproject.io/api-docs#filter

With perhaps more expansive docs on the query language here? https://www.nomadproject.io/docs/internals/filtering

Or perhaps skip the command and api docs and link directly to an internals doc?

Point being while I think noting these performance impacts is important, I think we also need to guide users toward proper usage of env vars vs query params vs filters.

Comment on lines +94 to +97
if err != nil {
p.pageErr = err
return nil, paginatorComplete
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgross any thoughts on this approach for handling errors during pagination?

I thought of creating a new paginatorState called paginatorError, but having a non-nil pageErr would already indicate that so I just used the paginatorComplete state to break the loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as-is, but if you want to go the extra mile and make this really conventional Go, take a look at the Err() pattern used for bufio.Scanner

https://pkg.go.dev/bufio#example-Scanner-Lines

The caller then only deals with 1 error after iteration is "complete", rather than on each iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, good to know about this pattern. But I think that, in this case, callers would deal with one page at a time, so having to remember to call Err() may be a little more error-prone?

Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just some wording suggestions and the idea for cleaning up pagination errors, feel free to ignore

"github.com/hashicorp/nomad/nomad/structs"
)

func BenchmarkEvalListFilter(b *testing.B) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we combine this with #12054 it seems like we "just" need to take into account the Filter values when picking the index to use, which we basically already do when the requested Namespace is set.

i.e. in this block

nomad/structs/errors.go Outdated Show resolved Hide resolved
website/content/api-docs/deployments.mdx Outdated Show resolved Hide resolved
website/content/api-docs/deployments.mdx Outdated Show resolved Hide resolved
website/content/api-docs/evaluations.mdx Outdated Show resolved Hide resolved
website/content/api-docs/evaluations.mdx Outdated Show resolved Hide resolved
website/content/docs/commands/deployment/list.mdx Outdated Show resolved Hide resolved
website/content/docs/commands/eval/list.mdx Outdated Show resolved Hide resolved
Comment on lines +94 to +97
if err != nil {
p.pageErr = err
return nil, paginatorComplete
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as-is, but if you want to go the extra mile and make this really conventional Go, take a look at the Err() pattern used for bufio.Scanner

https://pkg.go.dev/bufio#example-Scanner-Lines

The caller then only deals with 1 error after iteration is "complete", rather than on each iteration.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
theme/api HTTP API and SDK issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants