Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bounds on number of elements in api/v1/log/entries/retrieve. #901

Closed
Tracked by #1005
var-sdk opened this issue Jul 1, 2022 · 7 comments · Fixed by #1011
Closed
Tracked by #1005

Add bounds on number of elements in api/v1/log/entries/retrieve. #901

var-sdk opened this issue Jul 1, 2022 · 7 comments · Fixed by #1011
Labels
enhancement New feature or request ga_candidate Proposed blocking issue for GA release

Comments

@var-sdk
Copy link

var-sdk commented Jul 1, 2022

The API definition for retrieve does not specify an upper bound for number of entries in the request. It is trivial to construct a request that takes O(many seconds) to return with a response size O(MB). Adding upper bounds on number of entries per request will let us offer an achievable SLO for this endpoint.

@var-sdk var-sdk added the enhancement New feature or request label Jul 1, 2022
@haydentherapper haydentherapper added the ga_candidate Proposed blocking issue for GA release label Aug 22, 2022
@dlorenc
Copy link
Member

dlorenc commented Aug 23, 2022

This is the search index. I think we'll also need to add pagination if we do this, we'll need a way to handle retrieving everything.

@var-sdk
Copy link
Author

var-sdk commented Aug 23, 2022

I think there are different concerns for api/v1/index/retrieve vs api/v1/log/entries/retrieve:

For api/v1/log/entries/retrieve: the number of input elements is unbounded, but the number of responses appears to be <= the number of elements requested. Each successful lookup requires a one more more fan out calls to Trillian so clients can currently generate significant load with a small number of requests. If this is all accurate, for GA would could just put a limit on the number of entries requested and let clients handle pagination.

For /api/v1/index/retrieve: the concern here is that a single search could potentially return a large number of requests. This API doesn't seem to involve a back end call to Trillian for each result so there aren't the same concerns on fanout on backends. The main concern here would be memory usage while the request is being handled. I suspect this is largely theoretical (?) given the number of entries in the log at the moment but a misbehaving user could eventually trigger this as well. Pagination sounds like the right approach but I guess we could spin up a read replica for search if we just want to isolate it from the core read/write path. Not sure either are GA worthy.

CC @bobcallaway and @priyawadhwa to keep me honest here.

@priyawadhwa
Copy link
Contributor

Yah, that sounds correct to me.

Gathered some data for the api/v1/log/entries/retrieve endpoint (averaged across 3 runs):

10 entries: 331 ms
20 entries: 231 ms
50 entries: 480 ms
100 entries: 884 ms
200 entries: 1947ms (averaged across two runs, errored out on third run with a `504 Gateway Time-out` from nginx

I think capping api/v1/log/entries/retrieve at 50 entries would be good to get in before GA.

Pagination could be post-GA work. I checked to see if Trillian has any kind of support for it, but all I could find was this function which could be useful when we get to it.

https://github.com/google/trillian/blob/0a389c4bb8d97fb3be8f55d7e5b428cf4304986f/trillian_log_api_grpc.pb.go#L66-L68

@var-sdk
Copy link
Author

var-sdk commented Aug 23, 2022

Behavior under load is also important. I was able to observe 100% error rate (and cause other problems) sending sustained traffic with 50 entries per request for a short amount of time. What are the primary use cases for this API? What would be the consequences of reducing this limit to 5 or 10? I may be worth adding a metric to measure distribution of request size.

@priyawadhwa
Copy link
Contributor

I think lowering the limit should be fine. Not sure if anyone is using it tho, I think you mentioned parsing logs in Slack to find out and that might be a good place to start.

@dlorenc
Copy link
Member

dlorenc commented Aug 24, 2022

Thanks for clarifying, I had the wrong endpoint. I didn't actually even know this API supported multiple. Dropping sounds fine to me.

@bobcallaway
Copy link
Member

bobcallaway commented Aug 24, 2022

+1 to limiting entry length to 50 or lower

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ga_candidate Proposed blocking issue for GA release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants