-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core.savedObjects] Add helper for using find
with pit
and search_after
.
#92981
Conversation
bf5fbcb
to
51a1b27
Compare
@elasticmachine merge upstream |
*/ | ||
export function createPointInTimeFinder({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested in the last PR, I got rid of this as it wasn't really doing anything. Now the repository just uses the PointInTimeFinder
class directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we added the SavedObjectsPointInTimeFinderDependencies
, I would actually have kept the factory method 😅
type PITfactory = (deps: SavedObjectsPointInTimeFinderDependencies) =>
(options: SavedObjectsPointInTimeFinderOptions) => PointInTimeFinder
but I don't mind the current implementation.
src/core/server/saved_objects/service/saved_objects_client.test.js
Outdated
Show resolved
Hide resolved
...ugins/encrypted_saved_objects/server/saved_objects/encrypted_saved_objects_client_wrapper.ts
Show resolved
Hide resolved
Pinging @elastic/kibana-core (Team:Core) |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
app arch changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't any API exposed by a plugin using createPointInTimeFinder method end up being stateful?
@thomheymann That's a good point -- and yes, this helper would not be well-suited to a case where you want your plugin to expose an API that has a traditional paging mechanism where you make a single API call from the client for each new page of results. It was originally designed for use with saved object exports and exposed to the server only... so it really just works in scenarios where you are either collecting items on the server in memory, or streaming responses back to the client.
I think if we wanted to support page-by-page requests, we'd need to build a dedicated REST API for handling this, which bypasses the generator and more or less proxies queries directly to ES, requiring the API consumer to pass back the _pit
and search_after
params and implement the generator's logic on their own.
I think that could be a logical next step if folks are asking for it, but I'm not sure it's strictly necessary now. WDYT? (cc @pgayvallet @rudolf)
we could add a unit test to x-pack/plugins/security/server/saved_objects/secure_saved_objects_client_wrapper.test.ts that exercises the PIT Finder and verifies that audit logs are created. I'm a bit ambivalent about this though as it's sort of outside the scope of unit tests for this wrapper, that would be exercising functionality of the PIT Finder itself. WDYT?
@jportner IMHO such a test wouldn't really be providing additional value. We are already testing that the wrapper's find
method is being called by the PIT finder, so as long as we already have tests confirming that find
itself will produce an audit log, then I think we have coverage for this scenario.
src/core/server/saved_objects/service/lib/point_in_time_finder.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/spaces/server/saved_objects/spaces_saved_objects_client.ts
Show resolved
Hide resolved
@elasticmachine merge upstream |
Makes sense, agreed, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly small nits and documentation suggestions
yeah I think the intention is for this pointInTimeFinder helper to be convenient but not solve every use case. In this case it explicitly doesn't solve the paging from a browser use case. There's some disparate conversations happening about paging through saved objects. For performance reasons we really shouldn't build UI's that try to allow users to page through e.g. more than 1000 results. Performance impact #93770 It might be necessary to build an HTTP API that allows external other developers to iterate through many pages of saved objects. We could either solve that like import/export by streaming all results in a single request, or like luke's proxying the PIT and search_after params idea. |
In addition to #92981 (comment) and #92981 (review): Yea, this API was meant to be used by server-side code processing the scroll in a single 'batch', not to keep the PIT open to have the client-side ask for the next batch. This use case will effectively not work in multi-instance (or in clustering mode FWIW). Also, even in single instance, that kind of scenarios would risk to reach the PIT timeout if the request for the next page is based on user interaction. Should we document the limitations and 'forbidden' use cases of this new API/helper? |
Yeah I will document this before merging. I think it would be easy to look at the code examples and think it would be perfect for paging, when really it can only be used if you are handling paging entirely on the client and shipping all the data up front. |
@elasticmachine merge upstream |
@elasticmachine merge upstream |
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: cc @lukeelmers |
Closes #91175
Unblocks #93770
Summary
In #89915 we introduced a
PointInTimeFinder
to make it easy to iterate over large numbers of saved objects usingpit
andsearch_after
. This helper was used internally by theSavedObjectsExporter
.This PR exposes the PIT finder as a helper directly on the saved objects client[0] so that it can be used by other plugins which are working with large numbers of saved objects.
Usage
The helper takes in the options you would normally pass to
savedObjectsClient.find
[1], and returns an async generator which you can iterate over to retrieve subsequent pages of results.The generator handles opening a Point-In-Time (PIT) in Elasticsearch, and uses the PIT with the
search_after
param to handle "paging" through results at the specifiedperPage
interval. It will auto-close the PIT on any errors, or whenfinder.close
is called[2].Internally, the SO repository injects a logger which is used inside the finder, and the APIs which the finder uses under the hood (
find
,openPointInTimeForType
, andclosePointInTime
) are passed in from the clients as dependencies, to ensure SO client wrappers still work.Notes
[0] Note that
createPointInTimeFinder
is only exposed on the server SO client -- (not on the public SO client or via any REST APIs).[1] Technically it accepts a subset of the
SavedObjectsFindOptions
, as thepage
,pit
andsearchAfter
params are managed by the finder. There's aSavedObjectsCreatePointInTimeFinderOptions
interface which reflects this difference.[2] Do not neglect to close the finder when you are done paging (which closes the PIT), otherwise you will cause your cluster to consume resources unnecessarily.
Testing
The main thing to verify here is that there have been no regressions in the behavior of
SavedObjectsExporter
... otherwise this is a new API which is currently not used elsewhere. I added some notes to the original PR with a description of how to manually test the exporter.