-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SavedObjects] Advanced find
API
#84729
Comments
Pinging @elastic/kibana-core (Team:Core) |
I think the goal of providing a separate SO schema as an abstraction over the persistence schema was a good idea in principle. Saved objects are primarily used from the browser and queries are often embedded in UI components. This means the saved object schema abstraction is the only layer between the UI and the persistence mechanism. If we remove this layer it would become very hard to make any changes to how objects are persisted. However, it requires re-writing fields in every Elasticsearch API that we want to expose. Because of the complexity of doing this, the saved objects API has always been severely limited leading to awkward work arounds or plugins deciding to use their own indices directly #80912 Even with the abstraction layer provided by Saved Objects there are still several problems with this architecture as discussed in #72028. The better alternative would be to stop using Saved Objects from the browser and for plugins to introduce their own HTTP API's. This API layer will provide an abstraction over the persistence and also allow plugins to perform complex business logic on the server instead. Once there is a well tested HTTP API abstraction layer it would be less difficult and safer to make breaking changes to the persistence mechanism if this ever becomes a problem. So although it will take time for plugins to migrate over to such an architecture I feel like long term an HTTP API abstraction layer is a better solution than an SO abstraction layer that severely limits the functionality ES provides. We still have to decide how we will eventually fix the API consistency issues, to make it consistent we will have to remove the SO schema completely which would be a risky breaking change. The alternative is to introduce a new SO API which plugins can migrate over to, but will only be available server-side and this entire API will only expose the persistence schema. This will also provide some motivation for plugins to adopt this new architecture. To minimize the complexity we could build the SO Schema API on top of the persistence schema API. Perhaps SavedObjectsClient -> SO Schema, SavedObjectsRepository -> Persistence Schema. My only hesitation is introducing additional complexity to saved objects (and spaces/security). Since we probably won't be able to remove the SavedObjectsClient before 9 we should be sure that this complexity is maintainable. But if the SavedObjectsClient just does translation to the SO Schema we have the potential to simplify the repository. This would also solve the problem of the circular dependency between KQL and Core #55485 as well as the KQL performance issues #78348 |
The other alternative is to add an Elasticsearch alias like mappings: {
properties: {
description: { type: 'text' },
'attributes.description': { type: 'alias', path: 'dashboard.description' },
...
}
} Apart from the field count increase which will require increasing the default limit there aren't too many disadvantages of this approach. But I feel like it maybe prevents us from pursuing the longer term vision of a multi-layer architecture. Also any persistence vs so schema abstraction we introduce with aliases can just as easily be used to make a breaking change to the persistence schema at a later point in time. So I feel like working around the risk of a breaking persistence change is not worth it. |
/cc @legrego |
Not to cherry-pick a single sentence, but I think this summarizes my feelings pretty well:
Having the SavedObjectsClient perform the translation makes sense to me. This could potentially be a post-processing step that happens after spaces, security, and encrypted saved objects have ran through their logic. That way we could adapt our wrappers to work against the persistence schema directly, and we wouldn't have to maintain two implementations for each of these wrappers. |
I think that would make sense. However, I'm not sure to see how this would solve the issue encounter for aggs for example? We would still need to code / maintain the transformation and validation, it would just move it to the client instead of the repository. Or would the client only have a subset of the features / apis of the repository in that case (like, you can't perform aggs directly using the client?) |
Yes. I think we would deprecate the SavedObjectsClient and the auto-generated API's. We probably won't be able to remove it before 9.0 or maybe never, but the recommended way to consume saved objects would become to create a custom API and use the SavedObjectsRepository. |
This issue was created following #64002 and the resulting sync discussion that occurred afterwise, and its goal is to discuss the 'advanced' usages of the
find
API (such as aggregation support) and how we should provide them.For some context, see #64002, which introduces the bases of aggregation support for SO.
The root of the problem is that Saved Objects was designed to implement an abstraction layer over how saved objects are persisted in Elasticsearch. This means the same piece of data has a different shape / schema depending on where you look at it:
The PR's current implementation is performing the same SO fields rewrite and validation for aggregations that we are currently doing for KQL query that translates between the SO schema field names and the persistence schema field names. For example, to aggregate on the
version
attribute of thevisualization
type, we would be using something likefield: 'visualization.attributes.version'
in the aggregation. Also, when trying to aggregate against an invalid field (a field not present on a given object), the agg validation will fail.If at first glance, that can sound like the correct approach (and may be), the underlying implementation of this rewriting/validation is incredibly complex, as we need to know all the possible syntax of all the aggregations we need to support. This code is also very hard to read / maintain (that is, by nature, as pseudo-AST traversing will never be developer friendly)
With is why, after some (long) discussions with @rudolf and @XavierM, we started to wonder if such rewriting was really the best solution, and if just exposing the 'raw' aggs API (with eventually just basic validation such as
script
removal), wouldn't be a pragmatic option with only few downsides.The biggest pro being, as already mentioned, the significant simplification of the code and the time we'll avoid to loose when maintaining or enhancing the feature.
After the sync discussion, the main cons were:
KQL
field syntax, mismatch with SO Schema)find
API, with the associated risks (the Persistence schema could become an 'official' part of the SO API as directly exposed, and the impossibility to change the Persistence schema implementation without introducing breaking changes to the API)If the first point was considered acceptable, the second one is more bothersome.
The main questions at hand are:
find
(or other) APIs, or should we try to create some kind of 'advanced' find API, that would have a lower abstraction level (and would ideally not directly be exposed to the client via HTTP endpoints)cc @kobelb @XavierM
The text was updated successfully, but these errors were encountered: