[Fleet] [Meta] Support for time series indexing, doc-value-only fields, and synthetic source #132818
Pinging @elastic/fleet (Team:Fleet) |
Should we be able to enable this on a package level or data stream level? |
I'd prefer to keep this as simple as possible, and only do it on the package level if we don't need to be able to do it on a data stream level. |
@kpollich @jen-huang We could enable the ecosystem team to add each of these toggles if we first provided a basic framework for adding a package-level setting that is used at install time. I think that framework needs to include:
This would probably be quite low effort to provide the basic plumbing and I think the Ecosystem folks would be able to use that to make the specific changes required for each feature. |
@joshdover for the specific case of synthetic source, given that the change does not depend on specific fields, do you think it would be feasible, as a first step, to introduce this toggle (package or data-stream level, whichever we think is better) just on the Fleet side without requiring any new setting in the packages? This would remove the need to do an additional release of every existing package. Once we add the setting, it can be used as the default value of the toggle for that package / data stream, and we should make sure that default is "use synthetic source" for every new package that we create. |
@andresrc I think starting with synthetic source would make the most sense. We could use that to build out the plumbing I described in #132818 (comment) and then leverage it for other opt-in features in the future. One tricky thing about synthetic source is the limitation on keyword fields that have |
(edited after #132818 (comment)) @joshdover @jsoriano As we move forward with the testing we keep finding corner cases: the
Given this, I would like to propose a gradual approach.
Phase I
Add a toggle at the data stream level with a default value of not using synthetic source. This would allow greater granularity in enabling the feature, to test for potential breakages and leverage the benefits when possible. We have very big packages which mix logs and metrics data streams, so package-level granularity would not be very practical. When the toggle is enabled:
The toggle (or the screen containing the toggle) would also show a badge or similar warning that this is a technical preview / beta feature. With this, we could start gradually recommending the use of synthetic source in specific data streams, without changes, as we feel confident about them and without the risk of breaking changes.
Phase II
Depending on the results of the previous phase, we can consider different options:
We will also need to consider the decisions made around the future of the
Phase III (probably in a major)
Always use synthetic source by default. |
@andresrc thanks, the plan sounds good to me. The only change I would do is to make the phase II conditional to the results of phase I. If things go well I think that we could go directly to the first point of phase III, enabling synthetic source by default in all new policies, and avoid adding a setting that would have expiration date. Such a setting may be confusing for package developers. |
Thanks @jsoriano , edited the comment |
Is this a UI toggle or an API toggle? If it's an API, I don't think we really need to create anything new, as it seems the same as asking users to set this on the custom component template using its existing Elasticsearch API. If we want UI support, we could add the toggle to the component template editor UI to allow setting synthetic source there (right now you have to do it in Dev Tools). This would help us avoid needing to determine where to list these data streams and settings for them (we don't have any such UI today).
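For reference, a minimal sketch of what setting this on the custom component template could look like. It only builds and prints the request body, nothing is sent to a cluster. The helper name is hypothetical, `metrics-system.cpu` is just an illustrative data stream, and the `_source.mode` mapping shown is the form used by the 8.x technical preview, so it should be checked against the Elasticsearch version in use.

```python
import json


def synthetic_source_template(extra_mappings=None):
    """Build a component template body that turns on synthetic _source.

    extra_mappings: optional dict of other mapping entries to keep in the
    same template (hypothetical convenience, not a Fleet API).
    """
    mappings = {"_source": {"mode": "synthetic"}}
    if extra_mappings:
        mappings.update(extra_mappings)
    return {"template": {"mappings": mappings}}


# Illustrative data stream; Fleet names the custom template "<type>-<dataset>@custom".
template_name = "metrics-system.cpu@custom"
body = synthetic_source_template()

# Printed in Dev Tools console form for copy/paste.
print(f"PUT _component_template/{template_name}")
print(json.dumps(body, indent=2))
```

Since a PUT replaces the whole component template, any mappings already present in the custom template would have to be passed in through `extra_mappings` (or merged some other way) to avoid dropping them.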
One tricky thing is that we may need to specify an ES version number requirement for the data stream since the restrictions around synthetic source are changing between releases (eg. support for |
@joshdover I think these flags should be managed by Fleet, in case we later decide to enable these features everywhere automatically, or depending on certain conditions. While Fleet is aware of the setting, I don't have a strong preference for API or UI. UI would be better to make it easier to recommend its use to users or not so experienced package developers, but it'd be nice if the flags are also exposed through the Fleet API in any case. |
@joshdover I have updated the comment with some additional considerations, including what to do with
I would prefer to start with a UI toggle as it would be easier for final users to try it for certain data streams where they might get the benefit.
If we are doing more "magic" here (i.e. removing the |
If we need to modify existing mappings for this "magic" then this probably isn't the right place for it. Let's figure out a good place within the Integrations UI. I think having something on the "settings" tab for an installed integration could make sense. Marking this issue as needing design, since that is the next step before we can do implementation work. |
We have discovered an issue whilst testing this on integration data streams. Fleet's 'final pipeline' component template contains the after a brief discussion it was suggested that simply removing |
FYI - I have found another barrier to enabling this in certain integrations which declare fields with dynamic mappings. the I'm looking into ways around this now, but wanted to give you a heads up that currently this fails at index time. edit: |
I spoke with @mukeshelastic a bit offline about potential avenues for implementation here. Generally, I am in favor of an implementation that consists of the following
There is one major caveat that @tommyers-elastic has begun broaching above: the incompatibility between the We have a few options to work around this limitation, mentioned in various comments above:
I'll attempt to detail the tradeoffs for each approach below.
1. Fleet intelligently adds/removes
|
Thanks for the detailed write-up @kpollich and the trade-offs mentioned. Thinking long term, ideally synthetic source just becomes a setting on the data stream and all the magic around it is handled by Elasticsearch (Option 2). But this might take a bit longer. My suggestion would be to start talking about Option 2 with the Elasticsearch team but do a basic implementation of Option 1 in Fleet at the data stream level. This would be an experimental feature or similar with limited support. For example |
I would also say to start with Option 1 at least for the initial implementation of the opt-in feature, this can help to validate the feature and the specifics can evolve over time with option 2 or other alternatives that may appear. @nik9000 wdyt about the option 2 described in #132818 (comment) ? |
I'm +1 on this implementation strategy. Flagging the settings UI for these index settings as |
Petition accepted. Sort of. I think we'll actually be able to support But, like, I think it wouldn't be super bad to start by removing |
I feel bad for the owner of this github username. Same for whoever owns |
Since we shipped the experimental support for synthetic source in 8.5, I think we should consider getting doc-value-only fields support in next, with a similar experimental UX. I believe the amount of effort involved in adding support this way should be quite small and may have a very large impact (20% storage savings, 20% improved indexing perf), so I'd prefer we enable this to start being tested sooner rather than later. @kpollich do you have a rough estimate of effort required to add support for this as an experimental toggle? My understanding is that when enabled, we'd need to modify the component templates to set |
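To make the doc-value-only idea concrete, here is a rough sketch of rewriting a set of field mappings so that eligible fields get `index: false` while keeping doc values. This is an illustration only and not what Fleet does: the function name, the `keep_indexed` parameter, and the sample fields are all hypothetical. The type list follows the Elasticsearch docs, which say numeric, date, boolean, ip, geo_point, and keyword fields can be queried from doc values alone.

```python
# Field types that Elasticsearch can still query when they are not indexed
# but have doc values enabled.
DOC_VALUE_ONLY_TYPES = {
    "long", "integer", "short", "byte", "double", "float", "half_float",
    "scaled_float", "unsigned_long", "date", "date_nanos", "boolean", "ip",
    "geo_point", "keyword",
}


def make_doc_value_only(properties, keep_indexed=frozenset(), prefix=""):
    """Return a copy of `properties` with `index: false` on eligible fields.

    keep_indexed: full dotted paths of fields that are often used for
    filtering and should stay indexed (hypothetical parameter).
    """
    result = {}
    for name, field in properties.items():
        path = prefix + name
        field = dict(field)  # shallow copy so the input is left untouched
        if "properties" in field:
            # Object field: recurse into its sub-fields.
            field["properties"] = make_doc_value_only(
                field["properties"], keep_indexed, path + "."
            )
        elif (
            field.get("type") in DOC_VALUE_ONLY_TYPES
            and path not in keep_indexed
            and field.get("doc_values", True)  # never disable both
        ):
            field["index"] = False
        result[name] = field
    return result


# Hypothetical sample mappings.
sample = {
    "host": {"properties": {"name": {"type": "keyword"}}},
    "cpu_pct": {"type": "double"},
    "message": {"type": "text"},
}
converted = make_doc_value_only(sample, keep_indexed={"host.name"})
print(converted)
```

Queries on fields converted this way still work but are slower than on indexed fields, which is why the sketch leaves an escape hatch for fields that are filtered on frequently.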
A separate issue would be great. The way we've added synthetic source as a toggle is fairly extensible, so I think a lot of the groundwork is already laid here. This is probably a one week lift to implement a toggle. |
On the priority side, could we get the enabling of time series in first instead of doc values only part? This would allow us to better test TSDB indices. |
The feature for TSDB might even be split up into 2 parts: Support in Fleet if it is set in the package (elastic/package-spec#357). @kpollich Is this supported today? And second to enable it on demand by switching over. Note: Switching back for TSDB is not possible as far as I know. |
@kpollich @jen-huang I have updated this issue to track both the experimental toggles for testing as well as the long-term support for the real GA feature support. All tasks and bugs related to these features should be added here. |
This meta issue also needs an owner on the Fleet team. I want to make sure someone has the time to fully understand the goals of these new indexing features and how integrations should leverage them. I think some of the discrepancies in behavior that have gotten implemented (see #147684 for examples) may have been avoided with fewer people working on these items. |
Created a few issues based on offline discussions over on Google docs with @lucabelluccini
Something else that hasn't come up yet - are there licensing restrictions around these indexing features at all?
https://www.elastic.co/guide/en/elasticsearch//reference/master/tsds.html doesn't mention any licensing restrictions for TSDS.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source mentions synthetic
https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html#doc-value-only-fields seems to imply doc value only fields are GA, but no licensing restrictions are specified.
As far as I can tell, these features are all available with a basic license, so I don't see any issues here. I could be wrong, though, and I'm not sure how to confirm. |
@giladgal where could we find this information regarding licensing for TSDS related features? |
Licensing is not described in the documentation. The information about licensing is in the file headers and in the subscriptions web page. |
Hi all. I'm closing this in favor of a new meta issue tracking the few outstanding long-term support/stability tasks around TSDS. See https://github.com/elastic/ingest-dev/issues/1773. Thanks for all your help here! |
We have three new indexing features in Elasticsearch that can reduce the overall storage size of data significantly:
- _source to data streams #140095
- source_mode for data streams #141211
- index_mode: time_series setting during package installation #146804
- metric_type does not work for all the expected fields #148057
- index: false on fields that are rarely used for filtering integrations#3419
We'd like to enable integration developers to start testing the ingest and query performance of enabling these features before we start making any changes in the integrations themselves or allowing end users to enable these from the Fleet UI.
Today, each of these can already be enabled by leveraging the *@custom component templates that Fleet installs for each integration data stream, to varying degrees of ease of use (details below). We could improve the UX around this for integration developers by adding an explicit API in Fleet to enable this; however, it may not be necessary.
How to do this today
See https://github.com/elastic/integrations/blob/main/docs/how_to_test_new_indexing_features.md
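As a rough illustration of the component template approach for the time series case, the sketch below builds a template body with `index.mode: time_series`, a `routing_path`, and fields marked as dimensions or metrics. It is an assumption-laden example rather than the procedure from the linked doc: the helper name and the field names `host` and `cpu_pct` are hypothetical, and whether a `routing_path` is required depends on the Elasticsearch version.

```python
import json


def time_series_template(dimensions, metrics):
    """Build a component template body for a time series data stream.

    dimensions: {field_name: field_type}, e.g. {"host": "keyword"}
    metrics:    {field_name: (field_type, metric_type)}, where metric_type
                is a time_series_metric value such as "gauge" or "counter"
    """
    properties = {}
    for name, field_type in dimensions.items():
        properties[name] = {"type": field_type, "time_series_dimension": True}
    for name, (field_type, metric_type) in metrics.items():
        properties[name] = {"type": field_type, "time_series_metric": metric_type}
    return {
        "template": {
            "settings": {
                "index": {
                    "mode": "time_series",
                    # Route documents by their dimension fields.
                    "routing_path": sorted(dimensions),
                }
            },
            "mappings": {"properties": properties},
        }
    }


# Hypothetical fields, for illustration only.
tsds_body = time_series_template(
    dimensions={"host": "keyword"},
    metrics={"cpu_pct": ("double", "gauge")},
)
print(json.dumps(tsds_body, indent=2))
```

As noted in the thread, switching a data stream back from time series mode is not expected to be possible, so this is best tried on a test data stream first.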