Add support for value aliases/data deduplication #39758
I think you can accomplish this with the new(ish) `alias` field type. These are essentially field aliases that can be added to an existing mapping, pointing at an existing field. Is that what you're thinking? Support was added in 6.4 (#32172)
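For reference, a minimal sketch of such a mapping (field and index names are illustrative, following the shape of the `alias` type docs — a concrete field plus an alias pointing at it via `path`):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "distance": { "type": "long" },
      "route_length_miles": { "type": "alias", "path": "distance" }
    }
  }
}
```

Queries, aggregations, and sorts against `route_length_miles` then resolve to the data stored under `distance`; nothing is duplicated on disk.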
Pinging @elastic/es-search
The alias data type is template/index level, not document level.
@vbohata Lucene (the underlying storage engine) organizes data by fields. So there is one disk block corresponding to one field for all documents, and another disk block corresponding to another field for all documents. We can't tell Lucene, for the same field, to read data from one block or another depending on the document number.
OK. So for this feature to be implemented there would have to be either some support in Lucene or some kind of intermediate layer in Elasticsearch (somefield in ES -> somefield_1, somefield_2, somefield_dedup in Lucene; this way it would be globally configured per index but used independently for each document).
As @mayya-sharipova pointed out, with the way data is organized this feature would be difficult to fit into the current architecture. The recommended approach in your setup is either to duplicate the data (as you are doing with Logstash), or to perform a reindex and use field aliases for backwards compatibility. Thanks @vbohata for your suggestion -- if there is more interest in this request or something changes in our thinking, we can re-open the issue.
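The reindex-plus-alias route might look roughly like this (index and field names are illustrative; the rename script and the alias under the old name are assumptions about a typical setup, not a prescribed recipe):

```
PUT new-index
{
  "mappings": {
    "properties": {
      "host.name": { "type": "keyword" },
      "hostname":  { "type": "alias", "path": "host.name" }
    }
  }
}

POST _reindex
{
  "source": { "index": "old-index" },
  "dest":   { "index": "new-index" },
  "script": {
    "source": "ctx._source['host.name'] = ctx._source.remove('hostname')"
  }
}
```

The script moves each document's value from the old field name to the new one, and the `hostname` alias keeps existing queries against the old name working without storing the value twice.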
We often face the situation of renaming fields while keeping the old field names around for some time. This will get much worse as ECS becomes more widely used.
Sometimes it is possible to solve this with aliases, sometimes it is not - for example with "dynamic data", or when it has to be done in an already existing index without reindexing or creating a new one. In these situations we copy fields in Logstash, which results in bigger indexes with duplicated data.
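The Logstash field-copying workaround described here is typically a `mutate` filter with `copy`, roughly (field names are illustrative):

```
filter {
  mutate {
    # Copies the value into a second field, so it is indexed
    # and stored twice - this is the duplication being described.
    copy => { "old_field_name" => "new_field_name" }
  }
}
```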
Elasticsearch could provide a feature similar to hard links in a filesystem - allow specifying links to fields for each single document, so indexing would look like field1:"somevalue", field2:linkto:field1. Or it could be fully automatic - if there are 4 fields with the same data type and exactly the same value, store the value just once. This feature would help solve many of the problems we are facing.