Add support for value aliases/data deduplication #39758
I think you can accomplish this with the new(ish) `alias` field type. These are essentially field aliases that can be added to an existing mapping, pointing at an existing field. Is that what you're thinking? Support was added in 6.4 (#32172)
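For reference, a minimal sketch of such a mapping (field and index names are illustrative, following the shape of the `alias` type docs — a concrete field plus an alias pointing at it via `path`):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "distance": { "type": "long" },
      "route_length_miles": { "type": "alias", "path": "distance" }
    }
  }
}
```

Queries, aggregations, and sorts against `route_length_miles` then resolve to the data stored under `distance`; nothing is duplicated on disk.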
Pinging @elastic/es-search
The alias data type is template/index level, not document level.
@vbohata Lucene (the underlying storage engine) organizes data by fields. So there is one disk block corresponding to one field for all documents, and another disk block corresponding to another field for all documents. We can't tell Lucene, for the same field, to read data from one block or another depending on the document number.
OK. So for this feature to be implemented there would have to be either some support in Lucene or some kind of intermediate layer in Elasticsearch (somefield in ES -> somefield_1, somefield_2, somefield_dedup in Lucene; this way it would be globally configured per index but used independently for each document).
As @mayya-sharipova pointed out, with the way data is organized this feature would be difficult to fit into the current architecture. The recommended approach in your setup is either to duplicate the data (as you are doing with Logstash), or to perform a reindex and use field aliases for backwards compatibility. Thanks @vbohata for your suggestion -- if there is more interest in this request or something changes in our thinking, we can re-open the issue.
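The reindex-plus-alias route might look roughly like this (index and field names are illustrative; the rename script and the alias under the old name are assumptions about a typical setup, not a prescribed recipe):

```
PUT new-index
{
  "mappings": {
    "properties": {
      "host.name": { "type": "keyword" },
      "hostname":  { "type": "alias", "path": "host.name" }
    }
  }
}

POST _reindex
{
  "source": { "index": "old-index" },
  "dest":   { "index": "new-index" },
  "script": {
    "source": "ctx._source['host.name'] = ctx._source.remove('hostname')"
  }
}
```

The script moves each document's value from the old field name to the new one, and the `hostname` alias keeps existing queries against the old name working without storing the value twice.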
We often face the situation of renaming fields while keeping the old field names around for some time. This will get much worse as ECS becomes more widely used.
Sometimes it is possible to solve this with aliases, sometimes it is not - for example with "dynamic data", or when it has to be done in an already existing index without reindexing or creating a new one. In these situations we copy fields in Logstash, which results in bigger indexes with duplicated data.
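The Logstash field-copying workaround described here is typically a `mutate` filter with `copy`, roughly (field names are illustrative):

```
filter {
  mutate {
    # Copies the value into a second field, so it is indexed
    # and stored twice - this is the duplication being described.
    copy => { "old_field_name" => "new_field_name" }
  }
}
```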
Elasticsearch could provide a feature similar to hard links in a filesystem - allow specifying links to fields for each single document, so indexing would look like field1:"somevalue", field2:linkto:field1. Or it could be fully automatic - if there are 4 fields with the same data type and exactly the same value, store the value just once. This feature would help solve many of the problems we are facing.