Field aliases #23714

clintongormley · 2017-03-23T14:52:03Z

It is hard to rename a field when using time-based indices: searches and especially aggregations will only work on either the new or the old field name, so during the transition period not all data will be seen.

We can introduce a new field type called alias which simply points to another field, eg:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "host": {
          "properties": {
            "source_ip": {
              "type": "ip"
            }
          }
        },
        "sourceIP": {
          "type": "alias",
          "path": "host.source_ip"
        }
      }
    }
  }
}

This field type would work as follows:

Attempts to index into the alias field would result in an exception - it is read only
Queries, aggs, suggestions, scripts (using doc[]), highlighting, fielddata_fields, docvalue_fields, stored_fields would just get the data (and mapping) from the specified path
Source filtering would not work with the aliased field
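
As a rough illustration of the rules above, here is a toy Python model (not Elasticsearch code). The `Mapping` class, its method names, and `AliasWriteError` are hypothetical; the sketch only shows an alias being resolved to its target path for reads and rejected for writes.

```python
class AliasWriteError(Exception):
    """Raised when a document tries to index a value into an alias field."""


class Mapping:
    """Toy model of a mapping with concrete fields and read-only aliases."""

    def __init__(self, fields, aliases):
        self.fields = dict(fields)    # concrete field path -> type
        self.aliases = dict(aliases)  # alias name -> concrete field path

    def resolve(self, name):
        """Return the concrete path a query or aggregation would read from."""
        return self.aliases.get(name, name)

    def field_type(self, name):
        """An alias reports the type of the field it points to."""
        return self.fields[self.resolve(name)]

    def check_document(self, flat_doc):
        """Reject documents that supply a value for an alias (read only)."""
        for key in flat_doc:
            if key in self.aliases:
                raise AliasWriteError(f"cannot index into alias field [{key}]")


# Mirrors the example mapping: sourceIP is an alias for host.source_ip.
mapping = Mapping(
    fields={"host.source_ip": "ip"},
    aliases={"sourceIP": "host.source_ip"},
)
```

In this model, reads through `sourceIP` and `host.source_ip` end up at the same concrete field, while a document containing a `sourceIP` key is refused.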

This also works for users who want to expose a nicer name for fields in Kibana

skearns64 · 2017-03-23T15:01:06Z

++.

We may also want to consider supporting these field aliases in the field_stats and the (I think) upcoming field_capability APIs.

colings86 · 2017-03-24T13:13:54Z

Discussed in FixItFriday. We can see this as a useful feature for transitioning to a new field name, but if the implementation is not as clean as it currently appears we should discuss this again: mappings are already complex and we should not increase that complexity too much.

rjernst · 2017-03-24T18:48:26Z

If we are going to add (however much) complexity here, can we trade it for removal of some other complexity/leniency? One of the things that has bugged me for a long time is unmapped fields stuff. Could we be more strict about fields existing across indexes, and if there is not a concrete field, they can create an alias (maybe even have an "empty" type of alias which means "match anything against this field").

clintongormley · 2017-03-28T14:56:01Z

I really don't want to make this change contingent on anything else. It is a good solution in and of itself. Let's keep this issue on topic.

mrec · 2017-04-07T14:16:56Z

Another (non-transitional) use case for this: we have a lot of examples where the master datasets for some indices/types define separate FooVariant1 and FooVariant2 fields while others don't make a distinction and so only have data for one of these. To support consistent searching across indices we currently use copy_to rules, but the data duplication is obviously wasteful; zero-cost aliases like this would be much nicer.

nik9000 · 2017-04-07T14:32:37Z

If we are going to add (however much) complexity here, can we trade it for removal of some other complexity/leniency? One of the things that has bugged me for a long time is unmapped fields stuff. Could we be more strict about fields existing across indexes, and if there is not a concrete field, they can create an alias (maybe even have an "empty" type of alias which means "match anything against this field").

@rjernst, are you saying that if we had field aliases we could clean up the code around unmapped fields by mapping them to a field that explicitly doesn't index them? Or are you thinking of something at query time?

@mrec I think this feature would work for what you want. As envisioned the fields would have to be the same types for this to work.

@clintongormley I wonder if duplicating the old structure with ingest is a technically better solution. It'd work in all cases without requiring new code, but it isn't a thing you could do after the fact and it isn't free from a storage standpoint. copy_to is a similar thing, but with less space used and working in fewer contexts. Both are more flexible than an alias in that you can define different mappings for the field so you can handle the cases where you changed mappings. I wonder if we're better off documenting a few "recipes" for migrating fields over time.

rjernst · 2017-04-10T22:22:45Z

are you saying that if we had field aliases we could clean up the code around unmapped fields by mapping them to a field that explicitly doesn't index them? Or are you thinking of something at query time?

@nik9000 Possibly. I just realized my original thought could be done now, regardless of aliases. That is, instead of having the current "unmapped fields" logic on the coordinating node, we could require adding a dummy empty field on older indexes for that name.

clintongormley · 2017-05-05T13:12:11Z

@clintongormley I wonder if duplicating the old structure with ingest is a technically better solution. It'd work in all cases without requiring new code, but it isn't a thing you could do after the fact and it isn't free from a storage standpoint. copy_to is a similar thing, but with less space used and working in fewer contexts. Both are more flexible than an alias in that you can define different mappings for the field so you can handle the cases where you changed mappings. I wonder if we're better off documenting a few "recipes" for migrating fields over time.

@nik9000 the point is that with copy_to and ingest you need to reindex. I'm trying to solve the case where you are transitioning from field Foo in old indices to field foo in new indices, and you want to be able to run aggs or searches across old and new indices. A field alias can be added after the fact (when you realise you have a problem) at zero cost.
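
The transition case described here can be sketched with a toy model in Python. This is an illustration only, not how Elasticsearch executes aggregations; the index dictionaries, the sample values, and the `terms_agg` helper are all made up for the example. The old index holds `Foo`, the new index holds `foo`, and an alias added to the old index lets one aggregation see both.

```python
from collections import Counter

# Hypothetical data: the old index uses "Foo", the new index uses "foo".
old_index = {
    "aliases": {"foo": "Foo"},  # alias added after the fact, no reindex
    "docs": [{"Foo": "a"}, {"Foo": "b"}],
}
new_index = {
    "aliases": {},
    "docs": [{"foo": "a"}, {"foo": "c"}],
}


def terms_agg(indices, field):
    """Count values of `field` across indices, resolving aliases per index."""
    counts = Counter()
    for index in indices:
        concrete = index["aliases"].get(field, field)
        for doc in index["docs"]:
            if concrete in doc:
                counts[doc[concrete]] += 1
    return dict(counts)


# With the alias in place, documents from both indices are counted.
result = terms_agg([old_index, new_index], "foo")
```

Without the alias entry on the old index, the same call would only count the documents in the new index.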

ppf2 · 2017-05-05T16:35:14Z

+1. For example, field aliases would have helped Kibana users a lot with the raw -> keyword field name changes between 2.x and 5.x, since there will be older indices with .raw references while newer indices will have .keyword.

seang-es · 2017-06-22T18:48:50Z

Further enhancement for this: Could we construct the alias so that it can consist of an 'OR' of two other aliases? First name/last name in individual aliases with 'Name' as a separate alias that could hit on either would be helpful in some use cases.

clintongormley · 2017-06-29T17:20:54Z

Could we construct the alias so that it can consist of an 'OR' of two other aliases? First name/last name in individual aliases with 'Name' as a separate alias that could hit on either would be helpful in some use cases.

No, this introduces a huge amount of complexity, eg we silently need to be able to upgrade single-field queries to compound queries when run against multi-field aliases.
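
To make the complexity concern concrete, here is a minimal Python sketch of the rewrite such a feature would imply. It assumes a hypothetical alias table mapping each alias to a list of target fields, and the `rewrite_term_query` helper is invented for illustration; only the `term` and `bool`/`should` query shapes come from the Elasticsearch query DSL.

```python
def rewrite_term_query(query, aliases):
    """Rewrite {"term": {field: value}} against a hypothetical alias table.

    `aliases` maps an alias name to a list of concrete field paths.
    """
    field, value = next(iter(query["term"].items()))
    targets = aliases.get(field, [field])
    if len(targets) == 1:
        # Single-target alias: a plain substitution of the field name.
        return {"term": {targets[0]: value}}
    # Multi-target alias: the query silently changes shape into a disjunction.
    return {
        "bool": {
            "should": [{"term": {target: value}} for target in targets],
            "minimum_should_match": 1,
        }
    }


single = rewrite_term_query(
    {"term": {"sourceIP": "10.0.0.1"}},
    {"sourceIP": ["host.source_ip"]},
)
multi = rewrite_term_query(
    {"term": {"name": "smith"}},
    {"name": ["first_name", "last_name"]},
)
```

The single-target case keeps the query type unchanged, whereas the multi-target case has to produce a different query type, and every query, aggregation, and sort would need its own equivalent rule.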

ppf2 · 2017-12-13T20:09:00Z

Yet another breaking change in our stack that is going to make this feature useful. In 6.0+, filebeat stops using the input_type output field for prospectors and has renamed it to prospector.type. That means Kibana users who have visualizations against input_type will have to handle this field name change when querying both older and newer indices as part of the upgrade.

josefschiefer · 2017-12-14T05:47:06Z

We have various types of log files stored in indexes with different names for the timestamp field. Field aliases would be a great way to map the timestamp field to a common alias which I could use for sorting and aggregations.

The alternative solution for the above is to either 1) copy the field to a unified field (requires more storage and re-indexing), or 2) use scripting for sorting or aggregations, which makes the query much slower.

I think field aliases are an elegant solution to make queries across indexes more useful and faster.

rpedela · 2018-01-11T17:44:20Z

@clintongormley Why isn't renaming a field directly possible?

josefschiefer · 2018-01-11T21:37:58Z

Renaming a field would require re-indexing the data and updating the integrations (e.g. dashboards) that use the old indexes.

For my use case, I am creating a "view" across indexes and trying to combine two fields that have different names. Elasticsearch uses filtered index aliases for views across indexes. Would it maybe be easier to support field aliases as part of filtered index aliases instead of adding them directly to the mapping?

rpedela · 2018-01-11T21:57:38Z

@josefschiefer I am assuming you are responding to my question. If so, it wasn't directed at you. I should have put @clintongormley to make it more clear. Sorry about that.

colings86 · 2018-01-12T08:28:44Z

@rpedela What @josefsalyer said is correct and is the reason why renaming a field directly is not possible. This is an open community and we welcome anyone to respond to any questions asked on issues, thank you @josefsalyer for taking the time to respond. Once Lucene segments are written they are never modified so the only way to change all values for an entire field is to re-index. You can do this already but this issue is trying to come up with a solution for when re-indexing is not feasible or for the period until a re-index can be done.

rpedela · 2018-01-12T08:49:24Z

@colings86 I know you need to reindex currently, but why is that the case? Why is the field name set in stone within the Lucene segment? Does the field name have to be set in stone?

colings86 · 2018-01-12T09:20:00Z

@rpedela It is set in stone because Lucene works in an append-only way. It never modifies files and only ever adds new files. This is why when you update a document you actually delete the document and create a new document. It is also why when you delete a document it is actually only marked as deleted in that segment, and the actual delete is deferred until the segment is merged (if it is ever merged). This principle is important in Lucene as it makes the segments work well with the OS filesystem cache, which keeps searches fast.

Because of the above, in order to rename a field we would need to rewrite every segment in the index to change the field name in that segment (effectively we would need to delete every document in the index and re-index it with the new field name). This is the same as re-indexing the whole index, so it doesn't really buy anything.
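
The append-only behaviour described above can be sketched with a toy model in Python. This is not Lucene code: the `Segment` and `Index` classes are hypothetical and ignore merging entirely. The sketch only shows that an update is a delete mark plus a new segment, and that a rename therefore touches every live document.

```python
class Segment:
    """Toy immutable segment: documents are fixed at creation time."""

    def __init__(self, docs):
        self._docs = tuple(docs)  # (doc_id, fields) pairs, never modified
        self.deleted = set()      # ids marked deleted; the docs stay in place

    def live_docs(self):
        return [(i, f) for i, f in self._docs if i not in self.deleted]


class Index:
    def __init__(self):
        self.segments = []

    def add(self, doc_id, fields):
        self.segments.append(Segment([(doc_id, dict(fields))]))

    def update(self, doc_id, fields):
        # An update marks the old copy as deleted and writes a new segment.
        for segment in self.segments:
            if any(i == doc_id for i, _ in segment._docs):
                segment.deleted.add(doc_id)
        self.add(doc_id, fields)

    def rename_field(self, old, new):
        # Renaming rewrites every live document, i.e. a full reindex.
        rewritten = 0
        for segment in list(self.segments):
            for doc_id, fields in segment.live_docs():
                renamed = {new if k == old else k: v for k, v in fields.items()}
                self.update(doc_id, renamed)
                rewritten += 1
        return rewritten


index = Index()
index.add(1, {"Foo": "a"})
index.add(2, {"Foo": "b"})
rewritten = index.rename_field("Foo", "foo")
```

In this model the two original segments are left in place with their documents marked deleted, and two new segments carry the renamed field.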

rpedela · 2018-01-12T09:50:51Z

@josefsalyer It wasn't my intent to make you feel unwelcome. I apologize.

@colings86 According to the Lucene docs, the field names are stored in a FNM file and the names are mapped to numbers. If I am understanding correctly, the field number is used throughout the other files to reference a field rather than the name. Is it possible to modify that file? There is an old, open issue with a patch that does just that, however it screwed up ordering. However, the quote below from the latest docs suggests ordering should no longer be a problem.

FieldNumber: the field's number. Note that unlike previous versions of Lucene, the fields are not numbered implicitly by their order in the file, instead explicitly.

If modifying the FNM file is not possible, can ES store its own field name mapping? A mapping between the source's name and some immutable, unique name generated by ES? The ES name is used in Lucene and in the rest of the ES codebase, and renaming a field is just updating that mapping. I currently do this myself using a Postgres table so I can avoid reindexing.

clintongormley · 2018-01-12T09:54:05Z

@rpedela while what you say is correct, the situation is more complex than that. The field name doesn't only exist in the mapping and in Lucene, it also exists in the _source. There would be no way to change that without reindexing every document.

On top of that, if the index is still accepting new documents or changes, it is likely that those new documents would use the old field name, so now you end up with two fields...

This is why a field alias seems the better route to me.

rpedela · 2018-01-12T10:20:32Z

@clintongormley That is a good point regarding modifying the FNM file.

In the case where ES stores a mapping, couldn't the _source also be modified to use the ES-generated names? Then when the _source is returned to the user, it is modified again based on the mapping. Iterating through JSON keys is pretty fast so it shouldn't be a performance issue. I also can't think of any weird edge cases where the index would be out of sync. Like I said previously, I do exactly that myself and I haven't noticed any problems.

Some people on this thread have voiced use cases for aliases other than renaming, which suggests it is a good idea. However, from a user's perspective, aliases specifically for renaming seem more complicated and less intuitive than a _rename API. And actually the mapping idea is basically an alias, but it is hidden from the user. In other words, I think we may agree on the solution. I just disagree on the API.

clintongormley · 2018-01-12T10:23:26Z

@rpedela what happens if you rename a field, then add a new field with the old name and perhaps a different mapping? Now you have a conflict. Aliases prevent that.

rpedela · 2018-01-12T10:34:58Z

@clintongormley Why would there be a conflict? Let's map this out.

foo is indexed as es_field_1 inside _source and the Lucene segment.
foo is renamed to bar in the mapping, but es_field_1 is still used for _source and the Lucene segment.
A new foo is added and indexed as es_field_2 inside _source and the Lucene segment. And bar still points to es_field_1.

The es_field_* are immutable and increment as fields are added.
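
The three steps above can be sketched as a small Python registry. The `FieldNameMap` class is hypothetical and only models the name bookkeeping in this proposal; it says nothing about how `_source` or the Lucene segments would be handled.

```python
import itertools


class FieldNameMap:
    """Toy registry mapping user-visible names to immutable internal names."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._by_name = {}  # user-visible name -> internal name

    def add(self, name):
        if name in self._by_name:
            raise ValueError(f"field [{name}] already exists")
        internal = f"es_field_{next(self._counter)}"
        self._by_name[name] = internal
        return internal

    def rename(self, old, new):
        if new in self._by_name:
            raise ValueError(f"field [{new}] already exists")
        # Only the user-visible key changes; the internal name is untouched.
        self._by_name[new] = self._by_name.pop(old)

    def internal(self, name):
        return self._by_name[name]


names = FieldNameMap()
names.add("foo")            # foo is stored as es_field_1
names.rename("foo", "bar")  # bar now points to es_field_1
names.add("foo")            # a new foo is stored as es_field_2
```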

clintongormley · 2018-01-12T10:51:17Z

That's not how the _source works. The source field is an untouched copy of the JSON document you index. Having to change the source to have a layer of redirection between "virtual" field names and the field names stored and returned from the source would have a huge overhead (plus would introduce a hundred possible bugs thanks to the added complexity).

That just ain't gonna happen :)

rpedela · 2018-01-12T10:57:36Z

Fair enough. Thanks for listening.

clintongormley · 2018-01-12T11:21:09Z

No problem :) Thanks for bringing up the idea

jpountz · 2018-03-14T08:39:22Z

cc @elastic/es-search-aggs

reardencode · 2018-03-27T23:49:49Z

This may belong in a separate issue, but I'll start here: Similar to the all_fields execution mode, another use-case for field aliases might be as an alternative to using copy_to to create custom all fields on earlier versions.

"properties": {
  "all_ips": {
    "type": "alias",
    "paths": ["source_ip", "dest_ip"]
  }
}

This would let users choose between index-time cost of copy_to and the search time costs of constructing the MultiMatchQuery as all_fields does.

colings86 · 2018-03-28T08:19:46Z

@reardencode thanks for the suggestion, we had talked about something similar to that but we would prefer to keep things simple, at least for the first version of this feature and restrict aliases to only point to a single concrete field. This makes the logic a simple substitution of the field name rather than requiring us to produce a boolean query with all the various aliased fields in.

mP1 · 2018-04-16T13:38:47Z

@clintongormley

Is there any particular reason that aliases must exist as a property (mappings/my_type/properties) rather than as a new sibling of "mappings/my_type/properties", something like..

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "host": {
          "properties": {
            "source_ip": {
              "type": "ip"
            }
          }
        }
      },
      "property-aliases": {
        "sourceIP": {
          "path": "host.source_ip"
        }
      }
    }
  }
}

I'm thinking this would make for a smaller change, as all the places that loop over properties would remain unchanged and wouldn't need to know about type=alias and do special things, even if it is just skipping.

This would then mean aliases are only ever considered in the code that needs to build a query (etc).

Naturally the code would still enforce that aliases must point to real properties, and there must be no clash between aliases and properties even if they are in different parts of the mapping json graph.
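
The two checks mentioned here could look roughly like the following Python sketch. The `validate_aliases` function and its error messages are invented for illustration and do not reflect the actual Elasticsearch validation code; the extra check for an alias that points to another alias is an assumption that follows from requiring "real properties".

```python
def validate_aliases(properties, aliases):
    """Return a list of error strings; an empty list means the mapping is valid.

    `properties` is a set of concrete field paths and `aliases` maps an
    alias name to its target path.
    """
    errors = []
    for name, path in aliases.items():
        if name in properties:
            errors.append(f"alias [{name}] clashes with a concrete field")
        if path in aliases:
            errors.append(f"alias [{name}] points to another alias [{path}]")
        elif path not in properties:
            errors.append(f"alias [{name}] points to unknown field [{path}]")
    return errors


ok = validate_aliases({"host.source_ip"}, {"sourceIP": "host.source_ip"})
clash = validate_aliases(
    {"host.source_ip", "sourceIP"}, {"sourceIP": "host.source_ip"}
)
```

The same checks apply whether aliases live under properties or in a separate branch of the mapping.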

mP1 · 2018-04-17T08:56:02Z

Answer to self: my proposal would probably require updates to the mappings loader, saver and verifier to handle the new "property-aliases" branch. Better to stick to aliases being a new type under properties.

In elastic/elasticsearch#23714 Elasticsearch implemented the alias field type. This can be used in fields.yml as follows:

```
- name: a.b
  type: alias
  path: a.c
```

`a.b` will be the alias for `a.c`.

clintongormley added the :Search Foundations/Mapping, discuss, and >feature labels Mar 23, 2017

clintongormley added help wanted adoptme and removed discuss labels Mar 24, 2017

jasontedor added the high hanging fruit label May 5, 2017

clintongormley mentioned this issue Oct 9, 2017

Field Aliases #17511

Closed

chrisronline mentioned this issue Feb 5, 2018

Allow users to specify display names for fields elastic/kibana#1896

Closed

colings86 removed the high hanging fruit label Apr 11, 2018

mP1 mentioned this issue Apr 28, 2018

New field type=alias including support for querying, aggs, suggestions + more read ops #30230

Closed

This was referenced Jun 13, 2018

Add basic support for field aliases in index mappings. #31287

Merged

Field aliases implementation tracking #31372

Closed

jtibshirani removed the help wanted adoptme label Jun 28, 2018

jtibshirani mentioned this issue Jul 18, 2018

Add support for field aliases. #32172

Merged

jtibshirani closed this as completed in #32172 Jul 18, 2018

rayafratkina mentioned this issue Jul 18, 2018

Support for field aliases in Kibana elastic/kibana#20946

Closed

ruflin mentioned this issue Jul 19, 2018

Add support for alias field type to fields.yml elastic/beats#7645

Merged

jtibshirani self-assigned this Jul 26, 2018

bhavyarm mentioned this issue Aug 7, 2018

Discover doesn't show field aliases elastic/kibana#21705

Closed

bhavyarm mentioned this issue Aug 20, 2018

Automate test for field aliases elastic/kibana#22189

Open

javanna added the Team:Search Foundations label Jul 16, 2024
