Field aliases #23714
Comments
++. We may also want to consider supporting these field aliases in the field_stats and the (I think) upcoming field_capability APIs. |
Discussed in FixItFriday and we said that we can see this as a useful feature for transitioning to a new field name but if the implementation is not as clean as it appears currently we should discuss this again since mappings are already complex and we should not do anything to increase that complexity too much |
If we are going to add (however much) complexity here, can we trade it for removal of some other complexity/leniency? One of the things that has bugged me for a long time is unmapped fields stuff. Could we be more strict about fields existing across indexes, and if there is not a concrete field, they can create an alias (maybe even have an "empty" type of alias which means "match anything against this field"). |
I really don't want to make this change contingent on anything else. It is a good solution in and of itself. Let's keep this issue on topic. |
Another (non-transitional) use case for this: we have a lot of examples where the master datasets for some indices/types define separate |
@rjernst, are you saying that if we had field aliases we could clean up the code around unmapped fields by mapping them to a field that explicitly doesn't index them? Or are you thinking of something at query time? @mrec I think this feature would work for what you want. As envisioned the fields would have to be the same types for this to work. @clintongormley I wonder if duplicating the old structure with ingest is a technically better solution. It'd work in all cases without requiring new code, but it isn't a thing you could do after the fact and it isn't free from a storage standpoint. |
@nik9000 Possibly. I just realized my original thought could be done now, regardless of aliases. That is, instead of having the current "unmapped fields" logic on the coordinating node, we could require adding a dummy empty field on older indexes for that name. |
@nik9000 the point is that with copy_to and ingest you need to reindex. I'm trying to solve the case where you are transitioning from field |
+1 For example, field aliases would have helped Kibana users a lot with the raw -> keyword field name changes between 2.x and 5.x: there will be older indices with .raw references while newer indices will have .keyword. |
Further enhancement for this: Could we construct the alias so that it can consist of an 'OR' of two other aliases? First name/last name in individual aliases with 'Name' as a separate alias that could hit on either would be helpful in some use cases. |
No, this introduces a huge amount of complexity, eg we silently need to be able to upgrade single-field queries to compound queries when run against multi-field aliases. |
Yet another breaking change in our stack that is going to make this feature useful. In 6.0+, filebeat stops using input_type output field for prospectors and has renamed it to prospector.type. That means Kibana users who have visualizations against input_type will have to handle this field name change challenge when querying both older and newer indices as part of the upgrade. |
We have various types of log files stored in indexes with different names for the timestamp field. Field aliases would be a great way to map the timestamp field to a common alias which I could use for sorting and aggregations. The alternative solution for the above is to either 1) copy the field to a unified field (requires more storage and re-indexing), or 2) use scripting for sorting or aggregations, which makes the query much slower. I think field aliases are an elegant solution to make queries across indexes more useful and faster. |
@clintongormley Why isn't renaming a field directly possible? |
Renaming a field would require re-indexing the data, and it would break the integrations (e.g. dashboards) built on the old indexes. For my use case, I am creating a "view" across indexes and trying to combine two fields that have different names. Elasticsearch uses filtered index aliases for views across indexes. Would it maybe be easier to support field aliases as part of filtered index aliases instead of adding them directly to the mapping? |
@josefschiefer I am assuming you are responding to my question. If so, it wasn't directed at you. I should have put @clintongormley to make it more clear. Sorry about that. |
@rpedela What @josefsalyer said is correct and is the reason why renaming a field directly is not possible. This is an open community and we welcome anyone to respond to any questions asked on issues, thank you @josefsalyer for taking the time to respond. Once Lucene segments are written they are never modified so the only way to change all values for an entire field is to re-index. You can do this already but this issue is trying to come up with a solution for when re-indexing is not feasible or for the period until a re-index can be done. |
@colings86 I know you need to reindex currently, but why is that the case? Why is the field name set in stone within the Lucene segment? Does the field name have to be set in stone? |
@rpedela It is set in stone because Lucene works in an append-only way. It never modifies files and only ever adds new files. This is why when you update a document you actually delete the document and create a new document. It is also why when you delete a document it is actually only marked as deleted in that segment and the actual delete is deferred until the segment is merged (if it is ever merged). This principle is important in Lucene as it makes the segments work well with the OS filesystem cache, which keeps searches fast. Because of the above, in order to rename a field, we would need to rewrite every segment in the index to change the field name in that segment (effectively we would need to delete every document in the index and re-index the document with the new field name); this is the same as re-indexing the whole index, so it doesn't really buy anything. |
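To make the append-only argument concrete, here is a toy model in Python. It is only a sketch: `ToyIndex`, `Segment` and `rename_field` are hypothetical names invented for illustration and do not correspond to Lucene classes or to how Lucene stores data on disk. It shows that an update is a tombstone plus a new segment, and that a rename ends up rewriting every live document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """An immutable batch of documents: a tuple of (doc_id, fields) pairs."""
    docs: tuple


@dataclass
class ToyIndex:
    segments: list = field(default_factory=list)
    deleted: set = field(default_factory=set)  # (segment_no, doc_id) tombstones

    def add(self, doc_id, doc):
        # New documents always go into a brand-new segment.
        self.segments.append(Segment(((doc_id, dict(doc)),)))

    def delete(self, doc_id):
        # Nothing is rewritten: existing copies are only marked as deleted.
        for seg_no, seg in enumerate(self.segments):
            for seg_doc_id, _ in seg.docs:
                if seg_doc_id == doc_id:
                    self.deleted.add((seg_no, doc_id))

    def update(self, doc_id, doc):
        # An update is a delete followed by an add.
        self.delete(doc_id)
        self.add(doc_id, doc)

    def live_docs(self):
        return {
            doc_id: doc
            for seg_no, seg in enumerate(self.segments)
            for doc_id, doc in seg.docs
            if (seg_no, doc_id) not in self.deleted
        }

    def rename_field(self, old, new):
        """Rename a field by rewriting every live document that has it."""
        rewritten = 0
        for doc_id, doc in list(self.live_docs().items()):
            if old in doc:
                renamed = {(new if k == old else k): v for k, v in doc.items()}
                self.update(doc_id, renamed)
                rewritten += 1
        return rewritten
```

In this model `rename_field` returns the number of documents it had to rewrite, which for a field present in every document equals the size of the index, i.e. a full re-index.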
@josefsalyer It wasn't my intent to make you feel unwelcome. I apologize. @colings86 According to the Lucene docs, the field names are stored in a FNM file and the names are mapped to numbers. If I am understanding correctly, the field number is used throughout the other files to reference a field rather than the name. Is it possible to modify that file? There is an old, open issue with a patch that does just that, however it screwed up ordering. However the quote below in the latest docs suggest ordering should no longer be a problem.
If modifying the FNM file is not possible, can ES store its own field name mapping? A mapping between the source's name and some immutable, unique name generated by ES? The ES name is used in Lucene and in rest of the ES codebase, and renaming a field is just updating that mapping. I currently do this myself using a Postgres table so I can avoid reindexing. |
@rpedela while what you say is correct, the situation is more complex than that. The field name doesn't only exist in the mapping and in Lucene, it also exists in the On top of that, if the index is still accepting new documents or changes, it is likely that those new documents would use the old field name, so now you end up with two fields... This is why a field alias seems the better route to me. |
@clintongormley That is a good point regarding modifying the FNM file. In the case where ES stores a mapping, couldn't the Some people on this thread have voiced use cases for alias other than renaming which suggests it is a good idea. However from a user's perspective, aliases specifically for renaming seems more complicated and less intuitive than a |
@rpedela what happens if you rename a field, then add a new field with the old name and perhaps a different mapping? Now you have a conflict. Aliases prevent that. |
@clintongormley Why would there be a conflict? Let's map this out.
The |
That's not how the _source works. The source field is an untouched copy of the JSON document you index. Having to change the source to have a layer of redirection between "virtual" field names and the field names stored and returned from the source would have a huge overhead (plus would introduce a hundred possible bugs thanks to the added complexity). That just ain't gonna happen :) |
Fair enough. Thanks for listening. |
No problem :) Thanks for bringing up the idea |
cc @elastic/es-search-aggs |
This may belong in a separate issue, but I'll start here: Similar to the all_fields execution mode, another use-case for field aliases might be as an alternative to using copy_to to create custom all fields on earlier versions.
This would let users choose between index-time cost of copy_to and the search time costs of constructing the MultiMatchQuery as all_fields does. |
@reardencode thanks for the suggestion, we had talked about something similar to that but we would prefer to keep things simple, at least for the first version of this feature and restrict aliases to only point to a single concrete field. This makes the logic a simple substitution of the field name rather than requiring us to produce a boolean query with all the various aliased fields in. |
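The "simple substitution of the field name" can be sketched as follows. This is not the Elasticsearch implementation; `resolve_field` and `rewrite_term_query` are hypothetical helpers, and the alias table is a plain dict assumed to map each alias to exactly one concrete field.

```python
def resolve_field(name, aliases):
    """Return the concrete field an alias points to, or the name unchanged."""
    return aliases.get(name, name)


def rewrite_term_query(query, aliases):
    """Rewrite {"term": {field: value}} so it targets the concrete field."""
    return {
        "term": {
            resolve_field(field_name, aliases): value
            for field_name, value in query["term"].items()
        }
    }


# Hypothetical alias table: one alias, one concrete target.
aliases = {"sourceIP": "host.source_ip"}
rewritten = rewrite_term_query({"term": {"sourceIP": "10.0.0.1"}}, aliases)
```

Because each alias has a single target, the query keeps its shape and only the field name changes; a multi-field alias would instead force the single query to be expanded into a compound one.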
Is there any particular reason that aliases must exist as a property (mappings/my_type/properties) rather than as a new sibling of "mappings/my_type/properties", something like:

    PUT my_index
    {
      "mappings": {
        "my_type": {
          "properties": {
            "host": {
              "properties": {
                "source_ip": {
                  "type": "ip"
                }
              }
            }
          },
          "property-aliases": {
            "sourceIP": {
              "path": "host.source_ip"
            }
          }
        }
      }
    }

I'm thinking this would make for a smaller change, as all the places that loop over properties would remain unchanged and wouldn't need to know about type=alias and do special things, even if that is only skipping. This would then mean aliases are only ever considered in the code that needs to build a query (etc). Naturally the code would still enforce that aliases must point to real properties, and there must be no clash between aliases and properties even if they are in different parts of the mapping JSON graph. |
Answer to self: my proposal would probably require updates to the mappings loader, saver and verifier to handle the new "property-aliases" branch. Better to stick to aliases being a new type under properties. |
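The check that aliases must point to real properties could look roughly like this when aliases live under properties. It is a sketch under assumptions: `flatten` and `validate_aliases` are hypothetical helpers, the alias definition is assumed to have the form `{"type": "alias", "path": ...}`, and the rule that an alias may not target another alias is inferred from the "single concrete field" restriction mentioned earlier in the thread.

```python
def flatten(properties, prefix=""):
    """Map dotted field paths to their mapping definitions."""
    fields = {}
    for name, definition in properties.items():
        path = prefix + name
        if "properties" in definition:
            # Object field: recurse into its sub-fields.
            fields.update(flatten(definition["properties"], path + "."))
        else:
            fields[path] = definition
    return fields


def validate_aliases(properties):
    """Return a list of error messages; an empty list means the mapping is fine."""
    fields = flatten(properties)
    errors = []
    for path, definition in fields.items():
        if definition.get("type") != "alias":
            continue
        target = definition.get("path")
        if target not in fields:
            errors.append(f"alias [{path}] points to unknown field [{target}]")
        elif fields[target].get("type") == "alias":
            errors.append(f"alias [{path}] points to another alias [{target}]")
    return errors
```

With aliases stored as ordinary keys under properties, a name clash between an alias and a concrete field cannot be expressed in the first place, which is one less rule to enforce than with a separate branch.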
In elastic/elasticsearch#23714 Elasticsearch implemented the alias field type. This can be used in fields.yml as follows:
```
- name: a.b
  type: alias
  path: a.c
```
`a.b` will be the alias for `a.c`.
It is hard to rename a field when using time-based indices - search and especially aggregations will only work on either the new or the old version, but there is a transition period where not all data will be seen.
We can introduce a new field type called `alias` which simply points to another field, eg:

This field type would work as follows:

... `doc[]`), highlighting, fielddata_fields, docvalue_fields, stored_fields would just get the data (and mapping) from the specified path

This also works for users who want to expose a nicer name for fields in Kibana
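The example mapping did not survive in this copy of the issue. As a rough sketch only, the following builds a request body of the general shape discussed in the comments, assuming an alias is declared with `type: alias` and a `path` pointing at the concrete field. The index name `my_index`, the type name `my_type`, the field names and the `keyword` type are illustrative assumptions, not the original example.

```python
import json

# Hypothetical mapping: "input_type" is an alias for the concrete
# field "prospector.type" (names borrowed from the Filebeat rename
# mentioned in the comments).
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "prospector": {
                    "properties": {
                        "type": {"type": "keyword"}
                    }
                },
                "input_type": {
                    "type": "alias",
                    "path": "prospector.type"
                }
            }
        }
    }
}

# Print it in the console-style form used elsewhere in this thread.
print("PUT my_index")
print(json.dumps(mapping, indent=2))
```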