Collapse field types into "families" of field types in _field_caps
#53175
Comments
Pinging @elastic/es-search (:Search/Mapping) |
Having some type of "category" would be useful however I'm not sure whether in the end it would be useful across all consumers. For example, if a field is mapped as a |
I have mixed feelings on this. We're doing everything so that field types are as transparent as possible, e.g.
How do you do it in SQL today? |
In SQL, we depend on field_caps almost entirely and while Note: With numeric fields, apart from the scenario above where we do not make them "equal" in any case, knowing which type is which is important when we take the |
When you say that Kibana performs this kind of merge, I think you're talking about Kibana index patterns. I'm curious whether there are other parts of Kibana in which the specific field types are important. Here's the related Kibana code: https://github.com/elastic/kibana/blob/master/src/plugins/data/common/kbn_field_types/kbn_field_types_factory.ts Would this change merge collapsible fields in such a way that we would no longer know the types comprising them?
@mattkime If knowing the actual types is a requirement for you, it would be doable. Are there places where you need to know about the actual types today? |
I don't think we are interested in the actual Elasticsearch type in Kibana. Am I missing something, @timroes?
We're storing the actual ES types and so removing that would also be a breaking change. In terms of what we're using them for: I don't have a complete list, but I think it was mainly introduced for KQL, since we needed to make some decision between text and keyword fields for the way we build the query, but I can't find that specific code right now. Another example seems to be field formatters where we are using it. Even if we could solve those without, I think we'll always have the need. One example that would come to my mind is e.g. support for significant text aggregation (elastic/kibana#31614). We can only achieve that if we have the real ES types available, since our grouped "String" type is not enough and we also explicitly need to bypass I don't think removal of |
@timroes I don't think we could ever collapse This would be a breaking change indeed when we change it for existing types, and this won't happen until Elasticsearch 8.0. But we could potentially start doing it for fields that are not released yet such as |
Just to reiterate that SQL will still need to know the exact field types. |
@astefan For my understanding, what would be the problem with always returning |
And I have the feeling with that we already run into the problem I mentioned when we discussed that last time. If Elasticsearch makes those groupings we kind of put internal Kibana knowledge that you might not have into your decision. So let's use exactly those pairs you mentioned as an example. JavaScript doesn't have 64bit integers, so we cannot represent Another example is Meaning if we want to get rid of exact ES types in _field_caps I think we would need to have a shared sync for every field to know the impact on Kibana before we decide on families. And even that I don't think will work, since there might simply be future features for which we will need to know that difference, that we currently are not aware of, and might block us with our decision on. It happened in the past, that was the whole reason we needed to introduce storing the specific ES types, since we saw our family grouping blocked some things, and I don't see how it would be different if we put the family grouping into ES. So my strong recommendation would be, keep the actual field type information around for Kibana to store and use. |
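The 64-bit concern raised in the comment above can be illustrated with a small sketch. JavaScript numbers are IEEE 754 doubles, and a Python `float` uses the same representation, so the loss of precision can be reproduced here. The helper name `survives_double_round_trip` is made up for this illustration; the bounds assume that `integer` is a signed 32-bit type and `long` a signed 64-bit type.

```python
# IEEE 754 doubles have a 53-bit significand, so not every integer above
# 2**53 can be represented exactly.
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit `integer`
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit `long`


def survives_double_round_trip(value: int) -> bool:
    """Return True if the integer is unchanged after conversion to a double."""
    return int(float(value)) == value


# Every 32-bit integer fits in a double without loss.
print(survives_double_round_trip(INT32_MAX))
# Large 64-bit values do not.
print(survives_double_round_trip(MAX_SAFE_INTEGER + 2))
print(survives_double_round_trip(INT64_MAX))
```

Under these assumptions, a client whose only numeric type is a double can treat every `integer` value as exact but not every `long` value, which is one reason a consumer might still want the concrete type.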
@timroes I'm still unclear why this matters. For instance today, does it make any difference to Kibana if someone indexed integers in a long field vs. an integer field? I have a similar question regarding this special handling that you have for |
Currently for With regards to
I think none of them is anything that could be solved in Elasticsearch. I btw fully support your base statement here: if any of those things where we need to make differences in Kibana based on the specific field type is solvable in ES, we should solve it in ES and not in Kibana. I definitely want to keep the number of those places as low as possible; so far it has just been shown not to be 0, and as long as I can still "easily" make up examples like integer vs long, I worry that we'll always have cases where we need those specific types.
My understanding is that we'd like to collapse field types together that have nearly identical read-time behavior -- and this includes not only searching but also loading/ displaying field values. But there's a question of what 'nearly identical behavior' actually means for the different field types. To me there are two different cases. First, it seems like we're hesitant to collapse together fields like In the second case, a small number of field types are designed to have the same external behavior as a 'standard' type, but give different performance trade-offs in terms of disk usage or speed. These field types could be presented as if they were the 'standard' field type:
Related to this case, we plan to introduce 'runtime fields' that are defined in the mappings, but calculate their values on-the-fly. For example there could be a runtime field that takes two concrete Would we at least be interested in collapsing field types in the second case? |
I think for the second case it could make sense to collapse them. So if it's really just performance optimizations behind the scenes, but they otherwise behave exactly the same, we could imho collapse them. I just have concerns about collapsing fields as long as they do not behave exactly the same from an API point of view, and that could mean: different datatypes are returned, they cannot be used in exactly the same APIs (and already having one very specific aggregation that does not work with both means for me we should not collapse them, or at least need the raw field information in that case), differences in getting access to them via docvalues, scripts, etc. If they really behave exactly the same in all those cases, I don't see a problem (from the Kibana side) with collapsing them together, even without us having the original field information. As soon as they are only 99.9% the same in regards to their API behavior, I'd not collapse them together, since I'll very likely be able to come up with an example where we would need the differentiation. So in your example, it sounds like we could merge
FYI this is getting worked on these days in order to make both fields support exactly the same operations.
Maybe we could think about this more as a trade-off instead of a rule 'as soon as they are different at all, we shouldn't collapse the types'. For example, a field alias on a |
@timroes @astefan with the
The |
Also tagging @markharwood for awareness on the above. |
That sounded fine to me, given that it behaves exactly the same, right?
Just for clarification from that issue in Kibana. Are there now differences in how it can be used in queries, and APIs (not performance differences, but APIs where it's simply not working if you use a
That might not be as good an example as you think it is, since aliases are really not well supported in Kibana, and we very often get negative feedback on how aliases work (or don't) in Kibana (we just haven't had the capacity yet to address this) :D
@jtibshirani for "runtime" reporting of field data types in Collapsing those field types together means we'll tell users or BI tools (via
No there is not. Although, I think we should approach this differently. The intent for the
I wonder why you consider this an issue? Why would BI users care about ES mapping internals if they can rely on family types? Knowing that the field is of family
Float and scaled_float data types have different precisions (display column size). For the other three in question the precision is the same. And it's about the BI tools themselves which, in theory, look at the display size for numbers to know how many digits to display. For the text based data types in question, I think it's more of a conceptual decision (as to why I CCed @costin to get his input on this) to have inconsistent (or incorrect) information displayed as a result of a |
I can't think of a big disadvantage of this change in SQL. |
I think @jimczi's point is a really important one: if we collapsed two field types into a 'family', then we would consider any significant difference in functionality to be a high-priority bug in ES. Even beyond the field caps API, it's really nice to have this concept of a field type 'family' that's guaranteed to have the same behavior. I think it will help users understand all the new field types we're adding: they can think of constant_keyword and wildcard as being variants on 'keyword' and be assured they support the same functionality. It sounds like there are concerns around collapsing
If there is still hesitation, please feel free to reply (or ping me directly) -- I think the next step would be to have a short discussion as a group. |
@jtibshirani one small apparently obvious aspect, just to make sure this is also covered: collapsing fields together into a "common" type should only be applied when merging fields, and not also when a single index is involved in the _field_caps request, right? |
@astefan, no the plan is to always return |
Thank you for clarifying. After all, it wasn't as obvious as I thought :-).
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type. Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities. Relates to #53175
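The idea in this commit message, a family name that defaults to the field's own type, can be sketched as follows. This is a hypothetical Python analogue, not the actual Elasticsearch code: the method names `type_name` and `family_type_name` and the subclasses are assumptions made for illustration.

```python
class MappedFieldType:
    """Hypothetical stand-in for the class named in the commit message."""

    def type_name(self) -> str:
        raise NotImplementedError

    def family_type_name(self) -> str:
        # Default: a field type is its own family.
        return self.type_name()


class LongFieldType(MappedFieldType):
    def type_name(self) -> str:
        return "long"


class WildcardFieldType(MappedFieldType):
    def type_name(self) -> str:
        return "wildcard"

    def family_type_name(self) -> str:
        # Reported as `keyword` for field capabilities.
        return "keyword"


class ConstantKeywordFieldType(MappedFieldType):
    def type_name(self) -> str:
        return "constant_keyword"

    def family_type_name(self) -> str:
        return "keyword"


for field_type in (LongFieldType(), WildcardFieldType(), ConstantKeywordFieldType()):
    print(field_type.type_name(), "->", field_type.family_type_name())
```

With a default on the base class, only the types that join a family need to override anything, which is likely what makes adding future families lightweight.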
I've updated master and 7x branches with support for the |
Now that we've decided we're okay having 'field type families' in certain circumstances (and have a lightweight way to add new families thanks to @markharwood), I suggest we close this issue. To add future families, we can open dedicated issues/ PRs and discuss there. I also added a note to #57548 about adding the concept of a 'field type family' to our docs. I think it'd be helpful if we surfaced the relationship between |
Both Kibana and SQL have logic on top of the `_field_caps` API to detect when a conflict is harmful versus when it's not. Part of how it works is by grouping field types into families of fields: conflicts are fine as long as the types across indices are all of the same family.

Should we move this logic to `_field_caps` instead?

Here is an example of how we could collapse fields:
- `number`: `long`, `integer`, `double`, `float`, `scaled_float`, ...
- `date`: `date`, `date_nanos`
- `keyword`: `keyword`, `constant_keyword`, `wildcard`
- `text`: `text`
- `geo`: `geo_point`, `geo_shape`

Most other fields would be their own family as they don't share properties with other fields, e.g. `boolean` or `flattened`.

So for instance, the below response today is considered a conflict:
While it wouldn't be one if we collapsed `double` and `float` into a `number` "family".
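
The family-based conflict check described above could look roughly like this. It is a minimal sketch, not Kibana's or SQL's actual logic: the family table copies the example grouping from this issue (only the types it names), and `family_of` and `is_conflict` are hypothetical helper names.

```python
from typing import Iterable

# Example grouping proposed in this issue; illustrative only, not the
# mapping Elasticsearch actually ships.
FAMILIES = {
    "number": ["long", "integer", "double", "float", "scaled_float"],
    "date": ["date", "date_nanos"],
    "keyword": ["keyword", "constant_keyword", "wildcard"],
    "text": ["text"],
    "geo": ["geo_point", "geo_shape"],
}

# Reverse lookup: concrete field type -> family name.
TYPE_TO_FAMILY = {
    field_type: family
    for family, field_types in FAMILIES.items()
    for field_type in field_types
}


def family_of(field_type: str) -> str:
    """Return the family of a type; unlisted types are their own family."""
    return TYPE_TO_FAMILY.get(field_type, field_type)


def is_conflict(types_across_indices: Iterable[str]) -> bool:
    """A field conflicts only if its types span more than one family."""
    return len({family_of(t) for t in types_across_indices}) > 1


print(is_conflict(["double", "float"]))  # same family
print(is_conflict(["keyword", "long"]))  # different families
```

Types not listed in the table, such as `boolean` or `flattened`, fall through to being their own family, which matches the description above.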
"family".The text was updated successfully, but these errors were encountered: