Semantic data for fields #17087

timroes · 2018-03-10T15:28:51Z

I think adding more semantic information about the fields in an index could bring us quite a few benefits. Currently the only semantic information we have about a field is its type. But a user might know far more about the data inside that field than just its type. Some information that a user who knows the data could provide:

Ranges of the data - e.g. a numeric field could contain output from an industrial sensor and would always be between 0.0 and 1.0.
Step size of numeric values - e.g. this value will always be a multiple of 5
Enum information - a field of type string could actually contain just a few distinct values (like error, warning, info). Providing that semantic information improves on running a terms aggregation, since not all possible values might exist in the data yet.
A semantic information/type of the value - e.g. that field contains an ISO2 country code or this field actually contains IATA airport codes
Relations between fields are another kind of semantic information (see [Lens] Introduce the concept of "related fields" to guide end-users #73152)
...
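To make the list above a bit more concrete, here is a minimal sketch of what such per-field semantic information could look like as a data structure. All names (`FieldSemantics`, `SemanticType`, `conformsTo`, the field names) are hypothetical and only illustrate the idea; nothing like this exists in Kibana or Elasticsearch under these names.

```typescript
// Hypothetical shape for illustration only; not an existing Kibana type.
type SemanticType = 'iso2_country_code' | 'iata_airport_code';

interface FieldSemantics {
  // Possible value range, which may be wider than the values indexed so far.
  range?: { min: number; max: number };
  // Every numeric value is expected to be a multiple of this step.
  step?: number;
  // All values a string field may take, including ones not yet indexed.
  enumValues?: string[];
  // What the value means beyond its mapping type.
  semanticType?: SemanticType;
  // Names of other fields this field is related to.
  relatedFields?: string[];
}

// Example declarations, keyed by (made-up) field names.
const fieldSemantics: Record<string, FieldSemantics> = {
  'sensor.reading': { range: { min: 0, max: 1 } },
  'batch.size': { step: 5 },
  'log.level': { enumValues: ['error', 'warning', 'info'] },
  'geo.country': { semanticType: 'iso2_country_code' },
};

// Checks a value against whatever semantics are declared for its field.
// The step check assumes integer steps (no floating point tolerance).
function conformsTo(value: number | string, s: FieldSemantics): boolean {
  if (typeof value === 'number') {
    if (s.range && (value < s.range.min || value > s.range.max)) return false;
    if (s.step && value % s.step !== 0) return false;
    return true;
  }
  if (s.enumValues && s.enumValues.indexOf(value) === -1) return false;
  return true;
}
```

Everything is optional on purpose, so a user would only declare what they actually know about a field.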

Some examples of where this semantic information could be used and come in handy:

If we know which fields contain an ISO2 code, highlight or limit to these fields in Region maps.
Use ranges or step sizes of fields to limit an input control on that field.
Enable limiting the axis of a chart to the actual possible data range.
Better autocompletion for KQL or filter generation based on enum or range information. Maybe we could even generate far more advanced filter UIs, like "this field contains an ISO2 code, so select from all countries, but also show country names for easier selection".
...
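As a rough sketch of two of the use cases above: deriving the bounds of an input control from a declared range and step size, and picking the fields that qualify for a Region map. The types and function names (`NumericSemantics`, `SliderProps`, `sliderPropsFor`, `regionMapCandidates`) are assumptions made up for this example and are not existing Kibana APIs.

```typescript
// Hypothetical types; not part of Kibana.
interface NumericSemantics {
  range?: { min: number; max: number };
  step?: number;
}

interface SliderProps {
  min: number;
  max: number;
  step: number;
}

// Prefer the declared semantics and fall back to defaults
// (for example values derived from a min/max aggregation).
function sliderPropsFor(s: NumericSemantics, fallback: SliderProps): SliderProps {
  return {
    min: s.range ? s.range.min : fallback.min,
    max: s.range ? s.range.max : fallback.max,
    step: s.step !== undefined ? s.step : fallback.step,
  };
}

interface FieldInfo {
  name: string;
  semanticType?: string;
}

// Returns the names of fields declared to hold ISO2 country codes,
// which a Region map could highlight or limit its field picker to.
function regionMapCandidates(fields: FieldInfo[]): string[] {
  return fields
    .filter((f) => f.semanticType === 'iso2_country_code')
    .map((f) => f.name);
}
```

The fallback parameter keeps the control usable for fields without any declared semantics.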

I see two possible ways of implementing that:

Store the information in the _meta field of the mapping

The advantage here would be that this information sits directly "beside" the data. That way the person responsible for defining the data mapping would also have full control over the semantic information. It would also be available to all tools, not only Kibana.

The disadvantage is that this would lack a good UI to configure it and would make it harder to change later on. Also, a lot of people work with dynamic field mappings, which might make this more complex.
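For illustration, a mapping carrying this information could look roughly like the sketch below. Elasticsearch mappings do accept a free-form `_meta` object, but the `field_semantics` key and everything underneath it are assumptions invented for this example, as are the field names and the `semanticsFromMapping` helper.

```typescript
// Minimal, hypothetical view of a mapping body with a _meta section.
interface MappingWithMeta {
  _meta?: { field_semantics?: Record<string, unknown> };
  properties: Record<string, unknown>;
}

const mappings: MappingWithMeta = {
  _meta: {
    // Assumed layout: semantic information keyed by full field path.
    field_semantics: {
      'sensor.reading': { range: { min: 0, max: 1 } },
      'log.level': { enum: ['error', 'warning', 'info'] },
    },
  },
  properties: {
    sensor: { properties: { reading: { type: 'float' } } },
    log: { properties: { level: { type: 'keyword' } } },
  },
};

// Any tool that can read the mapping could look the information up like this.
function semanticsFromMapping(m: MappingWithMeta, field: string): unknown {
  const all = m._meta && m._meta.field_semantics;
  return all ? all[field] : undefined;
}
```

Fields created through dynamic mapping would have no entry here until someone adds one, which is the complication mentioned above.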

Store the information in the index pattern

That way we could directly provide a UI for selecting the semantic information when configuring an index pattern. This would make it much easier to configure and to change that configuration later on.

The biggest disadvantage here would be that this information would only be available to that single Kibana instance and not shared between different tools.

/cc @trevan @chrisronline


timroes · 2018-09-21T11:32:31Z

A possible use-case is described in #22486

nreese · 2019-02-28T21:55:25Z

Another use-case for index-pattern semantic data is identifying a default geospatial field for an index-pattern.

It would be helpful to apply a geospatial filter to a dashboard and have the filter applied to the default geospatial field of each embeddable's index-pattern(s). Without a default geospatial field, users have to ensure all indices use the same field name for the geospatial field so the query can be applied across disparate indices.
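A minimal sketch of that idea, assuming each index pattern could carry a `defaultGeoField` property (hypothetical, as are `PatternGeoInfo` and `geoFilterFor`). The generated filter uses the shape of the Elasticsearch `geo_bounding_box` query.

```typescript
// Hypothetical: an index pattern annotated with its default geospatial field.
interface PatternGeoInfo {
  title: string;
  defaultGeoField?: string;
}

interface BoundingBox {
  top_left: { lat: number; lon: number };
  bottom_right: { lat: number; lon: number };
}

// Builds one filter per index pattern, so each pattern can use its own
// field name. Patterns without a default geo field get no filter.
function geoFilterFor(pattern: PatternGeoInfo, box: BoundingBox): object | undefined {
  if (!pattern.defaultGeoField) return undefined;
  return { geo_bounding_box: { [pattern.defaultGeoField]: box } };
}

const dashboardBox: BoundingBox = {
  top_left: { lat: 40.73, lon: -74.1 },
  bottom_right: { lat: 40.01, lon: -71.12 },
};

const flightsFilter = geoFilterFor({ title: 'flights-*', defaultGeoField: 'origin.location' }, dashboardBox);
const logsFilter = geoFilterFor({ title: 'logs-*', defaultGeoField: 'geo.coordinates' }, dashboardBox);
```

Both patterns receive the same bounding box, but each filter targets that pattern's own geospatial field.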

timroes · 2019-05-02T08:59:28Z

cc @mattkime @stacey-gammon @lukeelmers @ppisljar Basically, this topic comes up every other week as a solution to problems where we can't determine something automatically (like whether a numeric field is actually a continuous number or an identifier, such as a user ID). Matt, since you're rethinking index patterns at the moment anyway, I just wanted to raise awareness of this issue again.

elasticmachine · 2019-05-03T15:59:04Z

Pinging @elastic/kibana-app-arch

rayafratkina · 2019-08-14T00:31:48Z

How would this relate to ECS? Would that simply be one source of this data?

timroes · 2019-08-14T07:09:05Z

Imho we could build a "shortcut" for ECS data, so that if we detect an ECS index (however exactly we would do that), we can set all the semantic information in the index pattern according to our knowledge about ECS. But ECS is way more generic than what this suggestion would allow, so a user could still fine-tune their index pattern, even if it's ECS, to enrich it with more semantic information and get more benefit out of their data.

wylieconlon · 2020-11-02T19:49:14Z

We now have an agreement to have standard metadata for numeric fields in the stack, which is a good place to start. It's not yet part of ECS but it will soon be part of many solutions.

mfinkle · 2021-09-15T18:21:19Z

This feels like there is overlap with the new Aggregated View in Discover, powered by the ML system.

timroes · 2021-09-20T21:08:58Z

> This feels like there is overlap with the new Aggregated View in Discover, powered by the ML system.

@mfinkle could you please elaborate a bit on where you see the overlap? I think this issue was mostly about being able to attach semantic data that we can't determine automatically to fields, in order to improve the UI in several places. I'm personally not seeing a huge overlap with the data visualizer from ML?

mfinkle · 2021-09-21T03:11:41Z

I was just assuming that the underlying code in the Data Visualizer had to run some basic descriptive statistics on the fields, some of which might also be able to be used or extended to find ranges, step sizes, and maybe potential enum-able fields.

timroes · 2021-09-21T07:23:56Z

There is some automatically pulled information, e.g. the min/max/median value. What I described as "semantic data" here is exactly the things that can't be determined automatically. I'll give a couple of examples:

We could have a sensor logging values always between 0 and 255. This is semantic data I would find useful to have, and we could then e.g. adjust our filter editors to simply use a range slider. But that doesn't mean that the minimum value actually is 0 and the maximum 255. The sensor could so far have logged data only between 42.2345 and 129.89 (which is then the only thing we can determine automatically). I still think we'd get a lot of benefit from knowing that the actual value range is 0 to 255, but there's no way to determine that automatically; it's simply knowledge only the "user" (i.e. the creator of the data) has.
A field might have values of 2, 4, 6. We might determine a step size of 2, while the actual step size is 1; the values in between simply have never been logged.
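The two examples can be reproduced with a small sketch. The helper names `observedRange` and `inferStep` are made up for illustration, and the step inference (greatest common divisor of the gaps between sorted values) is just one plausible heuristic, assuming integer values.

```typescript
// Greatest common divisor of two non-negative integers.
function gcd(a: number, b: number): number {
  while (b !== 0) {
    const t = b;
    b = a % b;
    a = t;
  }
  return a;
}

// The range we can see in the data so far (values must be non-empty).
function observedRange(values: number[]): { min: number; max: number } {
  return values.reduce(
    (acc, v) => ({ min: Math.min(acc.min, v), max: Math.max(acc.max, v) }),
    { min: values[0], max: values[0] }
  );
}

// Heuristic step size: GCD of the gaps between sorted integer values.
// Returns undefined when fewer than two distinct values were seen.
function inferStep(values: number[]): number | undefined {
  const sorted = values.slice().sort((x, y) => x - y);
  let step = 0;
  for (let i = 1; i < sorted.length; i++) {
    step = gcd(step, sorted[i] - sorted[i - 1]);
  }
  return step === 0 ? undefined : step;
}

const seen = observedRange([42.2345, 129.89, 77.1]);
// seen is { min: 42.2345, max: 129.89 }, although the sensor can emit 0 to 255.

const guessedStep = inferStep([2, 4, 6]);
// guessedStep is 2, although the real step size is 1.
```

In both cases the statistics are correct for the data seen so far, yet they differ from what the creator of the data knows to be possible.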

I believe that when we can automatically determine information about a field, we should do so (and we already partially do). Besides that, there are still a lot of use-cases left where we can't determine the semantics of a field automatically, but would still benefit from knowing them and could improve our UX for those fields.

ppisljar · 2022-08-09T14:51:14Z

Thank you for contributing to this issue, however, we are closing this issue due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen with a comment.

timroes added :Management Meta triage_needed enhancement New value added to drive a business result labels Mar 10, 2018

chrisronline removed the triage_needed label Mar 12, 2018

timroes mentioned this issue Sep 10, 2018

New field metadata - Entity and role types #22486

Closed

timroes added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Kibana Management Feature label for Data Views, Advanced Setting, Saved Object management pages and removed :Management DO NOT USE labels Nov 27, 2018

lukeelmers added :AppArch Feature:Data Views Data Views code and UI - index patterns before 8.0 labels May 3, 2019

mattkime mentioned this issue May 3, 2019

[DISCUSS] Rethinking index patterns; improvements #35481

Closed

timroes removed the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Oct 5, 2020

timroes mentioned this issue Feb 8, 2021

Add Data field descriptions #89726

Closed

TinaHeiligers mentioned this issue Feb 18, 2021

[Usage Collection] Add description field to schema #89685

Closed

timroes mentioned this issue Jun 21, 2021

Add default field format based on field name #6563

Closed

exalate-issue-sync bot added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort labels Jun 21, 2021

ghudgins mentioned this issue Sep 14, 2021

[Lens] Introduce the concept of "related fields" to guide end-users #73152

Closed

exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Oct 7, 2021

exalate-issue-sync bot added loe:small Small Level of Effort and removed loe:medium Medium Level of Effort labels Nov 19, 2021

ppisljar closed this as not planned Won't fix, can't repro, duplicate, stale Aug 9, 2022

exalate-issue-sync bot closed this as completed Aug 16, 2022

wayneseymour mentioned this issue Jan 31, 2024

[FTR] Refactor toasts svc #174222

Merged
