
Semantic data for fields #17087

Closed
timroes opened this issue Mar 10, 2018 · 12 comments
Labels
enhancement New value added to drive a business result Feature:Data Views Data Views code and UI - index patterns before 8.0 Feature:Kibana Management Feature label for Data Views, Advanced Setting, Saved Object management pages impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort Meta

Comments

@timroes
Contributor

timroes commented Mar 10, 2018

I think adding more semantic information about the fields in your index could provide quite a few benefits. Currently the only semantic information we have about a field is its type. But a user might know much more about the data inside that field than just its type. Some information that a user who knows the data could provide:

  • Ranges of the data - e.g. a numeric field could contain output from an industrial sensor and would always be between 0.0 and 1.0.
  • Step size of numeric values - e.g. this value will always be a multiple of 5
  • Enum information - a field of type string could actually contain just a few distinct values (like error, warning, info). Providing that semantic information is an improvement over running a terms aggregation, since not all possible values might exist in the data yet.
  • Semantic information/type of the value - e.g. this field contains ISO2 country codes, or this field actually contains IATA airport codes
  • Relations between fields are one type of semantic information (see [Lens] Introduce the concept of "related fields" to guide end-users #73152)
  • ...
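To make the list above concrete, here is a minimal Python sketch of how such per-field semantic information could be modeled. `FieldSemantics`, its attribute names, and the `accepts` check are hypothetical illustrations, not an existing Kibana or Elasticsearch structure.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldSemantics:
    """Hypothetical user-supplied semantic information for one field."""

    name: str
    field_type: str  # the only semantic information available today, e.g. "number"
    value_range: Optional[Tuple[float, float]] = None  # inclusive (min, max)
    step: Optional[float] = None  # every value is a multiple of this
    enum_values: Tuple[str, ...] = ()  # complete set of allowed values
    semantic_type: Optional[str] = None  # e.g. "iso2_country_code", "iata_airport_code"
    related_fields: Tuple[str, ...] = ()

    def accepts(self, value) -> bool:
        """Return True if a value is consistent with the declared semantics."""
        if self.enum_values and value not in self.enum_values:
            return False
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                return False
        if self.step is not None:
            multiple = value / self.step
            if not math.isclose(multiple, round(multiple)):
                return False
        return True


sensor = FieldSemantics("sensor.output", "number", value_range=(0.0, 1.0))
level = FieldSemantics("log.level", "string", enum_values=("error", "warning", "info"))
bucket = FieldSemantics("bucket", "number", step=5)
```

Note that every attribute is optional: a field with no declared semantics behaves exactly like today, where only the type is known.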

Some examples of where this semantic information could be used and come in handy:

  • If we know which fields contain an ISO2 code, we can highlight those fields, or limit the selection to them, in Region maps.
  • Use ranges or step sizes of fields to limit an input control on that field.
  • Enable limiting the axis of a chart to the actual possible data range.
  • Better autocompletion for KQL or filter generation based on enum or range information. Maybe we could even generate a much more advanced filter UI, like "this field contains an ISO2 code, so select from all countries, but also show country names for easier selection"
  • ...
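A minimal sketch of how two of these use cases could consume semantic data: deriving input-control bounds from a declared range, and autocompleting filter values from enum or semantic-type information (including values not yet present in the data). The `SEMANTICS` table, its keys, and the three-entry `ISO2_NAMES` lookup are hypothetical placeholders.

```python
from typing import Dict, List, Optional

# Hypothetical semantic metadata keyed by field name (not an existing Kibana structure).
SEMANTICS: Dict[str, Dict] = {
    "sensor.output": {"range": (0.0, 1.0), "step": 0.05},
    "log.level": {"enum": ["error", "warning", "info"]},
    "geo.country": {"semantic_type": "iso2_country_code"},
}

# Tiny illustrative lookup; a real UI would use a full ISO 3166-1 table.
ISO2_NAMES = {"DE": "Germany", "DK": "Denmark", "FR": "France"}


def slider_config(field_name: str) -> Optional[Dict]:
    """Input-control bounds from the declared range instead of observed min/max."""
    meta = SEMANTICS.get(field_name, {})
    if "range" not in meta:
        return None  # no declared range: fall back to today's behavior
    low, high = meta["range"]
    return {"min": low, "max": high, "step": meta.get("step")}


def suggest(field_name: str, prefix: str) -> List[str]:
    """Autocomplete filter values, even ones that do not exist in the data yet."""
    meta = SEMANTICS.get(field_name, {})
    if "enum" in meta:
        return [v for v in meta["enum"] if v.startswith(prefix)]
    if meta.get("semantic_type") == "iso2_country_code":
        # Match on either code or country name, and show both for easier selection.
        p = prefix.lower()
        return [
            f"{code} ({name})"
            for code, name in sorted(ISO2_NAMES.items())
            if code.lower().startswith(p) or name.lower().startswith(p)
        ]
    return []
```

For example, `suggest("geo.country", "fra")` would offer `FR (France)` even though the user typed part of the country name rather than the code.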

I see two possible ways of implementing that:

Store the information in the _meta field of the mapping

The advantage here would be that this information sits directly "beside" the data. That way the person who is responsible for defining the data mapping anyway would also have full control over the semantic information. It would also be available to all tools, not only Kibana.

The disadvantage is that this would lack a good UI for configuring it and would make it harder to change later on. Also, a lot of people work with dynamic field mappings, which might make this more complex.
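A sketch of the `_meta` option, assuming the semantic information lives under a hypothetical `field_semantics` key in the mapping-level `_meta` object. Elasticsearch stores `_meta` as-is without interpreting it, so any tool reading the mapping could pick it up; the field names are illustrative.

```python
import json
from typing import Dict


def mapping_body(properties: Dict, semantics: Dict) -> Dict:
    """Index-creation body whose mapping carries semantic info in _meta.

    "field_semantics" is a hypothetical convention, not an Elasticsearch feature.
    """
    return {"mappings": {"_meta": {"field_semantics": semantics}, "properties": properties}}


def semantics_from_get_mapping(response: Dict) -> Dict:
    """Extract per-index semantics from a GET <index>/_mapping response.

    Indices without _meta (e.g. created purely through dynamic mapping)
    simply yield an empty dict.
    """
    return {
        index: body.get("mappings", {}).get("_meta", {}).get("field_semantics", {})
        for index, body in response.items()
    }


body = mapping_body(
    properties={
        "sensor": {"properties": {"output": {"type": "float"}}},
        "log": {"properties": {"level": {"type": "keyword"}}},
    },
    semantics={
        "sensor.output": {"range": [0.0, 1.0]},
        "log.level": {"enum": ["error", "warning", "info"]},
    },
)
print(json.dumps(body, indent=2))  # e.g. the body of PUT /my-index
```

The empty-dict fallback in the reader hints at the dynamic-mapping concern above: indices that nobody configured by hand simply carry no semantics.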

Store the information in the index pattern

That way we could directly provide a UI for selecting the semantic information when configuring an index pattern. This would make it much easier to configure, and to change that configuration later on.

The biggest disadvantage here would be that this information would only be available to that single Kibana instance and not shared between different tools.

/cc @trevan @chrisronline

@timroes
Contributor Author

timroes commented Sep 21, 2018

A possible use-case is described in #22486

@timroes timroes added Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Kibana Management Feature label for Data Views, Advanced Setting, Saved Object management pages and removed :Management DO NOT USE labels Nov 27, 2018
@nreese
Contributor

nreese commented Feb 28, 2019

Another use-case for index-pattern semantic data is identifying a default geospatial field for an index-pattern.

It would be helpful to apply a geospatial filter to a dashboard and have the filter applied to the default geospatial field of each embeddable's index pattern(s). Without a default geospatial field, users have to ensure that all indices use the same field name for the geospatial field so the query can be applied across disparate indices.
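A minimal sketch of that idea: one dashboard-level bounding box is translated into a `geo_bounding_box` query per index pattern, targeting each pattern's default geo field. The `default_geo_field` attribute and the pattern and field names are hypothetical.

```python
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]


def geo_filter_for(index_pattern: Dict, top_left: LatLon, bottom_right: LatLon) -> Optional[Dict]:
    """geo_bounding_box query against the pattern's (hypothetical) default geo field."""
    geo_field = index_pattern.get("default_geo_field")
    if geo_field is None:
        return None  # no default declared: the dashboard filter cannot be applied
    (top, left), (bottom, right) = top_left, bottom_right
    return {
        "geo_bounding_box": {
            geo_field: {
                "top_left": {"lat": top, "lon": left},
                "bottom_right": {"lat": bottom, "lon": right},
            }
        }
    }


# Disparate indices with different geo field names, filtered by one box.
patterns = [
    {"title": "flights-*", "default_geo_field": "dest_location"},
    {"title": "weblogs-*", "default_geo_field": "geo.coordinates"},
    {"title": "metrics-*"},
]
filters = {p["title"]: geo_filter_for(p, (52.0, 5.0), (47.0, 15.0)) for p in patterns}
```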

@timroes
Copy link
Contributor Author

timroes commented May 2, 2019

cc @mattkime @stacey-gammon @lukeelmers @ppisljar Basically this topic comes up every other week as a solution to some of our problems that involve information we can't calculate automatically (like whether a numeric field actually contains continuous numbers or identifiers, such as user IDs). Matt, since you're rethinking index patterns at the moment anyway, I just wanted to raise awareness of this issue again.
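As an illustration of the continuous-vs-identifier problem, a sketch of how a hypothetical `role` entry could change the default bucket aggregation for a numeric field. The `role` and `interval` keys are assumptions, and the histogram interval of 10 is an arbitrary placeholder.

```python
from typing import Dict


def default_bucket_agg(field: str, field_type: str, semantics: Dict) -> Dict:
    """Pick a default bucket aggregation, honoring a hypothetical "role" entry."""
    if field_type != "number" or semantics.get("role") == "identifier":
        # Strings and numeric identifiers (e.g. user IDs) are categorical.
        return {"terms": {"field": field, "size": 10}}
    # Without semantic data the type is all we know, so assume a continuous value.
    return {"histogram": {"field": field, "interval": semantics.get("interval", 10)}}
```

Without the `role` entry, `user.id` stored as a number would get a histogram, which is rarely what anyone wants.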

@lukeelmers lukeelmers added :AppArch Feature:Data Views Data Views code and UI - index patterns before 8.0 labels May 3, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch

@rayafratkina
Contributor

How would this relate to ECS? Would that simply be one source of this data?

@timroes
Contributor Author

timroes commented Aug 14, 2019

IMHO we could build a "shortcut" for ECS data: if we detect an ECS index (however exactly we would do that), we could set all the semantic information in the index pattern according to our knowledge of ECS. But ECS is much more generic than what this suggestion would allow, so users could still fine-tune their index pattern, even for ECS data, and enrich it with more semantic information to get more out of their data.
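A sketch of that layering: ECS-derived defaults are applied first, then per-field user refinements override them. The detection heuristic (presence of `ecs.version`) and the `ECS_DEFAULTS` excerpt are assumptions for illustration; the field names and the `event.outcome` values come from ECS.

```python
from typing import Dict, Iterable

# Small hypothetical excerpt of semantics implied by ECS; not shipped by Kibana.
ECS_DEFAULTS: Dict[str, Dict] = {
    "source.geo.country_iso_code": {"semantic_type": "iso2_country_code"},
    "destination.geo.country_iso_code": {"semantic_type": "iso2_country_code"},
    "event.outcome": {"enum": ["failure", "success", "unknown"]},
}


def looks_like_ecs(field_names: Iterable[str]) -> bool:
    """Assumed detection heuristic: ECS documents carry an ecs.version field."""
    return "ecs.version" in set(field_names)


def effective_semantics(field_names: Iterable[str], user_overrides: Dict[str, Dict]) -> Dict[str, Dict]:
    """ECS defaults for fields that exist, then user refinements on top."""
    names = set(field_names)
    result: Dict[str, Dict] = {}
    if looks_like_ecs(names):
        for name, meta in ECS_DEFAULTS.items():
            if name in names:
                result[name] = dict(meta)  # copy so overrides never mutate the defaults
    for name, meta in user_overrides.items():
        result.setdefault(name, {}).update(meta)
    return result
```

For instance, a user who knows their pipeline never emits `unknown` could narrow the `event.outcome` enum while keeping every other ECS default.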

@timroes timroes removed the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Oct 5, 2020
@wylieconlon
Contributor

We now have an agreement to have standard metadata for numeric fields in the stack, which is a good place to start. It's not yet part of ECS but it will soon be part of many solutions.
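The comment doesn't name the mechanism; presumably it is Elasticsearch's field-level `meta` mapping parameter, for which Elastic documents `unit` and `metric_type` (`gauge` or `counter`) as standard entries. A sketch of building such a mapping, with hypothetical field names:

```python
from typing import Dict, Optional

METRIC_TYPES = ("gauge", "counter")


def numeric_field(es_type: str, unit: Optional[str] = None, metric_type: Optional[str] = None) -> Dict:
    """Field mapping carrying the standard `meta` entries for numeric fields."""
    meta: Dict[str, str] = {}
    if unit is not None:
        meta["unit"] = unit
    if metric_type is not None:
        if metric_type not in METRIC_TYPES:
            raise ValueError(f"metric_type must be one of {METRIC_TYPES}")
        meta["metric_type"] = metric_type
    mapping: Dict = {"type": es_type}
    if meta:
        mapping["meta"] = meta  # Elasticsearch requires meta values to be strings
    return mapping


body = {
    "mappings": {
        "properties": {
            "latency": numeric_field("long", unit="ms", metric_type="gauge"),
            "requests": numeric_field("long", metric_type="counter"),
            "user_id": numeric_field("long"),
        }
    }
}
```

These entries cover units and metric types only; ranges, step sizes, and enums from the original proposal would still need somewhere else to live.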

@mfinkle

mfinkle commented Sep 15, 2021

This feels like there is overlap with the new Aggregated View in Discover, powered by the ML system.

@timroes
Contributor Author

timroes commented Sep 20, 2021

This feels like there is overlap with the new Aggregated View in Discover, powered by the ML system.

@mfinkle could you please elaborate a bit on where you see the overlap? I think this issue was mostly about being able to attach semantic data that we can't determine automatically to fields, so we can improve the UI in several places. Personally, I don't see a huge overlap with the Data Visualizer from ML.

@mfinkle

mfinkle commented Sep 21, 2021

I was just assuming that the underlying code in the Data Visualizer had to run some basic descriptive statistics on the fields, some of which might also be used or extended to find ranges, step sizes, and maybe potential enum-like fields.
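A sketch of the kind of descriptive statistics meant here, computed over a sample of numeric values, with a naive enum-candidate flag. The function, the threshold of 10 distinct values, and the flag are assumptions, not how the Data Visualizer actually works.

```python
from statistics import median
from typing import Dict, List


def describe(values: List[float], enum_threshold: int = 10) -> Dict:
    """Basic descriptive statistics over a sample of numeric field values."""
    distinct = set(values)
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "distinct": len(distinct),
        # Few distinct values *might* indicate an enum; only a human can confirm it.
        "enum_candidate": len(distinct) <= enum_threshold,
    }
```

Everything returned here describes the data seen so far, not what the field is allowed to contain.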

@timroes
Contributor Author

timroes commented Sep 21, 2021

There is some automatically pulled information, e.g. the min/max/median value. What I described as "semantic data" here is exactly the information that cannot be determined automatically. A couple of examples:

  • We could have a sensor that always logs values between 0 and 255. This is semantic data I find useful to have; we could then, e.g., simply turn our filter editor into a range slider. But that doesn't mean the observed minimum is actually 0 and the maximum 255. So far the sensor might only have logged data between 42.2345 and 129.89 (which is the only thing we can determine automatically). I still think we benefit a lot from knowing that the actual value range is 0 to 255, but there's no way to determine that automatically; it's knowledge only the "user" (i.e. the creator of the data) has.
  • A field might have values of 2, 4, 6. We might determine a step size of 2, while the actual step size is 1; we simply never logged the values in between.
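The step-size example can be reproduced with a sketch that infers a step as the greatest common divisor of the gaps between observed integer values. This inference method is an assumption chosen for illustration; the point is that any purely data-driven method sees only what has been logged.

```python
from functools import reduce
from math import gcd
from typing import Iterable


def inferred_step(values: Iterable[int]) -> int:
    """Largest step consistent with the observed integer values.

    Returns 0 when fewer than two distinct values have been seen.
    """
    distinct = sorted(set(values))
    diffs = [b - a for a, b in zip(distinct, distinct[1:])]
    return reduce(gcd, diffs, 0)


observed = [2, 4, 6]
declared_step = 1  # knowledge only the creator of the data has

print(inferred_step(observed))  # the data alone suggests 2
```

Logging a single 5 would change the inferred step to 1, so the inferred value is never stable in the way declared semantics would be.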

I believe that when we can automatically determine information about a field, we should do so (and we partially already do). Beyond that, there are still a lot of use cases where we can't determine the semantics of a field automatically, but would still benefit from knowing them and could improve our UX for those fields.

@exalate-issue-sync exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:small Small Level of Effort labels Oct 7, 2021
@exalate-issue-sync exalate-issue-sync bot added loe:small Small Level of Effort and removed loe:medium Medium Level of Effort labels Nov 19, 2021
@ppisljar
Member

ppisljar commented Aug 9, 2022

Thank you for contributing to this issue, however, we are closing this issue due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen with a comment.

@ppisljar ppisljar closed this as not planned Won't fix, can't repro, duplicate, stale Aug 9, 2022

9 participants