
[Lens] date histogram sent with wrong interval on non-time index-patterns #55165

Closed
markov00 opened this issue Jan 17, 2020 · 4 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@markov00
Member

markov00 commented Jan 17, 2020

Kibana version:
master

Describe the bug:
If the selected index-pattern doesn't have a configured time field, Lens still displays the time filter and still uses the time filter range (and the date histogram interval derived from it) to aggregate a date field.
This leads to two main issues:

  • ES errors about too many buckets being generated: the ES query is issued without a time filter (because the index-pattern doesn't have a default time field) but with a date_histogram interval derived from the configured time filter.
  • you will get a valid number of buckets, but the configuration passed to elastic-charts is wrong: the min and max domains refer to the min and max values of the time filter, but the data itself lies on a completely different domain (its full domain). In specific cases this can trigger an issue in elastic-charts, which computes all the missing ticks (Generating too many buckets if the minInterval is bigger than the domain elastic-charts#517), generating a very big array of missing ticks that need to be formatted and will freeze the browser for seconds. (This second issue will also be fixed in elastic-charts.)

Steps to reproduce:

To reproduce the too many buckets issue follow these steps:

  1. add the kibana_sample_data_logs sample data
  2. create a new index-pattern called kibana_sample_data_l* without a time field specified (select the I don't want to use the Time Filter option from the dropdown)
  3. create a Lens visualization
  4. select Last 15 minutes on the time filter
  5. drag the @timestamp field onto the x axis.

The query sent to Elasticsearch will contain a date_histogram agg with a 30s interval (derived from the time filter), which is not ideal. Either an auto interval should be configured in that case, or the time filter should be applied to that query too.
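For illustration, the problematic request looks roughly like this (index and field names taken from the reproduction steps; the exact body Lens generates may differ):

```json
POST kibana_sample_data_l*/_search
{
  "size": 0,
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "30s"
      }
    }
  }
}
```

With no range query restricting @timestamp, the 30s interval is applied across the data's full extent (weeks of sample data), which is what blows past the bucket limit.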

To reproduce the freeze issue follow these steps. It's a bit long, but I wasn't able to reproduce the same behaviour with a different data set:

  1. Download the following CSV: https://opensky-network.org/datasets/metadata/aircraftDatabase.csv
  2. Download and extract the latest Logstash 7.5.1 https://www.elastic.co/downloads/logstash
  3. Open the Kibana Dev Tools and add the mappings for the index we need to build:
PUT opensky_aircrafts
{
  "mappings": {
    "properties": {
      "built": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "registered": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "reguntil": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "firstflightdate": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
  4. Create a Logstash pipeline to ingest the CSV:
input {
  file {
    path => "/ABSOLUTE/PATH/AND/FILENAME/OF/THE/CSV"
    mode => read
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    skip_header => true
    columns => ["icao24","registration","manufacturericao","manufacturername","model","typecode","serialnumber","linenumber","icaoaircrafttype","operator","operatorcallsign","operatoricao","operatoriata","owner","testreg","registered","reguntil","status","built","firstflightdate","seatconfiguration","engines","modes","adsb","acars","notes","categoryDescription"]
  }
  mutate {
    convert => {
      "acars" => "boolean"
      "adsb" => "boolean"
      "modes" => "boolean"
    }
    remove_field => ["message", "@timestamp", "path", "host", "@version"]
  }
  if [firstflightdate] == "" {
    mutate {
      remove_field => ["firstflightdate"]
    }
  }
  if [registered] == "" {
    mutate {
      remove_field => ["registered"]
    }
  }
  if [reguntil] == "" {
    mutate {
      remove_field => ["reguntil"]
    }
  }
  if [built] == "" {
    mutate {
      remove_field => ["built"]
    }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    user => "elastic"
    password => "changeme"
    index => "opensky_aircrafts"
  }
}
  5. Ingest the CSV with Logstash (there can be some parsing errors, but you can safely ignore them):
bin/logstash -f <path to above config>
  6. Check that all the data is ingested: GET opensky_aircrafts/_count should return a value near 460058

  7. Set the timepicker to Last 15 minutes

  8. Select the opensky_aircrafts index and drag the built time field into the x field (Lens will display the too many buckets error, but you can go on)

  9. Select count as the y field (the error is still displayed)

  10. Click on the built field, select Customize min interval, then select 1 year. This removes the error (because the date_histogram now uses a 1 year interval), but the min and max values of the chart are not restricted to the last 15 minutes.

Expected behavior:

If displaying data from an index-pattern without a default time field, then either:

  1. avoid adding a custom min/max domain to elastic-charts and use auto_date_histogram to create a nice histogram without issuing a time filter, or
  2. if the user wants to display a date histogram, keep using the time filter range (in a way it's always a date field, even if it's not configured as the default in the index-pattern).
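A minimal sketch of option 1, assuming the opensky_aircrafts index from the reproduction steps (the bucket target is illustrative; Lens would pick its own):

```json
POST opensky_aircrafts/_search
{
  "size": 0,
  "aggs": {
    "built_over_time": {
      "auto_date_histogram": {
        "field": "built",
        "buckets": 50
      }
    }
  }
}
```

With auto_date_histogram, Elasticsearch picks an interval that yields at most the requested number of buckets, so no time filter is needed to keep the bucket count sane.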

Issues can arise when mixing multiple layers from different indices: in this case the second solution is preferable, or, with the first solution, you should somehow warn the user about what is going on behind the scenes, e.g.: "since you are adding a time sensitive index-pattern, we are now limiting the NON_TIME_SENSITIVE_INDEX_P to the same time-range selected".

@markov00 markov00 added bug Fixes for quality problems that affect the customer experience Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Jan 17, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@wylieconlon
Contributor

We have a custom lens_auto_date function that might be the cause here.

@wylieconlon
Contributor

Okay I think I understand what's happening. Like you suggested, there are two things.

  1. Because the index does not contain a primary time field, there is no date filter being added, but the lens_auto_date function is generating an interval based on the assumption that there is one. This gives us a choice:
    • Filter to the date range, even for non-primary time fields
    • Stop generating intervals this way
  2. The chart is being rendered with a fixed range based on the time picker in the top right. This is generally what users expect when the date filter is being used, but might not be correct in this case. Based on the decision above, we can either:
    • If we add filters, then this behavior on rendering should not change
    • If we use the entire date range, then in this case the chart should render the whole range

My preference is to add filters automatically when using a non-primary date histogram.
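A sketch of what adding the filter automatically could look like at the query level (field and range values assumed from the reproduction steps; Lens would derive them from the timepicker):

```json
POST opensky_aircrafts/_search
{
  "size": 0,
  "query": {
    "range": {
      "built": { "gte": "now-15m", "lte": "now" }
    }
  },
  "aggs": {
    "built_over_time": {
      "date_histogram": {
        "field": "built",
        "fixed_interval": "30s"
      }
    }
  }
}
```

With the range filter in place, the interval derived from the time filter matches the filtered data's domain, so both the bucket count and the chart's min/max stay consistent.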

@wylieconlon
Contributor

Closed by #63874
