
[Ingest Pipeline] Date processor performance issues - Discussion #73918

Closed

P1llus opened this issue Jun 8, 2021 · 12 comments
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@P1llus
Member

P1llus commented Jun 8, 2021

When working with Beats and Elastic Agent integrations, there are occasions where the ingest pipelines are slower than intended. Benchmarking stats have pointed us towards the date processor as the culprit.

With integrations that potentially have tens or hundreds of processors, a single date processor will still take almost more time than the rest combined, no matter their complexity.

Just to give a small example reproducing it on a freshly deployed ESS cluster with 3 nodes, 8 GB of RAM, and version 7.13.1:

First, I create a new ingest pipeline that includes most of the processors available today:

PUT _ingest/pipeline/benchmarking
{
  "description": "Benchmarking Pipeline",
  "processors": [
    {
      "date": {
        "description": "Converts timestamp to @timestamp with single format",
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": [
          "dd/MM/yyyy HH:mm:ss"
        ],
        "timezone": "Europe/Amsterdam"
      }
    },
    {
      "append": {
        "field": "append",
        "value": [
          "test1",
          "test2",
          "test3"
        ]
      }
    },
    {
      "convert": {
        "field": "integer",
        "type": "integer"
      }
    },
    {
      "dissect": {
        "field": "message",
        "pattern": "%{dissect}"
      }
    },
    {
      "fingerprint": {
        "fields": [
          "message"
        ],
        "target_field": "fingerprint"
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{GREEDYDATA:grok}"
        ]
      }
    },
    {
      "gsub": {
        "field": "gsub",
        "pattern": " ",
        "replacement": ""
      }
    },
    {
      "join": {
        "field": "join",
        "separator": "-"
      }
    },
    {
      "json": {
        "field": "jsontest",
        "target_field": "json"
      }
    },
    {
      "kv": {
        "field": "kv",
        "field_split": " ",
        "value_split": "="
      }
    },
    {
      "lowercase": {
        "field": "lowercase"
      }
    },
    {
      "remove": {
        "field": "remove"
      }
    },
    {
      "rename": {
        "field": "rename",
        "target_field": "renames"
      }
    },
    {
      "script": {
        "description": "Extract 'tags' from 'env' field",
        "lang": "painless",
        "source": """
            String[] test = ctx.testscript.splitOnToken('@');
            ctx.testscriptoutput = test;
          """
      }
    },
    {
      "set": {
        "description": "Set static value",
        "field": "set",
        "value": "setvalue"
      }
    },
    {
      "set": {
        "description": "Set copy from field",
        "field": "setcopy",
        "copy_from": "set"
      }
    },
    {
      "split": {
        "field": "split",
        "separator": ","
      }
    },
    {
      "trim": {
        "field": "trim"
      }
    },
    {
      "uppercase": {
        "field": "uppercase"
      }
    },
    {
      "urldecode": {
        "field": "urldecode"
      }
    },
    {
      "uri_parts": {
        "field": "testurl",
        "target_field": "url",
        "keep_original": true,
        "remove_if_successful": true
      }
    },
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}

After this, I start ingesting some test documents. I only tested with 80 documents, because I already started to see the results I wanted to demonstrate:

POST my-index-000001/_doc/?pipeline=benchmarking
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
  "timestamp": "24/02/1989 10:20:20",
  "append": ["value"],
  "integer": "2",
  "jsontest": """{"name": "test"}""",
  "message": "sometestmessage",
  "gsub": "test message",
  "join": ["two", "messages"],
  "kv": "name=john age=20",
  "lowercase": "UPpPerCCase",
  "uppercase": "lowercase",
  "remove": "removeme",
  "rename": "renameme",
  "testscript": "test@elastic.co",
  "split": "value1,value2",
  "trim": " trimme ",
  "urldecode": "https://www.elastic.co/test",
  "testurl": "https://www.elastic.co/test"
}

After ingesting about 80 documents, this is the output of the node stats API for ingest metrics:

GET /_nodes/stats?metric=ingest&filter_path=nodes.*.ingest.pipelines
         "benchmarking" : {
            "count" : 80,
            "time_in_millis" : 76,
            "current" : 0,
            "failed" : 0,
            "processors" : [
              {
                "date" : {
                  "type" : "date",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 26,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "append" : {
                  "type" : "append",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "convert" : {
                  "type" : "convert",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "dissect" : {
                  "type" : "dissect",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "fingerprint" : {
                  "type" : "fingerprint",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "grok" : {
                  "type" : "grok",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 5,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "gsub" : {
                  "type" : "gsub",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "join" : {
                  "type" : "join",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "json" : {
                  "type" : "json",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "kv" : {
                  "type" : "kv",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "lowercase" : {
                  "type" : "lowercase",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "remove" : {
                  "type" : "remove",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "rename" : {
                  "type" : "rename",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "script" : {
                  "type" : "script",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 7,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "set" : {
                  "type" : "set",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "set" : {
                  "type" : "set",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "split" : {
                  "type" : "split",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "trim" : {
                  "type" : "trim",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "uppercase" : {
                  "type" : "uppercase",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "urldecode" : {
                  "type" : "urldecode",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "uri_parts" : {
                  "type" : "uri_parts",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "user_agent" : {
                  "type" : "user_agent",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 11,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ]
          },

As you can see, the difference is already quite significant: while most processors spent 1 millisecond or less going through all 80 documents, the date processor, with a single date format, is already at 26 ms. This gap will only keep growing, and would widen further with ingest rates much higher than in my test scenario.
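To put the stats above in per-document terms, here is a quick arithmetic sketch (plain Java; the numbers are taken directly from the node stats output above):

```java
public class PerDocCost {
    // Average per-document cost given a processor's total time and document count.
    static double perDocMs(long totalMs, long docs) {
        return (double) totalMs / docs;
    }

    public static void main(String[] args) {
        // date processor: 26 ms over 80 docs = 0.325 ms per document
        System.out.println("date:       " + perDocMs(26, 80) + " ms/doc");
        // user_agent, the next slowest: 11 ms over 80 docs = 0.1375 ms per document
        System.out.println("user_agent: " + perDocMs(11, 80) + " ms/doc");
        // most other processors: ~1 ms or less over 80 docs, i.e. <= 0.0125 ms per document
        System.out.println("append:     " + perDocMs(1, 80) + " ms/doc");
    }
}
```

So the single date processor costs roughly 25x what a typical cheap processor in the same pipeline does.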

@P1llus P1llus added >bug needs:triage Requires assignment of a team area label labels Jun 8, 2021
@danhermann danhermann added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed needs:triage Requires assignment of a team area label labels Jun 8, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 8, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann
Contributor

I spent some time looking into this and saw a significant increase (~10x) in the execution time of the date processor between the 7.4.2 and 7.5.0 releases. There were no changes to the date processor itself in that timeframe, but there were changes to several of the classes it depends on, and the changes to JavaDateFormatter in #46654 looked to be the most substantial. I attempted to short-circuit those changes by specifying iso8601 as the date format for the date processor and re-running the test above. That resulted in a significant improvement in running time, though it was still not quite as fast as 7.4.2. Before I dig further into this, @pgomulka, do you have any thoughts on why date formatters with the dd/MM/yyyy HH:mm:ss pattern from the test case above might have decreased in performance?
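For reference, the iso8601 short-circuit described above amounts to a pipeline variant like the following (the pipeline name is illustrative, and it of course assumes the test documents then carry ISO 8601 timestamps):

```
PUT _ingest/pipeline/benchmarking-iso8601
{
  "description": "Benchmarking Pipeline using the built-in ISO8601 format",
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": [
          "ISO8601"
        ],
        "timezone": "Europe/Amsterdam"
      }
    }
  ]
}
```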

@pgomulka
Contributor

I think the performance difference between a custom dd/MM/yyyy HH:mm:ss format and iso8601 could be that iso8601 is preinitialised in Elasticsearch, so it does not need to parse the pattern and create a date formatter object. For custom formats, pattern parsing and date formatter creation are performed on every ingestion. I'm not entirely sure whether this is possible, but maybe we could think of preinitialising this in the DateProcessor class somehow?

I am not sure why #46654 would affect performance. Maybe the creation of the date formatter has become more expensive?

We used to have a mini optimisation for when a format is a single pattern (not using ||), but it was removed in #48703; just adding it back did not change anything for me.
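As an illustration of the cost described above, here is a minimal sketch using plain java.time (not Elasticsearch's JavaDateFormatter) that contrasts building the formatter on every parse with reusing a preinitialised one. The class and method names are illustrative only:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class FormatterReuseSketch {
    // java.time.DateTimeFormatter is immutable and thread-safe, so it can be
    // built once and reused, avoiding the pattern parse on every document.
    private static final DateTimeFormatter CACHED =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    // Rebuilds the formatter (re-parses the pattern string) on every call,
    // analogous to doing the creation work per ingested document.
    static LocalDateTime parsePerCall(String value) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss"));
    }

    // Reuses the preinitialised formatter.
    static LocalDateTime parseCached(String value) {
        return LocalDateTime.parse(value, CACHED);
    }

    public static void main(String[] args) {
        String sample = "24/02/1989 10:20:20"; // same timestamp as the test document

        long t0 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) parsePerCall(sample);
        long perCallNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) parseCached(sample);
        long cachedNs = System.nanoTime() - t1;

        System.out.println("per-call: " + perCallNs / 1_000_000 + " ms, cached: " + cachedNs / 1_000_000 + " ms");
        System.out.println(parseCached(sample)); // 1989-02-24T10:20:20
    }
}
```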

@pgomulka
Contributor

pgomulka commented Jun 15, 2021

I have tried running some benchmarks to get flame graphs, and I am not sure there really is a performance change. The flame graph from a 6-minute profile does not show more time being spent in DateProcessor.

flame graph.zip

There was a change in how time is calculated for ingest processors: https://github.com/elastic/elasticsearch/pull/46241/files
Would it be possible that this could be the reason?
@martijnvg Do you think that, for the manual test in the first comment, #46241 could increase the time reported for DateProcessor? Would it somehow not affect other processors?

@jonathanlocke

Creating date formatter objects is indeed expensive if you need to create millions of them. Since these classes are also not thread-safe, consider using a static ThreadLocal to keep an instance (or a map of instances) around for each thread.
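A sketch of that suggestion, using the JDK's legacy SimpleDateFormat as the classic not-thread-safe case; the class and field names are illustrative, not Elasticsearch internals:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalFormatterSketch {
    // One formatter instance per thread: SimpleDateFormat is not thread-safe,
    // so sharing a single static instance across ingest threads would be unsafe,
    // while creating a new one per document is the expensive path discussed here.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("dd/MM/yyyy HH:mm:ss"));

    static Date parse(String value) throws ParseException {
        return FORMAT.get().parse(value); // reuses this thread's cached instance
    }
}
```

Worth noting that java.time.format.DateTimeFormatter is immutable and thread-safe, so for java.time-based code a plain static final field suffices; the ThreadLocal pattern matters for the mutable legacy formatters.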

@jonathanl-telenav

I made a simple ThreadLocalMap for you here:

ThreadLocalMap.java

@danhermann
Contributor

I have tried running some benchmarks to get flame graphs, and I am not sure there really is a performance change. The flame graph from a 6-minute profile does not show more time being spent in DateProcessor.

flame graph.zip

There was a change in how time is calculated for ingest processors: https://github.com/elastic/elasticsearch/pull/46241/files
Would it be possible that this could be the reason?
@martijnvg Do you think that, for the manual test in the first comment, #46241 could increase the time reported for DateProcessor? Would it somehow not affect other processors?

Thanks for looking into this further, @pgomulka. The flame graph results are helpful. While there were significant changes in #46241, the date processor is not async, so I would expect any timing changes to be equally relevant to the other processors in the example pipeline above. I'm going to talk to @martijnvg and see if we can narrow down some of the possibilities here.

@pgomulka
Contributor

I feel that we should test a 7.x build with #46652 but without #46241. I can try building and testing this next week, unless you have any other ideas?

@danhermann
Contributor

I feel that we should test a 7.x build with #46652 but without #46241. I can try building and testing this next week, unless you have any other ideas?

Did you mean with #46654 instead of #46652? If so, that could be helpful to isolate the problem.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo
Contributor

joegallo commented Mar 1, 2023

#92880 will make a difference here, too.

@mattc58
Contributor

mattc58 commented Aug 16, 2023

Version 8.7.0 contained several related performance changes for ingest pipelines. We're going to close this for now but will reopen if necessary.

@mattc58 mattc58 closed this as completed Aug 16, 2023
9 participants