
[Ingest Pipeline] Date processor performance issues - Discussion #73918

Closed

P1llus opened this issue Jun 8, 2021 · 12 comments
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@P1llus
Member

P1llus commented Jun 8, 2021

When working with Beats and Elastic Agent integrations, there are occasions where the ingest pipelines are slower than intended. Benchmarking stats have pointed us towards the date processor as the culprit.

With integrations that potentially have tens or hundreds of processors, a single date processor will still take almost more time than the rest combined, no matter their complexity.

Just to give a small example reproducing it on a freshly deployed ESS cluster with 3 nodes, 8 GB of RAM, and version 7.13.1:

First, I create a new ingest pipeline that includes most of the processors available today:

PUT _ingest/pipeline/benchmarking
{
  "description": "Benchmarking Pipeline",
  "processors": [
    {
      "date": {
        "description": "Converts timestamp to @timestamp with single format",
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": [
          "dd/MM/yyyy HH:mm:ss"
        ],
        "timezone": "Europe/Amsterdam"
      }
    },
    {
      "append": {
        "field": "append",
        "value": [
          "test1",
          "test2",
          "test3"
        ]
      }
    },
    {
      "convert": {
        "field": "integer",
        "type": "integer"
      }
    },
    {
      "dissect": {
        "field": "message",
        "pattern": "%{dissect}"
      }
    },
    {
      "fingerprint": {
        "fields": [
          "message"
        ],
        "target_field": "fingerprint"
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{GREEDYDATA:grok}"
        ]
      }
    },
    {
      "gsub": {
        "field": "gsub",
        "pattern": " ",
        "replacement": ""
      }
    },
    {
      "join": {
        "field": "join",
        "separator": "-"
      }
    },
    {
      "json": {
        "field": "jsontest",
        "target_field": "json"
      }
    },
    {
      "kv": {
        "field": "kv",
        "field_split": " ",
        "value_split": "="
      }
    },
    {
      "lowercase": {
        "field": "lowercase"
      }
    },
    {
      "remove": {
        "field": "remove"
      }
    },
    {
      "rename": {
        "field": "rename",
        "target_field": "renames"
      }
    },
    {
      "script": {
        "description": "Extract 'tags' from 'env' field",
        "lang": "painless",
        "source": """
            String[] test = ctx.testscript.splitOnToken('@');
            ctx.testscriptoutput = test;
          """
      }
    },
    {
      "set": {
        "description": "Set static value",
        "field": "set",
        "value": "setvalue"
      }
    },
    {
      "set": {
        "description": "Set copy from field",
        "field": "setcopy",
        "copy_from": "set"
      }
    },
    {
      "split": {
        "field": "split",
        "separator": ","
      }
    },
    {
      "trim": {
        "field": "trim"
      }
    },
    {
      "uppercase": {
        "field": "uppercase"
      }
    },
    {
      "urldecode": {
        "field": "urldecode"
      }
    },
    {
      "uri_parts": {
        "field": "testurl",
        "target_field": "url",
        "keep_original": true,
        "remove_if_successful": true
      }
    },
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}

After this, I start ingesting some test documents. I only tested with 80 documents, because I already started to see the results I wanted to demonstrate:

POST my-index-000001/_doc/?pipeline=benchmarking
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
  "timestamp": "24/02/1989 10:20:20",
  "append": ["value"],
  "integer": "2",
  "jsontest": """{"name": "test"}""",
  "message": "sometestmessage",
  "gsub": "test message",
  "join": ["two", "messages"],
  "kv": "name=john age=20",
  "lowercase": "UPpPerCCase",
  "uppercase": "lowercase",
  "remove": "removeme",
  "rename": "renameme",
  "testscript": "test@elastic.co",
  "split": "value1,value2",
  "trim": " trimme ",
  "urldecode": "https://www.elastic.co/test",
  "testurl": "https://www.elastic.co/test"
}

After ingesting about 80 documents, this is the output of the node stats API for ingest metrics:

GET /_nodes/stats?metric=ingest&filter_path=nodes.*.ingest.pipelines
         "benchmarking" : {
            "count" : 80,
            "time_in_millis" : 76,
            "current" : 0,
            "failed" : 0,
            "processors" : [
              {
                "date" : {
                  "type" : "date",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 26,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "append" : {
                  "type" : "append",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "convert" : {
                  "type" : "convert",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "dissect" : {
                  "type" : "dissect",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "fingerprint" : {
                  "type" : "fingerprint",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "grok" : {
                  "type" : "grok",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 5,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "gsub" : {
                  "type" : "gsub",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "join" : {
                  "type" : "join",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "json" : {
                  "type" : "json",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "kv" : {
                  "type" : "kv",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "lowercase" : {
                  "type" : "lowercase",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "remove" : {
                  "type" : "remove",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "rename" : {
                  "type" : "rename",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 1,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "script" : {
                  "type" : "script",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 7,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "set" : {
                  "type" : "set",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "set" : {
                  "type" : "set",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "split" : {
                  "type" : "split",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "trim" : {
                  "type" : "trim",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "uppercase" : {
                  "type" : "uppercase",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "urldecode" : {
                  "type" : "urldecode",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 0,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "uri_parts" : {
                  "type" : "uri_parts",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 2,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },
              {
                "user_agent" : {
                  "type" : "user_agent",
                  "stats" : {
                    "count" : 80,
                    "time_in_millis" : 11,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              }
            ]
          },

As you can see, the difference is already quite significant: while most processors spent 1 millisecond or less going through all 80 documents, the date processor, with a single date format, is already at 26 ms. This gap will only keep growing, and would widen further with ingest rates much higher than in my test scenario.
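To put the stats above in per-document terms, here is a quick arithmetic sketch (plain Java; the numbers are taken directly from the node stats output above):

```java
public class PerDocCost {
    // Average per-document cost given a processor's total time and document count.
    static double perDocMs(long totalMs, long docs) {
        return (double) totalMs / docs;
    }

    public static void main(String[] args) {
        // date processor: 26 ms over 80 docs = 0.325 ms per document
        System.out.println("date:       " + perDocMs(26, 80) + " ms/doc");
        // user_agent, the next slowest: 11 ms over 80 docs = 0.1375 ms per document
        System.out.println("user_agent: " + perDocMs(11, 80) + " ms/doc");
        // most other processors: ~1 ms or less over 80 docs, i.e. <= 0.0125 ms per document
        System.out.println("append:     " + perDocMs(1, 80) + " ms/doc");
    }
}
```

So the single date processor costs roughly 25x what a typical cheap processor in the same pipeline does.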

@P1llus P1llus added >bug needs:triage Requires assignment of a team area label labels Jun 8, 2021
@danhermann danhermann added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed needs:triage Requires assignment of a team area label labels Jun 8, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 8, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@danhermann
Contributor

I spent some time looking into this and saw a significant increase (~10x) in the execution time of the date processor between the 7.4.2 and 7.5.0 releases. There were no changes to the date processor itself in that timeframe, but there were changes to several of the classes it depends on, and the changes to JavaDateFormatter in #46654 looked to be the most substantial. I attempted to short-circuit those changes by specifying iso8601 as the date format for the date processor and re-running the test above. That resulted in a significant improvement in running time, though it was still not quite as fast as 7.4.2. Before I dig further into this, @pgomulka, do you have any thoughts on why date formatters with the dd/MM/yyyy HH:mm:ss pattern from the test case above might have decreased in performance?
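For reference, the iso8601 short-circuit described above amounts to a pipeline variant like the following (the pipeline name is illustrative, and it of course assumes the test documents then carry ISO 8601 timestamps):

```
PUT _ingest/pipeline/benchmarking-iso8601
{
  "description": "Benchmarking Pipeline using the built-in ISO8601 format",
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": [
          "ISO8601"
        ],
        "timezone": "Europe/Amsterdam"
      }
    }
  ]
}
```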

@pgomulka
Contributor

I think the performance difference between a custom dd/MM/yyyy HH:mm:ss format and iso8601 could be that iso8601 is preinitialised in Elasticsearch, so it does not need to parse the pattern and create a date formatter object. For custom formats, pattern parsing and date formatter creation are performed on every ingestion. I'm not entirely sure whether this is possible, but maybe we could think of preinitialising this in the DateProcessor class somehow?

I am not sure why #46654 would affect performance. Maybe the creation of the date formatter has become more expensive?

We used to have a mini optimisation for when a format is a single pattern (not using ||), but it was removed in #48703; just adding it back did not change anything for me.
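As an illustration of the cost described above, here is a minimal sketch using plain java.time (not Elasticsearch's JavaDateFormatter) that contrasts building the formatter on every parse with reusing a preinitialised one. The class and method names are illustrative only:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class FormatterReuseSketch {
    // java.time.DateTimeFormatter is immutable and thread-safe, so it can be
    // built once and reused, avoiding the pattern parse on every document.
    private static final DateTimeFormatter CACHED =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    // Rebuilds the formatter (re-parses the pattern string) on every call,
    // analogous to doing the creation work per ingested document.
    static LocalDateTime parsePerCall(String value) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss"));
    }

    // Reuses the preinitialised formatter.
    static LocalDateTime parseCached(String value) {
        return LocalDateTime.parse(value, CACHED);
    }

    public static void main(String[] args) {
        String sample = "24/02/1989 10:20:20"; // same timestamp as the test document

        long t0 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) parsePerCall(sample);
        long perCallNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < 100_000; i++) parseCached(sample);
        long cachedNs = System.nanoTime() - t1;

        System.out.println("per-call: " + perCallNs / 1_000_000 + " ms, cached: " + cachedNs / 1_000_000 + " ms");
        System.out.println(parseCached(sample)); // 1989-02-24T10:20:20
    }
}
```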

@pgomulka
Contributor

pgomulka commented Jun 15, 2021

I have tried running some benchmarks to get flame graphs, and I am not sure there really is a performance change. The flame graph from a 6-minute profile does not show more time being spent in DateProcessor.

flame graph.zip

There was a change in how time is calculated for ingest processors: https://github.com/elastic/elasticsearch/pull/46241/files
Would it be possible that this could be the reason?
@martijnvg Do you think that, for the manual test in the first comment, #46241 could increase the time reported for DateProcessor? Would it somehow not affect other processors?

@jonathanlocke

Creating date formatter objects is indeed expensive if you need to create millions of them. Since these classes are also not thread-safe, consider using a static ThreadLocal to keep an instance (or a map of instances) around for each thread.
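A sketch of that suggestion, using the JDK's legacy SimpleDateFormat as the classic not-thread-safe case; the class and field names are illustrative, not Elasticsearch internals:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalFormatterSketch {
    // One formatter instance per thread: SimpleDateFormat is not thread-safe,
    // so sharing a single static instance across ingest threads would be unsafe,
    // while creating a new one per document is the expensive path discussed here.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("dd/MM/yyyy HH:mm:ss"));

    static Date parse(String value) throws ParseException {
        return FORMAT.get().parse(value); // reuses this thread's cached instance
    }
}
```

Worth noting that java.time.format.DateTimeFormatter is immutable and thread-safe, so for java.time-based code a plain static final field suffices; the ThreadLocal pattern matters for the mutable legacy formatters.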

@jonathanl-telenav

I made a simple ThreadLocalMap for you here:

ThreadLocalMap.java

@danhermann
Contributor

I have tried running some benchmarks to get flame graphs, and I am not sure there really is a performance change. The flame graph from a 6-minute profile does not show more time being spent in DateProcessor.

flame graph.zip

There was a change in how time is calculated for ingest processors: https://github.com/elastic/elasticsearch/pull/46241/files
Would it be possible that this could be the reason?
@martijnvg Do you think that, for the manual test in the first comment, #46241 could increase the time reported for DateProcessor? Would it somehow not affect other processors?

Thanks for looking into this further, @pgomulka. The flame graph results are helpful. While there were significant changes in #46241, the date processor is not async, so I would expect any timing changes to be equally relevant to the other processors in the example pipeline above. I'm going to talk to @martijnvg and see if we can narrow down some of the possibilities here.

@pgomulka
Contributor

I feel that we should test a 7.x build with #46652 but without #46241. I can try building and testing this next week, unless you have any other ideas?

@danhermann
Contributor

I feel that we should test a 7.x build with #46652 but without #46241. I can try building and testing this next week, unless you have any other ideas?

Did you mean with #46654 instead of #46652? If so, that could be helpful to isolate the problem.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo
Contributor

joegallo commented Mar 1, 2023

#92880 will make a difference here, too.

@mattc58
Contributor

mattc58 commented Aug 16, 2023

Version 8.7.0 contained several related performance changes for ingest pipelines. We're going to close this for now but will reopen if necessary.

@mattc58 mattc58 closed this as completed Aug 16, 2023
9 participants