Provide pre-aggregated metrics in /stats API #119560

matschaffer · 2021-11-24T03:38:31Z

Proposal

#104124 proposes removal of the process field. I'd like to propose another option: That we keep the process field around but fill it with meaningful aggregates of the processes: [] data.

I see two benefits from this:

Any current computations on process (for example https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/lib/kibana/get_kibanas_for_clusters.ts#L129-L138) can continue functioning as-is. So mixing pre/post node-clustering data should be no problem.
It provides simple/fast ways to query for overview data. This is similar in practice to things like docker.cpu.total.norm.pct where we pre-compute the total CPU percentage based on the current number of cores at ingest time. We could potentially also do this at query time, but it makes any such query quite complex.

Proposed aggregations

As a starting point, here are aggregates that I think might be useful. Please feel free to edit as the story develops:

memory.heap.total_in_bytes: sum of all processes
memory.heap.used_in_bytes: sum of all processes
memory.heap.size_limit: sum of all processes
memory.resident_set_size_in_bytes: sum of all processes
event_loop_delay: max of all processes
event_loop_delay_histogram: same stats, but aggregated over all proceses

The example doc I have seems to be missing cpu, which I'm sure we'll want in there as well.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-11-24T03:38:33Z

Pinging @elastic/kibana-core (Team:Core)

lukeelmers · 2021-12-01T19:16:52Z

Copying over the proposed aggregations from the Node clustering RFC:

{
  // ...
  "process": {
    "memory": {
      "heap": {
        "total_bytes": 533581824, // sum of coordinator + workers
        "used_bytes": 296297424, // sum of coordinator + workers
        "size_limit": 4345298944 // sum of coordinator + workers
      },
      "resident_set_size_bytes": 563625984 // sum of coordinator + workers
    },
    "pid": 52646, // pid of the coordinator
    "event_loop_delay": 0.22967800498008728, // max of coordinator + workers
    "uptime_ms": 1706021.930404 // uptime of the coordinator
  },
  // ...
}

It seems this is pretty much in-line with @matschaffer's recommendations, just including a few missing fields that we'll need to retain in order to prevent this from being a breaking change. The snippet above is also missing event_loop_delay_histogram which was added after the original RFC.

This change will need to happen whenever introduce the ability for Kibana to run multiple processes, see #68626

pgayvallet · 2024-07-08T07:36:26Z

#68626 has been closed, so I will close this one too. If we ever get back to the node clustering discussions, this should get back naturally in the discussions

matschaffer added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Nov 24, 2021

lukeelmers mentioned this issue Dec 1, 2021

Use Nodejs clustering to take advantage of multi-core hardware #68626

Closed

21 tasks

pgayvallet closed this as not planned Won't fix, can't repro, duplicate, stale Jul 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide pre-aggregated metrics in /stats API #119560

Provide pre-aggregated metrics in /stats API #119560

matschaffer commented Nov 24, 2021

elasticmachine commented Nov 24, 2021

lukeelmers commented Dec 1, 2021

pgayvallet commented Jul 8, 2024

Provide pre-aggregated metrics in /stats API #119560

Provide pre-aggregated metrics in /stats API #119560

Comments

matschaffer commented Nov 24, 2021

Proposal

Proposed aggregations

elasticmachine commented Nov 24, 2021

lukeelmers commented Dec 1, 2021

pgayvallet commented Jul 8, 2024