[Stack Monitoring] Use cgroup metrics for Beats and APM Server CPU/memory usage #79050

Closed
axw opened this issue Oct 1, 2020 · 7 comments · Fixed by #90873

Comments

@axw
Member

axw commented Oct 1, 2020

Describe the feature:

Update Stack Monitoring to use the cgroup metrics recently added to Beats in elastic/beats#21113. Once elastic/apm-server#4249 is merged, APM Server will report them too. Stack Monitoring should use these metrics to calculate more accurate CPU and memory usage when Beats and APM Server run in containers.

Describe a specific use case for the feature:

From elastic/beats#14691:

The Stack Monitoring UI can take advantage of these metrics and display CPU usage more accurately, depending on whether the Beat is running in a container or not. It already does this for Elasticsearch (see https://www.elastic.co/guide/en/kibana/current/monitoring-settings-kb.html#monitoring-ui-cgroup-settings).
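As a rough illustration of what "more accurate" could mean here, the sketch below derives a CPU percentage from two consecutive samples of the cgroup fields, relative to the CPU time the cgroup is allowed to use. This is not the Stack Monitoring implementation. The `CgroupCpuSample` type and `cgroup_cpu_percent` function are hypothetical names, and treating a non-positive quota as "no limit" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CgroupCpuSample:
    """Hypothetical container for the cgroup CPU fields of one monitoring document."""
    timestamp_ms: int  # document timestamp, in epoch milliseconds
    usage_ns: int      # beat.cgroup.cpuacct.total.ns
    quota_us: int      # beat.cgroup.cpu.cfs.quota.us
    period_us: int     # beat.cgroup.cpu.cfs.period.us


def cgroup_cpu_percent(prev: CgroupCpuSample, curr: CgroupCpuSample) -> Optional[float]:
    """Return CPU usage as a percentage of the cgroup's CFS allowance.

    Returns None when no usable quota is set (assumption: quota <= 0 means
    "no limit"), so the caller can fall back to the non-cgroup CPU metrics.
    """
    elapsed_ns = (curr.timestamp_ms - prev.timestamp_ms) * 1_000_000
    used_ns = curr.usage_ns - prev.usage_ns
    if elapsed_ns <= 0 or used_ns < 0:
        return None  # out-of-order samples or a counter reset
    if curr.quota_us <= 0 or curr.period_us <= 0:
        return None  # no CFS quota to measure against
    allowed_cpus = curr.quota_us / curr.period_us  # e.g. 0.5 means half a core
    return 100.0 * used_ns / (elapsed_ns * allowed_cpus)


# Two samples 10 seconds apart, in a cgroup limited to half a core.
before = CgroupCpuSample(timestamp_ms=0, usage_ns=0, quota_us=50_000, period_us=100_000)
after = CgroupCpuSample(timestamp_ms=10_000, usage_ns=2_500_000_000, quota_us=50_000, period_us=100_000)
print(cgroup_cpu_percent(before, after))  # 50.0
```

Returning `None` rather than a number when no quota applies keeps the "in a container or not" decision with the caller.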

Example monitoring document:

(apm-server specific fields omitted for brevity.)

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 116,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : ".monitoring-beats-7-2020.10.01",
        "_id" : "MAQ-4nQBYHGHHbEFYAso",
        "_score" : null,
        "_source" : {
          "timestamp" : "2020-10-01T03:39:32.772Z",
          "type" : "beats_stats",
          "beats_stats" : {
            "beat" : {
              "type" : "apm-server",
              "version" : "8.0.0",
              "name" : "goat",
              "host" : "goat",
              "uuid" : "0f9447a1-5690-45b2-958c-13bd35b985ef"
            },
            "metrics" : {
              "libbeat" : {
                "pipeline" : {
                  "queue" : {
                    "acked" : 1
                  },
                  "clients" : 1,
                  "events" : {
                    "retry" : 1,
                    "active" : 0,
                    "total" : 1,
                    "filtered" : 0,
                    "published" : 1,
                    "failed" : 0,
                    "dropped" : 0
                  }
                },
                "config" : {
                  "scans" : 0,
                  "reloads" : 0,
                  "module" : {
                    "starts" : 0,
                    "stops" : 0,
                    "running" : 0
                  }
                },
                "output" : {
                  "type" : "elasticsearch",
                  "events" : {
                    "dropped" : 0,
                    "duplicates" : 0,
                    "active" : 0,
                    "toomany" : 0,
                    "batches" : 1,
                    "total" : 1,
                    "acked" : 1,
                    "failed" : 0
                  },
                  "write" : {
                    "bytes" : 5452,
                    "errors" : 0
                  },
                  "read" : {
                    "errors" : 1,
                    "bytes" : 8796
                  }
                }
              },
              "apm-server" : {...},
              "system" : {
                "load" : {
                  "norm" : {
                    "1" : 0.0583,
                    "5" : 0.065,
                    "15" : 0.08
                  },
                  "1" : 0.7,
                  "5" : 0.78,
                  "15" : 0.96
                },
                "cpu" : {
                  "cores" : 12
                }
              },
              "beat" : {
                "runtime" : {
                  "goroutines" : 38
                },
                "cgroup" : {
                  "cpu" : {
                    "id" : "user.slice",
                    "cfs" : {
                      "period" : {
                        "us" : 100000
                      },
                      "quota" : {
                        "us" : 0
                      }
                    },
                    "stats" : {
                      "periods" : 0,
                      "throttled" : {
                        "periods" : 0,
                        "ns" : 0
                      }
                    }
                  },
                  "cpuacct" : {
                    "id" : "user.slice",
                    "total" : {
                      "ns" : 334606355574041
                    }
                  },
                  "memory" : {
                    "id" : "user@1000.service",
                    "mem" : {
                      "limit" : {
                        "bytes" : 9223372036854771712
                      },
                      "usage" : {
                        "bytes" : 22278078464
                      }
                    }
                  }
                },
                "handles" : {
                  "open" : 10,
                  "limit" : {
                    "soft" : 1024,
                    "hard" : 1048576
                  }
                },
                "info" : {
                  "uptime" : {
                    "ms" : 920028
                  },
                  "ephemeral_id" : "66577ddf-6989-4aa4-97bb-d3a2f6ef5ee7"
                },
                "memstats" : {
                  "rss" : 60170240,
                  "memory_total" : 84473352,
                  "memory_alloc" : 6964296,
                  "gc_next" : 11049216
                },
                "cpu" : {
                  "system" : {
                    "ticks" : 250,
                    "time" : {
                      "ms" : 251
                    }
                  },
                  "total" : {
                    "value" : 790,
                    "ticks" : 790,
                    "time" : {
                      "ms" : 796
                    }
                  },
                  "user" : {
                    "time" : {
                      "ms" : 545
                    },
                    "ticks" : 540
                  }
                }
              }
            },
            "timestamp" : "2020-10-01T03:39:32.772Z"
          },
          "interval_ms" : 10000,
          "cluster_uuid" : "SjK7bjjJSYe_VPDMlWxGqA"
        },
        "sort" : [
          1601523572772
        ]
      }
    ]
  }
}
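To show how the memory fields in a document like the one above might be consumed, here is a small sketch that reads `beat.cgroup.memory.mem.usage.bytes` and `beat.cgroup.memory.mem.limit.bytes` and returns a percentage only when a real limit is set. The helper names `dig` and `cgroup_memory_percent` are hypothetical. Treating 9223372036854771712 (the limit in the example) as "unlimited" is an assumption, based on it being the value Linux cgroup v1 typically reports when no memory limit is configured.

```python
from typing import Any, Optional

# Assumption: the limit seen in the example document is the "no limit" value.
CGROUP_UNLIMITED_BYTES = 9223372036854771712


def dig(obj: Any, *keys: str) -> Any:
    """Walk nested dicts, returning None if any key along the path is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def cgroup_memory_percent(metrics: dict) -> Optional[float]:
    """Return memory usage as a percentage of the cgroup limit, or None if unlimited."""
    usage = dig(metrics, "beat", "cgroup", "memory", "mem", "usage", "bytes")
    limit = dig(metrics, "beat", "cgroup", "memory", "mem", "limit", "bytes")
    if usage is None or limit is None:
        return None  # cgroup metrics not reported
    if limit <= 0 or limit >= CGROUP_UNLIMITED_BYTES:
        return None  # no effective limit; caller can fall back to memstats
    return 100.0 * usage / limit


# Trimmed copy of beats_stats.metrics from the example document.
example_metrics = {
    "beat": {
        "cgroup": {
            "memory": {
                "id": "user@1000.service",
                "mem": {
                    "limit": {"bytes": 9223372036854771712},
                    "usage": {"bytes": 22278078464},
                },
            }
        },
        "memstats": {"rss": 60170240},
    }
}

print(cgroup_memory_percent(example_metrics))  # None: the example has no effective limit
```

For the example document the function returns `None`, which fits the process running under `user.slice` rather than in a memory-limited container.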
@chrisronline
Contributor

We can work on this for the next minor. However, I'm not seeing this data show up in 7.10 cloud instances, where I assume it should be working. To be clear, the fields are present but the values are all 0. We'll also need to add these fields to the mappings in Elasticsearch (for an example of adding new mappings, see elastic/elasticsearch#34392).

@axw
Member Author

axw commented Dec 2, 2020

Opened elastic/beats#22844 regarding the zero values issue.

axw added a commit to elastic/elasticsearch that referenced this issue Dec 9, 2020
Add field mappings for the cgroups metrics added to
beats monitoring in elastic/beats#21113,
and required by elastic/kibana#79050.
@axw
Member Author

axw commented Dec 9, 2020

Mappings are updated (thanks for the review @chrisronline, I hope I wasn't too presumptuous in asking) and backported to 7.x. Just waiting on a final review of the beats changes before we can pull that into apm-server.

@axw
Member Author

axw commented Jan 21, 2021

@chrisronline the Kibana changes did not make it for 7.11, due to the issues you found - is that right? The ES mappings and apm-server fix were both merged in for 7.11, so should we relabel this for 7.12 now?

@chrisronline
Contributor

Hi @axw,

It doesn't look like it made 7.11, but we will prioritize it for 7.12.

@ravikesarwani
Contributor

cc: @elastic/stack-monitoring Are the changes for this checked in the SM UI for 7.12 yet?

@chrisronline
Contributor

Not yet, but it will be done for 7.12.

2lambda123 pushed a commit to 2lambda123/elastic-elasticsearch that referenced this issue May 3, 2024
Add field mappings for the cgroups metrics added to
beats monitoring in elastic/beats#21113,
and required by elastic/kibana#79050.