elastic-agent diagnostics pprof (#28798) (#29429)
* Allow -httpprof to bind to sockets/pipes

* Enable pprof debug endpoint on socket for agent and beats

Force the elastic-agent and all beats that it starts to run the
http/pprof listener on a local socket.

* Add new Pprof command to control.proto

* Add pprof option to diagnostics collect

* Fix linting issues

* Add diagnostics pprof command, allow pprof to collect from agent

* Revert debug socket changes

* Cleanup timeout handling

Change pprof timeouts from 2*pprofDur to 30s+pprofDur. Remove timeouts
from the socket requester client as cancellations for long running
requests will be handled by the passed ctx.

* Fix linting issue, add timeout flag

Fix linting issues with new command. Add a timeout flag for when pprof
info is gathered. Flag will let users specify the command timeout value.
This value should be greater than the pprof-duration as it needs to
gather and process pprof data.

* Add more command help text.

* Add CHANGELOG

* move spec collection for routes to fn

* add monitoringCfg reference to control server

* elastic-agent server only processes pprof requests when enabled

* Fix error message fix commands only on elastic-agent

* Add pprof fleet.yml, fix nil reference

* Change pprof setting name to monitoring.pprof.enabled

Change the setting in elastic agent from agent.monitoring.pprof to
agent.monitoring.pprof.enabled so that policy updates (such as the one
that occurs when the agent is starting in fleet mode) do not use the
default false value if the user has injected the setting into fleet.yml.

(cherry picked from commit f5e0ec4)

Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
mergify[bot] and michel-laterman authored Dec 15, 2021
1 parent 381a505 commit 2b4e354
Showing 16 changed files with 1,031 additions and 152 deletions.
2 changes: 2 additions & 0 deletions x-pack/elastic-agent/CHANGELOG.next.asciidoc
@@ -149,6 +149,8 @@
- Add diagnostics command to gather beat metadata. {pull}28265[28265]
- Add diagnostics collect command to gather beat metadata, config, policy, and logs and bundle it into an archive. {pull}28461[28461]
- Add `KIBANA_FLEET_SERVICE_TOKEN` to Elastic Agent container. {pull}28096[28096]
- Enable pprof endpoints for beats processes. Allow pprof endpoints for elastic-agent if enabled. {pull}28983[28983]
- Add `--pprof` flag to `elastic-agent diagnostics` and an `elastic-agent pprof` command to allow operators to gather pprof data from the agent and beats running under it. {pull}28798[28798]
- Allow pprof endpoints for elastic-agent or beats if enabled. {pull}28983[28983] {pull}29155[29155]
- Add --fleet-server-es-ca-trusted-fingerprint flag to allow agent/fleet-server to work with elasticsearch clusters using self signed certs. {pull}29128[29128]
- Discover changes in Kubernetes nodes metadata as soon as they happen. {pull}23139[23139]
2 changes: 1 addition & 1 deletion x-pack/elastic-agent/_meta/config/common.p2.yml.tmpl
@@ -35,7 +35,7 @@ inputs:
# metrics: true
# # exposes /debug/pprof/ endpoints
# # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
# pprof: false
# pprof.enabled: false
# # exposes agent metrics using http, by default sockets and named pipes are used
# http:
# # enables http endpoint
@@ -109,7 +109,7 @@ inputs:
# metrics: false
# # exposes /debug/pprof/ endpoints
# # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
# pprof: false
# pprof.enabled: false
# # exposes agent metrics using http, by default sockets and named pipes are used
# http:
# # enables http endpoint
@@ -109,7 +109,7 @@ inputs:
# metrics: false
# # exposes /debug/pprof/ endpoints
# # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
# pprof: false
# pprof.enabled: false
# # exposes agent metrics using http, by default sockets and named pipes are used
# http:
# # enables http endpoint
42 changes: 42 additions & 0 deletions x-pack/elastic-agent/control.proto
@@ -29,6 +29,19 @@ enum ActionStatus {
FAILURE = 1;
}

// pprof endpoint that can be requested.
enum PprofOption {
ALLOCS = 0;
BLOCK = 1;
CMDLINE = 2;
GOROUTINE = 3;
HEAP = 4;
MUTEX = 5;
PROFILE = 6;
THREADCREATE = 7;
TRACE = 8;
}

// Empty message.
message Empty {
}
@@ -128,6 +141,32 @@ message ProcMetaResponse {
repeated ProcMeta procs = 1;
}

// PprofRequest is a request for pprof data from an http/pprof endpoint.
message PprofRequest {
// The profiles that are requested
repeated PprofOption pprofType = 1;
// A string representing a time.Duration to apply to trace, and profile options.
string traceDuration = 2;
// The application that will be profiled, if empty all applications are profiled.
string appName = 3;
// The route key to match for profiling, if empty all are profiled.
string routeKey = 4;
}

// PprofResult is the result of a pprof request for a given application/route key.
message PprofResult {
string appName = 1;
string routeKey = 2;
PprofOption pprofType = 3;
bytes result = 4;
string error = 5;
}

// PprofResponse is a wrapper to return all pprof responses.
message PprofResponse {
repeated PprofResult results = 1;
}

service ElasticAgentControl {
// Fetches the currently running version of the Elastic Agent.
rpc Version(Empty) returns (VersionResponse);
@@ -143,4 +182,7 @@ service ElasticAgentControl {

// Gather all running process metadata.
rpc ProcMeta(Empty) returns (ProcMetaResponse);

// Gather requested pprof data from specified applications.
rpc Pprof(PprofRequest) returns (PprofResponse);
}
2 changes: 1 addition & 1 deletion x-pack/elastic-agent/elastic-agent.docker.yml
@@ -109,7 +109,7 @@ inputs:
# metrics: false
# # exposes /debug/pprof/ endpoints
# # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
# pprof: false
# pprof.enabled: false
# # exposes agent metrics using http, by default sockets and named pipes are used
# http:
# # enables http endpoint
2 changes: 1 addition & 1 deletion x-pack/elastic-agent/elastic-agent.reference.yml
@@ -115,7 +115,7 @@ inputs:
# metrics: false
# # exposes /debug/pprof/ endpoints
# # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
# pprof: false
# pprof.enabled: false
# # exposes agent metrics using http, by default sockets and named pipes are used
# http:
# # enables http endpoint
2 changes: 1 addition & 1 deletion x-pack/elastic-agent/elastic-agent.yml
@@ -41,7 +41,7 @@ inputs:
# metrics: true
# # exposes /debug/pprof/ endpoints
# # recommended that these endpoints are only enabled if the monitoring endpoint is set to localhost
# pprof: false
# pprof.enabled: false
# # exposes agent metrics using http, by default sockets and named pipes are used
# http:
# # enables http endpoint
@@ -197,9 +197,10 @@ func fleetToReader(agentInfo *info.AgentInfo, cfg *configuration.Configuration)
configToStore := map[string]interface{}{
"fleet": cfg.Fleet,
"agent": map[string]interface{}{
"id": agentInfo.AgentID(),
"logging.level": cfg.Settings.LoggingConfig.Level,
"monitoring.http": cfg.Settings.MonitoringConfig.HTTP,
"id": agentInfo.AgentID(),
"logging.level": cfg.Settings.LoggingConfig.Level,
"monitoring.http": cfg.Settings.MonitoringConfig.HTTP,
"monitoring.pprof": cfg.Settings.MonitoringConfig.Pprof,
},
}
