-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Ingest Manager] Expose processes and their metrics #24788
Conversation
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
Trends 🧪💚 Flaky test reportTests succeeded. Expand to view the summary
Test stats 🧪
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @michalpristas ! Checked out your PR and it generally works as described. Left some minor comments.
The processes
endpoints need to be exposed via a TCP port though, so that the information can be queried from other containers via http request. The port needs to be configurable. It's fine to do this in a follow-up PR, but it is a requirement for cloud to be able to collect the information (required for 7.13
).
) | ||
|
||
const ( | ||
procuctIDKey = "processID" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR uses product
, program
and process
at different places for logic dealing with processes
. For consistency reasons, I think we should aim for always using process
, which reduces the mental overhead.
metricsBytes, metricsErr := processMetrics(r.Context(), id) | ||
if metricsErr != nil { | ||
resp := errResponse{ | ||
Type: "UNEXPECTED", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used a couple of times, maybe worth introducing a typeUnexpected
i will make it exposed using configurable TCP in this PR, this is still a draft made in a way so you can pick it up as soon as possible. @simitt is it ok to make it configurable in a way that when configuration is missing it wont be exposed so i dont use ports when i dont need to? is cloud capable of updating this port before agent start? or do you need static port from the start |
+1 on having it disabled by the default. Will this be configurable through Fleet or not? The current preference would be that the person running Elastic Agent can set this and not necessarily available in Fleet. |
as is now it's not configurable from fleet |
@simitt updated solution with HTTP endpoint, processes wont be exposed unless As we did not received any feedback from cloud just yet, only |
thanks @michalpristas; I'll give it a try as soon as possible |
I set this up on port 81.
For
I was a bit surprised that Every time the |
# # process stats are exposed only using this option | ||
# # it is up to a caller to make sure port is usable and free to use. | ||
# # by default 0 is used meaning socket is used instead. | ||
# port: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we use the same config blocks as we have in beats: https://github.com/elastic/beats/blob/master/filebeat/filebeat.reference.yml#L2507
We need a host
to decide if it should only be exposed on localhost or broader. This also allows to add the enabled
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, at least for when it is run in docker localhost
would not be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated with http.enabled/host/port options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. @simitt I assume on Cloud we can just use this in the template and make the port configurable or hardcode.
Pinging @elastic/agent (Team:Agent) |
# # When using IP addresses, it is recommended to only use localhost. | ||
# host: localhost | ||
# # Port on which the HTTP endpoint will bind. Default is 0 meaning feature is disabled. | ||
# port: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we have enabled / disabled
now, do we still need the support for 0 ?
return | ||
} | ||
|
||
fmt.Fprint(w, string(metricsBytes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I normally like is that metrics endpoint are also human readable. What I mean in the context here is that we pretty print the json. Unfortunately this means in this context to convert it first to json to be able to pretty print it with indentation. At the same time, should not cause too much overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you specify pretty print with metricbeat? we can pass argument to mb if passed to agent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember there is a ?pretty
flag: /stats/?pretty
. Not sure if that works over the socket. I was initially thinking to implement it here so it works for all the outputs also agent, but not strong preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pretty was a special flag provided by the expvar handler. Not sure we have had it implemented in Beats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW this is not a blocker, please ignore it for now.
@michalpristas How do I reach the metric data from elastic-agent itself? |
|
type MonitoringHTTPConfig struct { | ||
Enabled bool `yaml:"enabled" config:"enabled"` | ||
Host string `yaml:"host" config:"host"` | ||
Port int `yaml:"port" config:"port"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a 'positive' validator (see unpack) docs. Then if port is configured, but empty we will fail to parse the configuration and fail with the setting that failed. The Enabled
will be used to not start the server.
func processMetrics(ctx context.Context, id string) ([]byte, int, error) { | ||
detail, err := parseID(id) | ||
if err != nil { | ||
return nil, http.StatusInternalServerError, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no internal error, but the user did provide invalid input.
return | ||
} | ||
|
||
fmt.Fprint(w, string(metricsBytes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the pretty was a special flag provided by the expvar handler. Not sure we have had it implemented in Beats.
Length limit is 104 on unix.
Sync legacy apm ingest folder to HOME dir.
* Add baseline ECS 1.9.0 upgrade * update changelog
* feat: stage execution cache * fix: use correct context * fix: do not check stage status on the first run * fix: proper URL * chore: show message when the stache is skip * fix: correct path * fix: add final / * test: is the path needed? * fix: remove prefix * chore: refactor to use curl to download * chore: use pipeline step
…c#24904) * Add check for URL set when cert and cert key. * Add changelog.
…o expose-processes
return nil, 0, errorWithStatus(http.StatusInternalServerError, err) | ||
} | ||
|
||
return rb, resp.StatusCode, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can StatusCode be != 200 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont know, but i would rather proxy whatever is retrieved from beat than mix 200 with error message
[Ingest Manager] Expose processes and their metrics (elastic#24788)
* upstream/master: (308 commits) [winlogbeat] Add support for sysmon v13 events 24 and 25 (elastic#24945) mergify: add backport label (elastic#25050) Add pod.ip in k8s metadata (elastic#25037) [elastic-agent] Use fleet.url for container cmd (elastic#25026) disable TestXPackEnabled flaky test in logstash metricbeat module (elastic#25034) Leverege leader election in agent k8s manifests (elastic#25016) libbeat/publisher/pipeline: expand monitoring (elastic#24700) libbeat: fix decode_json_fields config validation (elastic#24862) Remove make docs-preview instructions (elastic#25001) [Filebeat] Fix IPtables pipeline (elastic#24928) [DOCS] cd into correct directory before invoking mage. (elastic#17679) Add -buildmode=pie for supported platform (elastic#24964) Add agent's direcotry in k8s manifest generator (elastic#24987) [mergify] assign the original author (elastic#25007) Fix AWS module flaky tests (elastic#24852) [filebeat] Use fail_on_template_error on google_workspace and okta pagination (elastic#24967) Updated config to match defaults (elastic#25004) [Filebeat] Fix hardcoded amazonaws.com endpoint (elastic#24861) Add cloud.service.name to add_cloud_metadata (elastic#24993) [Ingest Manager] Expose processes and their metrics (elastic#24788) ...
What does this PR do?
Added
/processes
and/processes/{processID}
endpoints to http server.agent has its server on
unix:///tmp/elastic-agent/elastic-agent.sock
ornpipe:///elastic-agent
for windows.not configurable
example of
/processes
example of
/processes/metricbeat-default
in case of error e.g
Why is it important?
Fixes: #24091
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.