Support customization of throughput relevant settings in APM integration #26638

Closed
simitt opened this issue Jun 1, 2021 · 2 comments · Fixed by #27429
simitt commented Jun 1, 2021

APM Server standalone allows users to customize throughput-relevant settings (example configuration below):

  • output.elasticsearch.worker, output.elasticsearch.bulk_max_size
  • queue.mem.events, queue.mem.flush.min_events
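
For reference, a minimal standalone apm-server.yml sketch with these settings (the specific values are illustrative only, not recommendations):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  worker: 2               # parallel bulk workers per host
  bulk_max_size: 5120     # maximum events per bulk request

queue.mem:
  events: 20480           # total events the memory queue can hold
  flush.min_events: 5120  # minimum batch size handed to the output
  flush.timeout: 1s       # flush even if min_events is not reached
```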

Elastic Agent with Fleet does not expose the memory-queue settings, but they need to be aligned with the available resources to optimize processing without running out of memory.
A discussed approach to support this is to make the following changes in libbeat:

  • ensure that Beats properly restart/reload when output-related settings change (output.elasticsearch.worker, output.elasticsearch.bulk_max_size)

  • introduce a factor F that can be set per Beat (not user-editable):

    • F=1: lowest throughput, no load balancing.
    • F = W + R, with W = len(output.*.hosts) * worker. R sets how many additional batches can be kept ready while the outputs are still waiting for ACKs. R=W has higher memory usage, but keeps the outputs/queues more saturated, improving throughput (otherwise the queue has to wait for the next N events to be collected). To reduce memory usage, set R=0 or R=1. When dealing with bursts of events, F=1 might make sense; otherwise F=2*W could be a good default.
  • translate the libbeat-internal queue settings from the output-relevant settings and F (see the worked example after this list):

    • queue.mem.flush.min_events = bulk_max_size
    • queue.mem.events = bulk_max_size * F
    • set a fixed value for queue.mem.flush.timeout
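
A worked example of the proposed translation, assuming two Elasticsearch hosts, worker: 2 and bulk_max_size: 5120, so W = 2 * 2 = 4 and, with R = W, F = 8 (the values and the fixed timeout are illustrative only, not a final implementation):

```yaml
# user-visible output settings
output.elasticsearch:
  hosts: ["es1:9200", "es2:9200"]
  worker: 2
  bulk_max_size: 5120

# derived queue settings (computed internally, not user-editable)
queue.mem:
  flush.min_events: 5120   # = bulk_max_size
  events: 40960            # = bulk_max_size * F = 5120 * 8
  flush.timeout: 1s        # fixed value; the exact number is still open
```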

cc @urso

@simitt transferred this issue from elastic/apm-server on Jul 1, 2021
@simitt added the Team:Elastic-Agent label on Jul 1, 2021
elasticmachine commented

Pinging @elastic/agent (Team:Agent)

urso commented Jul 7, 2021

> set a fixed value for queue.mem.flush.timeout

I think it depends on the output. For the ES and Redis outputs we can introduce a timeout setting alongside the output.

The Kafka output has its own buffering and timeout setting. Maybe we just want to stream events through by setting the timeout to 0 (we would need to test whether that is beneficial or not).

For LS I would love to have an event streaming protocol without the need for batches, yet lumberjack uses batches => a flush timeout makes sense here as well.
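
Purely as an illustration of the per-output idea above (a per-output flush timeout does not exist today; the key names below are made up):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  bulk_max_size: 5120
  flush_timeout: 1s    # hypothetical per-output setting, not an existing option

output.kafka:
  hosts: ["kafka:9092"]
  flush_timeout: 0s    # hypothetical: stream events through and rely on Kafka's own buffering
```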
