Support customization of throughput relevant settings in APM integration #26638

Closed
simitt opened this issue Jun 1, 2021 · 2 comments · Fixed by #27429
simitt commented Jun 1, 2021

APM Server standalone allows users to customize throughput-relevant settings (example configuration below):

  • output.elasticsearch.worker, output.elasticsearch.bulk_max_size
  • queue.mem.events, queue.mem.flush.min_events
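
For reference, a minimal standalone apm-server.yml sketch with these settings (the specific values are illustrative only, not recommendations):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  worker: 2               # parallel bulk workers per host
  bulk_max_size: 5120     # maximum events per bulk request

queue.mem:
  events: 20480           # total events the memory queue can hold
  flush.min_events: 5120  # minimum batch size handed to the output
  flush.timeout: 1s       # flush even if min_events is not reached
```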

Elastic Agent with Fleet does not expose the memory-queue settings, but they need to be aligned with the available resources to optimize processing without running out of memory.
A discussed approach to support this is to make the following changes in libbeat:

  • ensure that Beats properly restart/reload when output-related settings change (output.elasticsearch.worker, output.elasticsearch.bulk_max_size)

  • introduce a factor F that can be set per Beat (not user-editable):

    • F=1: lowest throughput, no load balancing.
    • F = W + R, with W = len(output.*.hosts) * worker. R sets how many additional batches can be kept ready while the outputs are still waiting for ACKs. R=W has higher memory usage, but keeps the outputs/queues more saturated, improving throughput (otherwise the queue has to wait for the next N events to be collected). To reduce memory usage, set R=0 or R=1. When dealing with bursts of events, F=1 might make sense; otherwise F=2*W could be a good default.
  • translate the libbeat-internal queue settings from the output-relevant settings and F (see the worked example after this list):

    • queue.mem.flush.min_events = bulk_max_size
    • queue.mem.events = bulk_max_size * F
    • set a fixed value for queue.mem.flush.timeout
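
A worked example of the proposed translation, assuming two Elasticsearch hosts, worker: 2 and bulk_max_size: 5120, so W = 2 * 2 = 4 and, with R = W, F = 8 (the values and the fixed timeout are illustrative only, not a final implementation):

```yaml
# user-visible output settings
output.elasticsearch:
  hosts: ["es1:9200", "es2:9200"]
  worker: 2
  bulk_max_size: 5120

# derived queue settings (computed internally, not user-editable)
queue.mem:
  flush.min_events: 5120   # = bulk_max_size
  events: 40960            # = bulk_max_size * F = 5120 * 8
  flush.timeout: 1s        # fixed value; the exact number is still open
```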

cc @urso

@simitt transferred this issue from elastic/apm-server on Jul 1, 2021
@simitt added the Team:Elastic-Agent label on Jul 1, 2021
elasticmachine commented

Pinging @elastic/agent (Team:Agent)

urso commented Jul 7, 2021

> set a fixed value for queue.mem.flush.timeout

I think it depends on the output. For the ES and Redis outputs we can introduce a timeout setting alongside the output.

The Kafka output has its own buffering and timeout setting. Maybe we just want to stream events through by setting the timeout to 0 (we would need to test whether that is beneficial or not).

For LS I would love to have an event streaming protocol without the need for batches, yet lumberjack uses batches => a flush timeout makes sense here as well.
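
Purely as an illustration of the per-output idea above (a per-output flush timeout does not exist today; the key names below are made up):

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  bulk_max_size: 5120
  flush_timeout: 1s    # hypothetical per-output setting, not an existing option

output.kafka:
  hosts: ["kafka:9092"]
  flush_timeout: 0s    # hypothetical: stream events through and rely on Kafka's own buffering
```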
