New component: Concurrent Batch Processor #33422

Open

moh-osman3 opened this issue Jun 7, 2024 · 1 comment
Labels
needs triage (New item requiring triage), Sponsor Needed (New component seeking sponsor), Stale

Comments


moh-osman3 commented Jun 7, 2024

The purpose and use-cases of the new component

This component is an experimental processor, forked from the core OpenTelemetry Collector batchprocessor component.

This component enhances the batchprocessor with:

  1. Synchronous pipeline support: this component blocks each producer
    until the request returns with a success or an error status code.
  2. Maximum in-flight bytes setting: this component measures the
    in-memory size of each request it admits to the pipeline, and
    stalls requests that would exceed the limit until they time out.
  3. Unlimited concurrency: this component starts as many goroutines
    as needed to send batches through the pipeline.

This processor should be used to

  • Propagate errors back to producers. Each producer blocks until all of its items are batched and exported and a success/error status is returned. If multiple producers contribute to a single batch, they all see the same error for that specific export. If a single producer has items across multiple batches and any of those batches fail, that producer sees an error. In the case of fanout to multiple exporters, if any of those exporters returns an error, the producer sees an error. This error propagation is useful because returning success to producers despite failures in downstream components can be misleading and cause confusion when monitoring pipeline health.
  • Reduce bottlenecking in the batch processor by exporting batches concurrently. The original batch processor handles incoming requests asynchronously and will not process new requests until the current batch is exported and returns. The concurrentbatchprocessor exports batches in a separate goroutine, so processing of incoming requests can continue and new batches can be formed and exported without waiting for the previous batch export to complete. This removes the need to enable queuing in the exporterhelper.
  • Add a memory-limiting mechanism based on in-flight bytes to control admission of requests into the processor. This is useful for applying backpressure: requests are blocked from entering the processor until enough memory is available. The in-flight byte limit counts only the uncompressed size of the incoming request and does not account for additional allocations or memory held while processing the request. Note that it is still possible to experience high memory usage in components upstream of where this limit is applied.

Example configuration for the component

    processors:
      concurrentbatch:
        send_batch_max_size: 1500
        send_batch_size: 1000
        timeout: 1s
        max_in_flight_size_mib: 128

Telemetry data types supported

Traces, metrics, and logs are supported.

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am a member of the OpenTelemetry organization.
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

@jmacd, @moh-osman3

Sponsor (optional)

No response

Additional context

This component currently lives in the otel-arrow repository: https://github.com/open-telemetry/otel-arrow/tree/main/collector/processor/concurrentbatchprocessor. Migrating it to contrib might help other collector users who are experiencing issues with exporterhelper's lack of backpressure, error propagation, and high memory usage. It has been used in production for the past six months in our OTel Arrow collector pipelines alongside the OTel Arrow receiver and exporter, and it has helped us address issues we faced when using exporterhelper with queueing enabled.

@moh-osman3 added the needs triage and Sponsor Needed labels on Jun 7, 2024

github-actions bot commented Aug 7, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
