Add SimpleConcurrentProcessor for Logs, clarify existing export concurrency #4163
I strongly suggest NOT using the term "concurrent", as it usually conveys that there is some synchronization behind it. E.g. "concurrent collections" are collections that can be used in multithreaded code. In some languages there is an idiom that if a type has Concurrent in its name, its method calls are synchronized. My naming proposal is "Passthrough processor".
This resonates with me too -- the use of "concurrent" suggests some kind of attention to concurrency, a limit of some sort in addition to offering concurrency. Maybe the name could be AsynchronousProcessor -- starts the export and returns success immediately.
I admit this worries me a bit -- is the exporter expected to implement some kind of limit, to avoid running out of memory? Can this be composed with the BatchSpanProcessor? What will control whether the BatchSpanProcessor drops spans vs. this processor consuming unlimited amounts of data?
Here, we have an OTel Collector processor that offers unlimited concurrency, subject to a limit on the total amount of pending data. I consider it a potential solution to the problems posed above: https://github.com/open-telemetry/otel-arrow/tree/main/collector/processor/concurrentbatchprocessor
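To make the "unlimited concurrency, bounded pending data" idea concrete, here is a minimal sketch. It is only loosely inspired by that description and is not the linked processor's implementation; `PendingBytesLimiter` and its methods are hypothetical names invented for illustration.

```python
import threading


class PendingBytesLimiter:
    """Hypothetical helper: admits any number of concurrent exports as long
    as the total size of in-flight data stays within max_pending_bytes."""

    def __init__(self, max_pending_bytes: int) -> None:
        self._max = max_pending_bytes
        self._pending = 0
        self._lock = threading.Lock()

    def try_acquire(self, size: int) -> bool:
        """Reserve `size` bytes without blocking. False means the budget is
        exhausted and the caller has to drop or otherwise handle the data."""
        with self._lock:
            if self._pending + size > self._max:
                return False
            self._pending += size
            return True

    def release(self, size: int) -> None:
        """Return `size` bytes to the budget once the export has finished."""
        with self._lock:
            self._pending -= size

    @property
    def pending(self) -> int:
        with self._lock:
            return self._pending
```

A caller would wrap each export in `try_acquire(len(payload))` and `release(len(payload))`, so the number of simultaneous exports is never capped directly, only the amount of data they hold.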
That is not the intent. The exporters intended here do not just 'start' the export. They do everything inline (i.e. serialization and writing to the destination) and then return success/failure. (From what I know, such exporters typically write to ETW or user-events.)
No. Such exporters (ETW/user-events) do not buffer anything. The log record is serialized and handed over to the destination inline. The ETW/user-events system is backed by operating system kernel memory and has ample mechanisms to keep memory under control, but those are outside the scope of the exporter.
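As an illustration of "serialized and handed over inline", a rough sketch of such an exporter follows. `InlineExporter` and its `write` callable are hypothetical stand-ins; a real ETW or user-events exporter would call into the operating system rather than a Python callable, and would not necessarily use JSON.

```python
import json
from typing import Callable, Mapping


class InlineExporter:
    """Hypothetical exporter: serializes and writes each record in the
    caller's thread, keeping no queue or buffer of its own."""

    def __init__(self, write: Callable[[bytes], None]) -> None:
        # `write` stands in for the OS-level sink (assumption for this sketch).
        self._write = write

    def export(self, record: Mapping[str, object]) -> bool:
        try:
            payload = json.dumps(record, sort_keys=True).encode("utf-8")
            self._write(payload)
            return True
        except (TypeError, ValueError, OSError):
            # Serialization or write failure is reported back to the caller.
            return False
```

Because nothing is queued, memory held by the exporter is bounded by the single record being written when `export` is called.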
@cijothomas any thoughts about my naming proposal?
My other proposals are
"Direct processor" meaning that it directly calls the exporter.
"Adapter processor" meaning that it is only an adapter and has no logic or synchronization.
naming is hard 🤣 I am not sure if any of the alternate suggestions are significantly better. I'll keep thinking and hope we get more suggestions.
We cannot enforce this, but the spec wording is there so that anyone authoring their own exporter follows the spec. If they don't follow it, there could be undesired behavior.
Alternatively, we can word it in such a way that "It must be documented to the exporter authors that they should document the exporters' concurrency characteristics...."
I believe this is part of why it was decided that the processor should call all exporters the same way, so that they do not have to worry about concurrency.
@tsloughter, users could still use processors that do not require the exporters to be concurrent safe. But we would offer a more performant option for cases where exporters are concurrent safe and can be used synchronously.
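The two options can be sketched side by side. All class names below are hypothetical and only illustrate the distinction; they are not taken from the spec or from any SDK. `ProbeExporter` exists solely to observe how many `export` calls overlap.

```python
import threading
from typing import Optional


class ProbeExporter:
    """Hypothetical exporter that records how many export() calls overlap."""

    def __init__(self, barrier: Optional[threading.Barrier] = None) -> None:
        self._barrier = barrier
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0

    def export(self, record: object) -> bool:
        with self._lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        if self._barrier is not None:
            # Hold the call open until every caller has entered export().
            self._barrier.wait(timeout=5)
        with self._lock:
            self._active -= 1
        return True


class SerializingProcessor:
    """Takes a lock around export(), so calls never overlap."""

    def __init__(self, exporter: ProbeExporter) -> None:
        self._exporter = exporter
        self._lock = threading.Lock()

    def on_emit(self, record: object) -> bool:
        with self._lock:
            return self._exporter.export(record)


class PassthroughProcessor:
    """Calls export() directly; the exporter must be safe for concurrent use."""

    def __init__(self, exporter: ProbeExporter) -> None:
        self._exporter = exporter

    def on_emit(self, record: object) -> bool:
        return self._exporter.export(record)


def emit_from_threads(processor, count: int) -> None:
    """Emit one record from each of `count` threads and wait for all of them."""
    threads = [
        threading.Thread(target=processor.on_emit, args=(i,)) for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the serializing variant every caller waits its turn on the processor's lock, whereas the passthrough variant lets all callers run `export` at once and leaves thread safety entirely to the exporter.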
@cijothomas, the alternative could be that such exporter packages (like ETW or user_events exporters) would provide their own processors which are "tailored" to their exporters. It can give more flexibility and not increase the SDK functionality surface.
You are right! That is exactly what we have been doing for years. But that does not mean the spec shouldn't support them.
The increase in SDK functionality surface is necessary to support such scenarios. The spec already has wording stating that implementations must have simple and batch (meaning others are optional), so additional processors don't put an extra burden on implementations unless they choose to support them.
What is more (I forgot to call it out explicitly), I think that e.g. for ETW we can simply provide a single EtwProcessor which does the exporting. There is no need for batching or synchronization, so there is no need to provide an Exporter interface implementation.
Yes, that is also possible. (In certain ways, OTel Rust does that. Its etwexporter is not really an exporter, but a ~processor.)
But I think it is best to consistently model exporters (the things that do serialization and export telemetry outside the process) as exporters.
I am not really sure, as these exporters emit batches. And for use cases like ETW and user_events you probably prefer to operate on a "single" log record.
They are exporters! (They do the job of serializing and transferring the telemetry to an external entity.) Whether an exporter gets a batch of one item or multiple is purely based on the choice of processor used. (When used with SimpleProcessor, even OTLPExporter gets a batch of a single item only.)
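The point about batch size being the processor's choice can be shown with a small sketch. The classes are hypothetical simplifications (no locking, flushing, or timeouts) and are not any SDK's actual SimpleProcessor or batch processor.

```python
from typing import List, Sequence


class RecordingExporter:
    """Hypothetical exporter whose export() always receives a batch."""

    def __init__(self) -> None:
        self.batch_sizes: List[int] = []

    def export(self, batch: Sequence[object]) -> bool:
        self.batch_sizes.append(len(batch))
        return True


class SimpleStyleProcessor:
    """Forwards each record immediately, as a batch of one."""

    def __init__(self, exporter: RecordingExporter) -> None:
        self._exporter = exporter

    def on_emit(self, record: object) -> None:
        self._exporter.export([record])


class BatchStyleProcessor:
    """Buffers records and forwards them once max_batch_size is reached.
    Not thread-safe; a real batch processor also flushes on a timer."""

    def __init__(self, exporter: RecordingExporter, max_batch_size: int) -> None:
        self._exporter = exporter
        self._max = max_batch_size
        self._buffer: List[object] = []

    def on_emit(self, record: object) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self._max:
            batch, self._buffer = self._buffer, []
            self._exporter.export(batch)
```

The same exporter class is used in both cases; only the processor in front of it decides whether `export` sees one record or several.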