SimpleActivityProcessor improvements. #896
Conversation
Diff under review (excerpt):

    Interlocked.Increment(ref this.currentQueueSize);
    this.activityQueue.Enqueue(activity);
I'd let @reyang comment here, but introducing a queue for the simple processor defeats the purpose it was intended for, doesn't it?
Yeah, I brought this topic to the OpenTelemetry Specification SIG meeting on 07/14/2020 as topic 8.
I've seen this challenge in .NET, C++, and other languages/scenarios that require high performance and concurrency. For example, an exporter that concurrently writes data to shared memory, ETW (Event Tracing for Windows), or LTTng.
This is something I need to work on from the spec perspective.
Open to ideas, but it feels unavoidable. The way we had it before, starting Tasks, was just queuing work for the thread pool, right?
My current thinking:
- simple processor should run concurrently without contention.
- exporter could have synchronization by default, following the spec (unless I change the spec before GA).
- we probably need a way for certain exporters to express that they are "thread free", similar to the COM STA/MTA model, so that the SDK won't try to synchronize the call (a rough sketch follows this list).
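For illustration, a minimal sketch of what such a "thread free" capability flag could look like; the interface and property names here are hypothetical, not anything that exists in the SDK:

```csharp
// Hypothetical sketch only: a way for an exporter to declare that its
// export path is safe to call concurrently, so the SDK can skip its own
// synchronization. None of these names exist in the OpenTelemetry SDK.
public interface IConcurrencyAwareExporter
{
    // True when the exporter handles its own thread safety, e.g. when
    // writing to shared memory, ETW, or LTTng.
    bool SupportsConcurrentExport { get; }
}
```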
I thought this one might be tricky. I agree with you guys, 100%, but I don't think the current version is acceptable: it is going to flood the thread pool under load. Can we move forward with this safer design and pursue clarity in the spec for GA? Keep in mind, the simple processor is currently the default.
If an exporter is fast enough, this will export spans one-by-one as they are ready, in a tight inner loop. It is only when the exporter is slow that we start feeding it chunks of data. Once our queue fills up, we start dropping data. It's a safer approach all around, IMO.
It satisfies the spec mandate that we shouldn't call export concurrently, but also more important mandates, like not starving the hosting process 🍽️
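For context, a rough sketch of the enqueue side being described, modeled on the two diff lines quoted at the top of the thread; the queue bound and the fields beyond those two lines are assumptions, not the PR's exact code:

```csharp
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

public partial class SimpleActivityProcessor
{
    private readonly ConcurrentQueue<Activity> activityQueue = new ConcurrentQueue<Activity>();
    private readonly AutoResetEvent exportTrigger = new AutoResetEvent(false);
    private readonly int maxQueueSize = 2048; // assumed bound, not from the PR
    private int currentQueueSize;

    public void OnEnd(Activity activity)
    {
        // Once the bounded queue is full, drop the span instead of growing without limit.
        if (Interlocked.Increment(ref this.currentQueueSize) > this.maxQueueSize)
        {
            Interlocked.Decrement(ref this.currentQueueSize);
            return; // dropped
        }

        this.activityQueue.Enqueue(activity);

        // Wake the single background thread that drains the queue.
        this.exportTrigger.Set();
    }
}
```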
I'm worried about moving forward in this direction.
The Python SDK actually uses a lock inside the exporter, which means the exporter code is called concurrently.
The C++ SDK is hitting the same concern, which is why I raised the question in the specification SIG.
Give me some time to do a quick experiment and I might come up with a solution. Currently I have lots of ideas:
- using [MethodImpl(MethodImplOptions.Synchronized)] on the export call (see the sketch after this list)
- having both sync and async export interfaces and providing a helper method to smooth it out
- pushing the spec to make a change
- moving forward on this PR, and having separate guidance on "how to write a high performance / concurrency exporter using the processor interface instead of the exporter interface" (which seems to be 😈)
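A quick sketch of option 1; the exporter class here is hypothetical, and the attribute is just the runtime's built-in way of serializing calls (roughly lock(this) around an instance method):

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical exporter illustrating option 1: the runtime serializes all
// calls to Export, so concurrent callers block rather than overlap.
public class MySynchronizedExporter
{
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void Export(IEnumerable<Activity> batch)
    {
        // Only one thread can be in here at a time; under heavy load,
        // other callers queue up on the implicit lock.
    }
}
```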
Sure, looking forward to seeing what you come up with!
I kind of dig option 4. We provide a couple of OOB solutions geared towards best-effort delivery, safety, and low impact on the hosting process, but if you want to make something ultra-performant you can cut out the exporting layer completely and do it directly in the processor.
Option 3 we should probably do regardless? At least get clarification on how it should work.
Option 1: I'm skeptical that any kind of synchronization will be successful. I'm imagining a busy process creating a lot of spans very quickly with a slow exporter.
Option 2: I thought about changing the interface when I was doing this work, but it didn't really help the situation. The way it is written today (async), the fire-and-forget will flood the pool. If it were sync, we'd block SpanProcessor.EndActivity from finishing, which holds up that thread. Neither case is really ideal.
I like this design, and maybe we can port some of the work to the batching exporter. Using the handles to stop the thread from constantly checking for work is nice.
I also agree with @cijothomas and @reyang that the simple processor is specced to not batch spans; the simple span exporter is effectively designed as a simple queue that can be exhausted fairly easily. It is strange that the default will naturally lack performance, and we should pursue amending the spec to default to a batching-style exporter.
@reyang Ping 😄
The main thing I want to accomplish here is removing the fire-and-forget task/thread pool thing. That is soooo dangerous! In a high-volume / slow-exporter situation, I'm pretty sure it will crash the process. If we want to remove the batching in the worker thread, no problem. It seems like a crime not to batch up the data when we know it's sitting there, but 🤷
@CodeBlanch this has a dependency on the refactor work, I think we should be ready to solve it on Wed.
Closing in favor of the plan on #1078.
Issue
Trying to fix a couple of issues with the simple (hey, it tries hard, give it a chance) processor:
- The spec says the exporter shouldn't be called concurrently, yet we're currently calling it concurrently, almost aggressively.
- We throw every span at the thread pool (roughly the pattern sketched below). Doing that, we'll steal a lot of threads the hosting process needs for its own work.
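For reference, roughly the fire-and-forget pattern at issue: a thread pool work item per ended span. This is illustrative, not the SDK's exact code; the exportOne delegate is a stand-in for the real exporter call:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class FireAndForgetProcessor
{
    private readonly Func<Activity, Task> exportOne; // stand-in for the real exporter call

    public FireAndForgetProcessor(Func<Activity, Task> exportOne) => this.exportOne = exportOne;

    public void OnEnd(Activity activity)
    {
        // Fire-and-forget: every ended span queues a new work item. With a
        // slow exporter under load these pile up and starve the thread pool.
        _ = Task.Run(() => this.exportOne(activity));
    }
}
```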
Design
A background thread sleeps until it is told there is work. Once it is signaled, it tight-loops exporting spans until there is no more work, batching what it can along the way. I'm thinking it is too expensive to export one-by-one when we know there is more data, even though the simple processor isn't technically supposed to batch.
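Roughly, a sketch of that loop, continuing the illustrative enqueue fragment from earlier in the thread (activityQueue, exportTrigger, currentQueueSize); the shutdown flag and the exportBatch delegate are stand-ins, not the PR's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public partial class SimpleActivityProcessor
{
    private readonly Action<List<Activity>> exportBatch; // stand-in for the exporter call
    private volatile bool shutdownRequested;

    private void WorkerLoop()
    {
        var batch = new List<Activity>();
        while (!this.shutdownRequested)
        {
            // Sleep until OnEnd signals that there is work.
            this.exportTrigger.WaitOne();

            // Tight loop: with a fast exporter each pass sees a single span
            // and exports it immediately; with a slow exporter spans pile up
            // and go out as a chunk.
            while (this.activityQueue.TryDequeue(out var activity))
            {
                Interlocked.Decrement(ref this.currentQueueSize);
                batch.Add(activity);
            }

            if (batch.Count > 0)
            {
                this.exportBatch(batch);
                batch.Clear();
            }
        }
    }
}
```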
Opened as a draft because I'm still working on tests, but I wanted to get feedback.