
SimpleActivityProcessor improvements. #896

Closed

Conversation

CodeBlanch
Member

Issue

Trying to fix a couple of issues with the simple (hey, it tries hard, give it a chance) processor:

  • The spec says:

    Export() will never be called concurrently for the same exporter instance.
    Export() can be called again only after the current call returns.
    

    We're currently calling it concurrently, almost aggressively.

  • We throw every span at the thread pool. Doing that steals a lot of threads the hosting process needs for its own work.

Design

A background thread sleeps until it is told there is work. Once signaled, it tight-loops exporting spans until there is no more work, batching what it can along the way. I think it is too expensive to export one-by-one when we know more data is waiting, even though the simple processor technically isn't supposed to batch.
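The design above could be sketched roughly like this. This is a hypothetical illustration with made-up names (`SimpleExportWorker`, `WorkerLoop`), not the actual PR code:

```csharp
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

internal sealed class SimpleExportWorker
{
    private readonly ConcurrentQueue<Activity> queue = new ConcurrentQueue<Activity>();
    private readonly AutoResetEvent signal = new AutoResetEvent(false);
    private readonly Thread worker;

    public SimpleExportWorker()
    {
        this.worker = new Thread(this.WorkerLoop) { IsBackground = true };
        this.worker.Start();
    }

    public void OnEnd(Activity activity)
    {
        this.queue.Enqueue(activity);
        this.signal.Set(); // Wake the worker if it is sleeping.
    }

    private void WorkerLoop()
    {
        while (true)
        {
            this.signal.WaitOne(); // Sleep until told there is work.

            // Tight loop: drain everything that accumulated before going
            // back to sleep. If the exporter is slow, multiple spans pile
            // up here and can be handed over as a chunk.
            while (this.queue.TryDequeue(out Activity activity))
            {
                this.Export(activity); // Exactly one call in flight at a time.
            }
        }
    }

    private void Export(Activity activity)
    {
        // Hand off to the exporter; never invoked concurrently,
        // satisfying the spec requirement quoted above.
    }
}
```

Because only the single background thread ever calls `Export`, the "never called concurrently" mandate falls out for free, and no thread-pool threads are consumed per span.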

Opened as a draft because I'm still working on tests, but I wanted to get feedback.


```csharp
Interlocked.Increment(ref this.currentQueueSize);

this.activityQueue.Enqueue(activity);
```
Member

I'll let @reyang comment here, but introducing a queue for the simple processor defeats the purpose it was intended for, doesn't it?

Member

Yeah, I brought this topic to the OpenTelemetry Specification SIG meeting on 07/14/2020 as topic 8.

I have seen this challenge in .NET, C++, and other languages/scenarios that require high performance and concurrency. For example, an exporter that concurrently writes data to shared memory, ETW (Event Tracing for Windows), or LTTng.

This is something I need to work on from the spec perspective.

Member Author

Open to ideas, but it feels unavoidable. The way we had it before, starting Tasks, was just queuing work for the thread pool, right?

Member

My current thinking:

  1. the simple processor should run concurrently without contention.
  2. the exporter could have synchronization by default, following the spec (unless I change the spec before GA).
  3. we probably need a way for certain exporters to express that they are "thread free", similar to the COM STA/MTA model, so that the SDK won't try to synchronize the call.
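Point 3 might look something like a capability flag on the exporter base class. This is purely illustrative; no such property exists in the SDK:

```csharp
using System.Diagnostics;

public abstract class ActivityExporter
{
    // Hypothetical: an exporter that can tolerate concurrent Export calls
    // (e.g. one writing to ETW or LTTng, which are thread-safe sinks)
    // opts out of SDK-provided synchronization by overriding this.
    public virtual bool SupportsConcurrentExport => false;

    public abstract void Export(Activity activity);
}
```

The SDK would then check `SupportsConcurrentExport` and skip serializing calls for exporters that declare themselves "thread free", analogous to how COM treats MTA components.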

Member Author

I thought this one might be tricky. I agree with you guys, 100%, but I don't think the current version is passable: it is going to flood the thread pool under load. Can we move forward with this safer design and pursue clarity with the spec for GA? Keep in mind, the simple processor is currently the default.

If an exporter is fast enough, this will export spans one-by-one as they are ready, in a tight inner loop. Only when the exporter is slow do we start feeding it chunks of data. Once our queue fills up, we start dropping data. It's a safer approach all around, IMO.

It satisfies the spec mandate that we shouldn't call Export concurrently, but also more important mandates, like not starving the hosting process 🍽️
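The drop-on-overflow behavior described above could be as simple as the following sketch (hypothetical field names `maxQueueSize` and `signal`; `currentQueueSize` and `activityQueue` are from the snippet under review):

```csharp
using System.Diagnostics;
using System.Threading;

public void OnEnd(Activity activity)
{
    // Bound the queue so a slow exporter cannot grow memory without limit;
    // once full, drop the span instead of blocking the caller.
    if (Interlocked.Increment(ref this.currentQueueSize) > this.maxQueueSize)
    {
        Interlocked.Decrement(ref this.currentQueueSize);
        return; // Span is dropped.
    }

    this.activityQueue.Enqueue(activity);
    this.signal.Set(); // Wake the background worker.
}
```

Incrementing the counter before enqueuing means the bound is enforced atomically even with many producer threads, at the cost of a brief over-count that the decrement corrects.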

Member

@reyang reyang Jul 24, 2020

I'm worried about moving forward in this direction.
The Python SDK actually uses a lock inside the exporter, which means the exporter code can be called concurrently.
The C++ SDK is hitting the same concern, which is why I raised the question in the specification SIG.

Give me some time to do a quick experiment and I might come up with a solution. Currently I have lots of ideas:

  1. [MethodImpl(MethodImplOptions.Synchronized)]
  2. having both sync and async export interfaces, plus a helper method to smooth it out
  3. pushing the spec to make a change
  4. moving forward with this PR, and writing separate guidance on "how to write a high-performance / concurrent exporter using the processor interface rather than the exporter interface" (which seems to be 😈)
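For reference, idea 1 would look like the sketch below (`MyExporter` and the batch parameter are made-up for illustration). Note that `[MethodImpl(MethodImplOptions.Synchronized)]` takes an implicit `lock(this)` (or a lock on the type, for static methods), so concurrent callers block rather than queue work elsewhere:

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public class MyExporter
{
    // At most one thread can be inside Export at a time; other callers
    // block on the implicit instance lock until the current call returns.
    // This satisfies the "never called concurrently" requirement, but a
    // slow export under load turns every producer into a blocked thread.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void Export(IEnumerable<Activity> batch)
    {
        // ... write the batch ...
    }
}
```

That blocking behavior is exactly the concern raised below about synchronization under a busy process with a slow exporter.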

Member Author

Sure, looking forward to seeing what you come up with!

I kind of dig option 4. We provide a couple of OOB solutions geared towards best effort, safety, and low impact on the hosting process, but if you want to make something ultra-performant you can cut out the exporting layer completely and do it directly in the processor.

3 we should probably do regardless? At least get clarification on how it should work.

1 I'm skeptical that any kind of synchronization will be successful. I'm imagining a busy process creating a lot of spans very quickly with a slow exporter.

2 I thought about changing the interface when I was doing the work, but it didn't really help the situation. The way it is written today (async), with the fire-and-forget, we'll flood the pool. If it were sync, we'd block SpanProcessor.EndActivity from finishing, which holds up that thread. Neither case is really ideal.

Member

I like this design, and maybe we can port some of the work to the batching exporter. Using the handles to stop the thread from constantly checking for work is nice.

I also agree with @cijothomas and @reyang that the simple processor is specced not to batch spans; the simple span processor is effectively designed to be a simple queue that can be exhausted fairly easily. It is strange that the default will naturally lack performance, and we should pursue amending the spec to default to a batching-style exporter.

Member Author

@CodeBlanch CodeBlanch Aug 10, 2020

@reyang Ping 😄

The main thing I want to accomplish here is removing the fire-and-forget task/thread pool thing. That is soooo dangerous! In a high-volume / slow-exporter situation, I'm pretty sure it will crash the process. If we want to remove the batching in the worker thread, no problem. It just seems a crime not to batch up the data when we know it's sitting there, but 🤷

Member

@CodeBlanch this has a dependency on the refactor work, I think we should be ready to solve it on Wed.

@CodeBlanch
Member Author

Closing in favor of plan on #1078

@CodeBlanch CodeBlanch closed this Aug 14, 2020
@CodeBlanch CodeBlanch deleted the SimpleActivityProcessor branch August 19, 2020 03:56