[async hooks] proposal for standard CLS API - request for feedback #345

puzpuzpuz · 2019-12-18T11:32:34Z

Hi guys,

I believe that upcoming executionAsyncResource() function (nodejs/node#30959) will allow building a simple and robust CLS API as a part of async_hooks. I've created an experimental PR that shows how this API could look like: nodejs/node#31016

Early reviews and feedback from you would be very valuable.

The text was updated successfully, but these errors were encountered:

Qard · 2019-12-19T03:23:25Z

There's also nodejs/node#26540, which is currently under active development too. It's worth comparing the two to consider the flexibility and performance of each. It's a fairly performance-sensitive area, and is prone to memory issues. We should consider carefully what our needs are from this API.

puzpuzpuz · 2019-12-19T04:42:18Z

@Qard
I'm aware of nodejs/node#26540 (and of nodejs/node#27172). I have described design ideas in that PR, but as nobody was interested I decided to implement them myself.

The description section in my PR has an complete overview, but let me compare my implementation with nodejs/node#26540 here.

Pros of proposed API:

Safer memory-wise:
- Does not depend on destroy hook, thus doesn't lead to mem leak in misbehaving AsyncResource scenarios.
- Allows complete disposal of the underlying hook callback and all values by calling .remove() method. Thus, you don't have to restart your node process to remove CLS or replace it with another one.
Should show at least same performance.
- Deals with less data structures internally and does less operations in overall.
- If you're interested in benchmark comparison, I'm willing to write a benchmark and compare both implementations. Just point me at a good example of microbenchmark (or benchmark) that is considered good enough by node core team.
- Update: see [async hooks] proposal for standard CLS API - request for feedback #345 (comment) for benchmark results.
Has simpler API, yet provides same features.
- Does not have .enter() method. API is only 2 mandatory (.get()/.set(value)) and one optional (.remove()) methods.
Has copy-on-write semantics.
- I believe that this is a very important property. As a real world example, without this property implementing a request id tracing library based on CLS API (like this one: https://github.com/puzpuzpuz/cls-rtracer) is very tricky.

Cons of proposed API:

The only disadvantage I know is that .set(value) method has to be called outside of application bootstrap (that's inherited from executionAsyncResource()). But it's not critical at all for a CLS API.
- Update: looks like this restriction will be removed. See this commit: nodejs/node@793282f

Of course, I'm prepared for my PR to be totally ignored, as it was with nodejs/node#27172.

Flarna · 2019-12-19T13:09:51Z

As already mentioned in nodejs/node#26540 the async enter() is a blocker for using it in in my case. Also the missing propagation for async functions is a problem.
As nodejs/node#26540 tends to use propagation using the trigger path it would require a triggeringAsyncResource() api to avoid the complex implementation using weakrefs,...

What I miss in general in both CLS APIs is the ability to stop propagation or avoid it for some case. e.g. it's formal correct that a context is propagated forever in case of e.g. a setInterval but practically this is a leak and not what most users want.

Besides that I miss functionality to allow to trigger other side effects on context change. See for example how domains use async hooks - in principle it's a CLS but they have the need to do something on context change. So both proposed CLS APIs are able to replace the use of async hooks in domains.

puzpuzpuz · 2019-12-19T13:34:15Z

@Flarna

What I miss in general in both CLS APIs is the ability to stop propagation or avoid it for some case. e.g. it's formal correct that a context is propagated forever in case of e.g. a setInterval but practically this is a leak and not what most users want.

In case of setInterval, the .remove() method will help as it stops propagation and removes all stored values in the AsyncLocal. It's not the same as stopping propagation of a particular async call chain branch, but nothing prevents for such API to be added. Say, a separate WeakSet could be introduce. It would store all resources for which the propagation must stop.

Besides that I miss functionality to allow to trigger other side effects on context change. See for example how domains use async hooks - in principle it's a CLS but they have the need to do something on context change. So both proposed CLS APIs are able to replace the use of async hooks in domains.

That sounds like an advanced API to me, which is not required in my use cases. But if there will be multiple votes to include it into the API, I could implement it in a similar manner as in your implementation.

Flarna · 2019-12-19T13:50:14Z

In case of setInterval, the .remove() method will help as it stops propagation and removes all stored values in the AsyncLocal.

This requires an extra, manual call at all these locations which is exactly the opposite of a "just works" CLS as you have to clutter your application code with hints towards CLS instead of one time configuration.

puzpuzpuz · 2019-12-19T13:55:43Z

This requires an extra, manual call at all these locations which is exactly the opposite of a "just works" CLS as you have to clutter your application code with hints towards CLS instead of one time configuration.

I can hardly believe in existence of "just works" stop-context-propagation configuration API. Exact logic heavily depends on the application needs.

But I believe that having a way to completely stop the propagation and free the memory is a must for a CLS API.

puzpuzpuz · 2019-12-19T16:19:40Z

I have modified async-resource-vs-destroy.js benchmark and compared proposed implementation with #26540. Here is the result which I got on my machine:

$ ./node benchmark/async_hooks/async-resource-vs-destroy.js benchmarker=autocannon
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-local" benchmarker="autocannon": 20,277.6
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-local" benchmarker="autocannon": 15,877.6
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-context" benchmarker="autocannon": 16,922.41
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-context" benchmarker="autocannon": 11,841.2
async_hooks/async-resource-vs-destroy.js n=1000000 method="callbacks" type="async-resource" benchmarker="autocannon": 23,582.4
async_hooks/async-resource-vs-destroy.js n=1000000 method="async" type="async-resource" benchmarker="autocannon": 18,407.2

As expected, AsyncLocal is significantly faster than AsyncContext, but slower than the implementation that stores values as resource object properties.

vdeturckheim · 2019-12-20T16:03:26Z

@puzpuzpuz is the call to destroy the main point of slowlyness in AsyncContext? As mentionned in the PR, it aims at being updated when @Qard 's PR is merged to not use it anymore.

puzpuzpuz · 2019-12-20T16:14:26Z

@puzpuzpuz is the call to destroyz the main point of slowlyness in AsyncContext`? As mentionned in the PR, it aims at being updated when @Qard 's PR is merged to not use it anymore.

Probably, but it's hard to tell for sure. Their public APIs are quite different and AsyncContext seems to be using more data structures and doing more stuff under the hood.

vdeturckheim · 2019-12-20T16:19:19Z

@puzpuzpuz actually this whole part would go away eventuallyhttps://github.com/nodejs/node/pull/26540/files#diff-0bb01a51b135a5f68d93540808bac801R201-R214
https://github.com/nodejs/node/pull/26540/files#diff-0bb01a51b135a5f68d93540808bac801R227-R229

In the end, the only data structure that would remain would be the map used for holding context. I would like to keep it as it is closer to the original CLS implementation that most APM vendors have copied over the years.

I am pretty opinionated on the instanciation of only one Async Hook for the whole process as I find it faster than having multiple ones.

puzpuzpuz · 2019-12-20T17:04:36Z

@vdeturckheim

@puzpuzpuz actually this whole part would go away eventuallyhttps://github.com/nodejs/node/pull/26540/files#diff-0bb01a51b135a5f68d93540808bac801R201-R214
https://github.com/nodejs/node/pull/26540/files#diff-0bb01a51b135a5f68d93540808bac801R227-R229

In the end, the only data structure that would remain would be the map used for holding context.

Yes, it will remove one Map internally and only one WeakMap + one Map + one Set will remain.

However, this change won't add copy-on-write semantics for AsyncContext. Do you happen to know how to implement a request id tracing (which is one of the most popular use cases for CLS APIs) once AsyncContext gets that change?

With AsyncLocal it's as simple as the following example: https://github.com/nodejs/node/pull/31016/files#diff-9a4649f1c3f167b0da2c95fc38953a1fR601

I would like to keep it as it is closer to the original CLS implementation that most APM vendors have copied over the years.

Nothing prevents APM vendors or 3rd-party library authors to build a thin-wrapper that provide a Map-like on top of an AsyncLocal. Such implementations will be simple and straightforward.

On the other hand, users who want to store a single value in CLS API won't have to pay for an additional Map.

I am pretty opinionated on the instanciation of only one Async Hook for the whole process as I find it fater than having multiple ones.

AsyncContext's init hook has a for loop, so it's still O(N), just as with N different AsyncLocals. AsyncLocal's init hook can be made global without any changes in the public API. But most of real world applications have a single AsyncLocal per the whole node process created by an APM agent or a request tracing library. So, keeping AsyncLocals isolated from each other seem to be more valuable to me.

vdeturckheim · 2019-12-20T17:39:18Z

Yes, it will remove one Map internally and only one WeakMap + one Map + one Set will remain.

The WeakMap is already gone ;)

Do you happen to know how to implement a request id tracing (which is one of the most popular use cases for CLS APIs) once AsyncContext gets that change?

I am not sure I get the quesiton, I would simply do a run and add the id in the map under a named key

Nothing prevents APM vendors or 3rd-party library authors to build a thin-wrapper that provide a Map-like on top of an AsyncLocal. Such implementations will be simple and straightforward.

Fair point, but a few comments on the PR actually asked how close it is to current existing implementation. IMO this is a design goal to rely on something that is already used.

On the other hand, users who want to store a single value in CLS API won't have to pay for an additional Map.

I don't think this Map is expensive here.

AsyncContext's init hook has a for loop, so it's still O(N), just as with N different AsyncLocals. AsyncLocal's init hook can be made global without any changes in the public API. But most of real world applications have a single AsyncLocal per the whole node process created by an APM agent or a request tracing library. So, keeping AsyncLocals isolated from each other seem to be more valuable to me.

TBH, this choice is mostly based on nodejs/node#16222 (comment).

but this is probably more discussions to have in the respective PRs.

puzpuzpuz · 2019-12-20T17:58:53Z

@vdeturckheim

The WeakMap is already gone ;)

Oh, I've missed that change.

So, it puts the Map into resources directly now. In that case, it won't be possible to get rid of AsyncContext completely (a la AsyncLocal.remove), except for restarting the node process.

I am not sure I get the quesiton, I would simply do a run and add the id in the map under a named key

A AsyncContext.run per each HTTP request then? Thus, plus one Map and one process.nextTick per each request. Doesn't sound like something that is good for performance.

Fair point, but a few comments on the PR actually asked how close it is to current existing implementation. IMO this is a design goal to rely on something that is already used.

As far as I understand, main goal for core APIs is to provide flexible (and sometimes low-level) APIs that can be used directly or used to build user-land libraries on top of them. But maybe this time it's an exception to the rule.

I don't think this Map is expensive here.

I believe that for core APIs, every data structure counts.

TBH, this choice is mostly based on nodejs/node#16222 (comment).

but this is probably more discussions to have in the respective PRs.

Not sure if that comment is applicable to CLS API. Anyway, it can be easily changed in both directions without any changes in the external behavior.

Flarna · 2019-12-20T22:47:50Z

I would like to keep it as it is closer to the original CLS implementation that most APM vendors have copied over the years.

@vdeturckheim May I ask which APM vendor implemenations you refer to?

My search shows different implemenations:
New Relic: In before hook they set their context and in the after hook they restore the old one.

DataDog: In before hook they set their context by a sync enter call and in after hook the clear it.

Elastic: This is different, they do a lookup based on executionAsyncId() in their map.

OpenTelemetry: There a with() function is used to execut a function synchron after setting the context taken from a map using executionAsyncId() and restore the previous value

I'm working at dynatrace (unfortunatelly our agent is not open source so no link 😢) and we use mostly the same mechanism as NewRelic/DataDog.

There is cls-hooked which has an API which reminds me to AsyncStorage. I think this lib is usually used by application programmers but not APMs. APMs just share the idea and the mapping to async_hooks.

As there was some link above towards the ticket to use async_hooks in domains. Would it be possible to port Domains to use the proposed CLS system to have a proove of concept?

puzpuzpuz · 2019-12-21T07:05:43Z

@Flarna

As there was some link above towards the ticket to use async_hooks in domains. Would it be possible to port Domains to use the proposed CLS system to have a proove of concept?

As far as I know, porting domains to AsyncLocal will require value change callback feature from CLS API. We already discussed this topic here: #345 (comment)

Nothing prevents from adding such advanced API into the proposed implementation, say, as a port of similar functionality from your PR (nodejs/node#27172). That could be easily done in future or in initial implementation - depends on feedback from async_hooks/diagnostics groups.

Qard · 2019-12-22T18:54:12Z

I think it should be a goal to have an API that can support multiple consumers and just work. So many times I've seen companies try to run multiple APM vendors in production and it crashes their app because they conflict with each other. It's very difficult coding defensively enough to not break when another APM vendor does something unexpected to the environment--a common thing I've seen is regular properties redefined with Object.defineProperty(...) and no longer being configurable.

puzpuzpuz · 2019-12-22T19:29:52Z

@Qard

I think it should be a goal to have an API that can support multiple consumers and just work. So many times I've seen companies try to run multiple APM vendors in production and it crashes their app because they conflict with each other.

Thanks for bringing this important concern up. In the proposed implementation AsyncLocals are isolated from each other: they don't have any shared data structures and use their own hooks. But in general, it's very hard to protect from monkey patching of public built-in APIs.

Qard · 2019-12-23T04:52:17Z

I think each having their own hooks is actually a minor point against them because of the additional overhead. The less repetitive code paths we can introduce in shared logic the better, while still retaining functional isolation.

puzpuzpuz · 2019-12-23T06:19:26Z

@Qard

I think each having their own hooks is actually a minor point against them because of the additional overhead.

Could you tell me what is the reason behind that overhead? Extra context switches between native code and JS or it's something different?

It's a trivial change to do, but before doing that I'd like to understand how critical it is.

Qard · 2019-12-23T06:59:43Z

Well, multiple hooks should be merged together on JS-side already, in async_hooks internals, but that's an implementation detail. This does however mean that there will be separate hook objects with separate associated event handlers, each with their own closing scope. Given that this is a very hot path, it would be best to minimize that as much as possible.

APM vendors typically want a unique object per-request, not shared globally, which means it should be able to register the hooks once but generate context objects out of those hooks many times. I've also sometimes seen the layered object approach where each async branch gets a new object with its prototype set to that of the parent tick--while that is a powerful design, it's also much more expensive and can lead to undesired scenarios like orphaned branches when activity is not directly in the async call path which leads to the request end. In my experience, the best approach overall in terms of performance and reliability is to be able to simply have an always-running background hook and the ability to at any arbitrary time start a context object which is passed down the async tree as-is.

puzpuzpuz · 2019-12-23T07:29:31Z

@Qard
Thanks for the clarification. I'll introduce a global hook for AsyncLocals later today.

Update. Done that.

vdeturckheim · 2019-12-23T16:07:23Z

Thanks @Qard for stepping into this!

@Flarna , I was not talking about implementation but about external API. the original CLS implementation exposes set(key, value) and get(key) that can be called on the result of getNamespace. That is what I had in mind.

Regarding the big picture, I have discussed with multiple TSC members and collaborators during the collaborator summit in Montréal. Everyone seemed to agree with the meed of a high level API to provide such features in core. It also seems that it has been agreed upon during a diagnostic summit (nodejs/node#26540 (comment) and nodejs/node#26540 (comment)).

At this point, I am afraid that we (and I am very responsible for it) end up bikeshedding too much in this current discussion. PR reviews should happen in PRs for better tracability.
Also, at the end of the day, I don't believe it really matters which PR will be merged: this will be an experimental feature, it can be updated or boken-changed a couple time over the next iterations based on feedback from real life.

Flarna · 2020-01-04T20:17:13Z

@vdeturckheim Yes, this was discussed on diag summit last march - I participated there 😁. This was the main reason why I tried and reviewed your AsyncStorage proposal and additionally created a different approach in the hope to get a bigger community to further review and discuss.

Main issue with async hooks is that they are still experimental. On diag summit we assumed that this avoids that user space modules start to use AsyncResource which would be needed that CLS works as one would expect in case of e.g. pooling db connections, queueing,... Usually no one likes to depend on experimental features which may change at any time...

Getting AsyncHooks API out of experimental is hard for several reasons (they expose internals, easy to use wrong,....). The idea was to add a CLS API in core which doesn't expose internals and therefore is easier to get out of experimental and fits most use cases covered currently by async hooks. Async hooks could be used internal and stay experimental to cover the remaining use cases.

So yes, adding something is hopefully enough to reach the target that AsyncResource spreads - as soon as this is out of experimental. It was never my intention to somehow block your PR to add AsyncStorage.

Meanwhile my motivation to further work on CLS in core is not that high anymore. Partly because most of my input was ignored anyway 😢 but besides that I have the feeling there is no unique understanding on what a CLS should actually do.
The current focus seems to be on application programmers viewpoint but my domain is APM which is different so I can't help that much. As you can see in the discussion in the various PRs regarding this it looks like the requirements are even conflicting.

Besides adding CLS to core there were some more points regarding async hooks discussed on diag summit like use of resource instead id. Idea was to get rid of destroy hook - at least for the "main" use case CLS. The destroy hook is quite bad regarding performance in special if await is used a lot. Besides that it helps to avoid leaks. In case the PR from @Qard gets merged I have some follow-up tasks in mind to do there.

naugtur · 2020-02-28T12:06:25Z

Now with all the mentioned PRs closed, where does this continue?

mhdawson · 2020-03-02T23:02:29Z

I think once we are comfortable we are not going to change the API in the short term we should open a new issue to collect input on the API and then do a blitz to promote/ask for feedback on the API pointing to that issue. Make sense?

wentout · 2020-03-03T06:39:37Z

I think once we are comfortable we are not going to change the API in the short term we should open a new issue to collect input on the API and then do a blitz to promote/ask for feedback on the API pointing to that issue. Make sense?

Might be then, when CLS will be merged, there must be some relatied description, easily viewable from the Async Hooks part of Documentation... Then, when developer will find an issue, they would be informed where to ask and how to ask properly? (@vdeturckheim) (@puzpuzpuz) what do you think?

puzpuzpuz · 2020-03-03T08:41:19Z

@wentout

Might be then, when CLS will be merged, there must be some relatied description, easily viewable from the Async Hooks part of Documentation...

Documentation already includes a section for AsyncLocalStorage: https://github.com/nodejs/node/blob/master/doc/api/async_hooks.md#class-asynclocalstorage

I think, this should be enough to point users in the right direction. Do you have any ideas on possible enhancements on your mind?

wentout · 2020-03-03T09:27:45Z

@puzpuzpuz

Do you have any ideas on possible enhancements on your mind?

Sorry for misexplanation. I suppose to add note about place where exactly users can post an issue. For example, there could be quote for users:

If you will find an issue in this api, please open an issue for diagnostics group. You can find more related information here: #345

mhdawson · 2020-03-04T22:36:37Z

@wentout I think it might be that we would use:

"If you will find an issue in this api, please comment on issue #XYZ in the diagnostic repo. You can find more related information here: #345"

This is basically your suggestion, just that instead of new issues we ask for people to comment on the issue that we've opened to capture the input in advance.

github-actions · 2020-07-16T00:36:42Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

puzpuzpuz mentioned this issue Dec 20, 2019

async-hooks: introduce async-storage API nodejs/node#26540

Closed

4 tasks

puzpuzpuz mentioned this issue Dec 23, 2019

async_hooks: add AsyncLocal API mk2 nodejs/node#31016

Closed

4 tasks

github-actions bot added the stale label Jul 16, 2020

github-actions bot closed this as completed Jul 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[async hooks] proposal for standard CLS API - request for feedback #345

[async hooks] proposal for standard CLS API - request for feedback #345

puzpuzpuz commented Dec 18, 2019 •

edited

Loading

Qard commented Dec 19, 2019

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

Flarna commented Dec 19, 2019

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

Flarna commented Dec 19, 2019

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

vdeturckheim commented Dec 20, 2019 •

edited

Loading

puzpuzpuz commented Dec 20, 2019 •

edited

Loading

vdeturckheim commented Dec 20, 2019 •

edited

Loading

puzpuzpuz commented Dec 20, 2019 •

edited

Loading

vdeturckheim commented Dec 20, 2019

puzpuzpuz commented Dec 20, 2019

Flarna commented Dec 20, 2019

puzpuzpuz commented Dec 21, 2019 •

edited

Loading

Qard commented Dec 22, 2019

puzpuzpuz commented Dec 22, 2019

Qard commented Dec 23, 2019

puzpuzpuz commented Dec 23, 2019

Qard commented Dec 23, 2019 •

edited

Loading

puzpuzpuz commented Dec 23, 2019 •

edited

Loading

vdeturckheim commented Dec 23, 2019

Flarna commented Jan 4, 2020

naugtur commented Feb 28, 2020

mhdawson commented Mar 2, 2020

wentout commented Mar 3, 2020

puzpuzpuz commented Mar 3, 2020

wentout commented Mar 3, 2020 •

edited

Loading

mhdawson commented Mar 4, 2020

github-actions bot commented Jul 16, 2020

[async hooks] proposal for standard CLS API - request for feedback #345

[async hooks] proposal for standard CLS API - request for feedback #345

Comments

puzpuzpuz commented Dec 18, 2019 • edited Loading

Qard commented Dec 19, 2019

puzpuzpuz commented Dec 19, 2019 • edited Loading

Flarna commented Dec 19, 2019

puzpuzpuz commented Dec 19, 2019 • edited Loading

Flarna commented Dec 19, 2019

puzpuzpuz commented Dec 19, 2019 • edited Loading

puzpuzpuz commented Dec 19, 2019 • edited Loading

vdeturckheim commented Dec 20, 2019 • edited Loading

puzpuzpuz commented Dec 20, 2019 • edited Loading

vdeturckheim commented Dec 20, 2019 • edited Loading

puzpuzpuz commented Dec 20, 2019 • edited Loading

vdeturckheim commented Dec 20, 2019

puzpuzpuz commented Dec 20, 2019

Flarna commented Dec 20, 2019

puzpuzpuz commented Dec 21, 2019 • edited Loading

Qard commented Dec 22, 2019

puzpuzpuz commented Dec 22, 2019

Qard commented Dec 23, 2019

puzpuzpuz commented Dec 23, 2019

Qard commented Dec 23, 2019 • edited Loading

puzpuzpuz commented Dec 23, 2019 • edited Loading

vdeturckheim commented Dec 23, 2019

Flarna commented Jan 4, 2020

naugtur commented Feb 28, 2020

mhdawson commented Mar 2, 2020

wentout commented Mar 3, 2020

puzpuzpuz commented Mar 3, 2020

wentout commented Mar 3, 2020 • edited Loading

mhdawson commented Mar 4, 2020

github-actions bot commented Jul 16, 2020

puzpuzpuz commented Dec 18, 2019 •

edited

Loading

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

puzpuzpuz commented Dec 19, 2019 •

edited

Loading

vdeturckheim commented Dec 20, 2019 •

edited

Loading

puzpuzpuz commented Dec 20, 2019 •

edited

Loading

vdeturckheim commented Dec 20, 2019 •

edited

Loading

puzpuzpuz commented Dec 20, 2019 •

edited

Loading

puzpuzpuz commented Dec 21, 2019 •

edited

Loading

Qard commented Dec 23, 2019 •

edited

Loading

puzpuzpuz commented Dec 23, 2019 •

edited

Loading

wentout commented Mar 3, 2020 •

edited

Loading