Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with the new ThreadSafeFunction api #524

Closed
ghost opened this issue Aug 15, 2019 · 14 comments
Closed

Issues with the new ThreadSafeFunction api #524

ghost opened this issue Aug 15, 2019 · 14 comments
Labels

Comments

@ghost
Copy link

ghost commented Aug 15, 2019

@romandev I am enthusiastically trying to use the new ThreadSafeFunction Api as a replacement for the non-standard napi-thread-safe-callback which I was using previously. I have a few problems though in my (typical?) use case where I have async native C++ functions that return quickly to JS but continue to do the actual work in an async C++ worker or thread, calling back into JS with progress information and/or result(s) using ThreadSafeFunction.

NB: Updated to show that issue 1 & 2 is now solved

1) SOLVED: Most importantly, the ThreadSafeCallback class is lacking a copy-constructor so I can not safely and easily capture it into c++ threads, lambdas and AsyncWorker's with pass by value. The provided example at the bottom of "https://github.com/nodejs/node-addon-api/blob/master/doc/threadsafe_function.md" solves this by having ThreadSafeCallback as a global mutable variable but this is dangerous (might break with repeated calls to the native function quickly after each other) and I also personally think global mutable variables is a bad thing. Instead the native method should initialize the ThreadSafeCallback object which is than copied into the thread/lambda/worker by value (before original object is destroyed as it goes out of scope when function returns etc).

2) FIXED BY ABOVE: In the absence of a copy-constructor, it is a problem that the ThreadSafeCallback class has only factory methods (for parameters) and lacks a constructor with the various parameters so new/delete could be used to control the life-cycle and hereby enable capturing of a pointer to ThreadSafeFunction in c++ threads, workers and lambdas (As a temporary fix, I have created by own simple wrapper class that holds a ThreadSafeFunction inside and has a constructor with all the arguments thus enabling the wrapper with the ThreadSafeFunction to be created on the heap).

3) It is not clear what initialThreadCount should be set to and how and when Acquire and Release should be called. The example sets initialThreadCount to 1 though used by two threads and does not call Acquire ? From my own experiments, it seems important to call Release exactly once to make sure the node process terminates - it does not seem to matter what Thread I call Release from in my example as long as I don't call it twice (in which case an Assert error with a nice stack trace will be shown).

4 There is no way for the C++ code to wait for the queue to be empty (all calls in queue to be processed). This would be useful to delay cleanup until a NonBlockingCall has been fully executed.

@mhdawson
Copy link
Member

@extmchristensen, @gabrielschulhof who it probably the best person to comment is away for another week so it might be after that when we engage in a discussion on this.

@almoghamdani
Copy link

Any news on when the copy constructor will be implemented?

@marco-ms
Copy link

marco-ms commented Sep 7, 2019

Having the same issue as 1), I tried to convert the lambda callback to a functor that was going to hold the ThreadSafeFunction, but for some reason I didn't manage to get it to compile.
I also posted another question on why the callback Function is a required parameter on any ThreadSafeFunction::New(), I use Promises and having to pass a Function is redundant to me. It could be optional or I could be able to choose between Function and Promise, no?

@mhdawson
Copy link
Member

mhdawson commented Sep 9, 2019

  1. Most importantly, the ThreadSafeCallback class is lacking a copy-constructor so I can not safely and easily capture it into c++ threads, lambdas and AsyncWorker's with pass by value. The provided example at the bottom of "https://github.com/nodejs/node-addon-api/blob/master/doc/threadsafe_function.md" solves this by having ThreadSafeCallback as a global mutable variable but this is dangerous (might break with repeated calls to the native function quickly after each other) and I also personally think global mutable variables is a bad thing. Instead the native method should initialize the ThreadSafeCallback object which is than copied into the thread/lambda/worker by value (before original object is destroyed as it goes out of scope when function returns etc).
  • std::unique_ptr<napi_threadsafe_function, Deleter> _tsfn; means we cannot just add a copy constructor.
  • Seems like we can just make it a regular pointer.
  • We should add acquire on all creates, and release on all releases (means acquire/release won't necessarily need to be called at all)
  • Creating a copy should increase tfsn ref count, requiring release for each copy.
  • Assignment should do release.

@KevinEady
Copy link
Contributor

@mhdawson / @gabrielschulhof ,

I was thinking about what we said regarding these points:

We should add acquire on all creates, and release on all releases (means acquire/release won't necessarily need to be called at all)
Creating a copy should increase tfsn ref count, requiring release for each copy.
Assignment should do release.

I don't feel like it is necessary to tie acquire & release to copy + destruction... What was the reasoning behind this? I think it had something to do with race conditions? Would these race conditions still exist if we were using a normal napi napi_threadsafe_function and not the wrapper? If so, I feel like it's something we do not need to specifically worry about in node-addon-api.

Likewise, with the concern of overwriting a ThreadSafeFunction with an existing one, a la:

ThreadSafeFunction tsfn;
tsfn = ThreadSafeFunction::New(...);
tsfn = ThreadSafeFunction::New(...);

... would end up leaking the first TSFN... I feel like we should not need to specifically handle this case... eg, what would happen in the regular napi case?

napi_threadsafe_function tsfn;
napi_create_threadsafe_function(..., &tsfn);
napi_create_threadsafe_function(..., &tsfn);

.. the first TSFN reference is lost, no?

Going back to @gabrielschulhof 's overall mantra of "the TSFN wrapper shouldn't have too much logic" ... using the ThreadSafeFunction correctly (eg don't try to copy it into another thread when the tsfn is no longer available) would be up to the developer, which in a lot of instances it already is anyway (eg. if you get a napi_closing from a BlockingCall , don't make any more calls)

One thing I see (which I think we brought up) is that the copied ThreadSafeFunctions will point to an invalid _tsfn after the Finalize is called ... but i'm not 100% sure this is actually a problem (going back to "using the ThreadSafeFunction correctly would be up to the developer")?

@marco-ms
Copy link

marco-ms commented Sep 12, 2019

@KevinEady I understand your point, but @extmchristensen 1. point is real, since I am using ThreadSafeFunction together with a Promise, if I chain two calls in the JS that will use the same tsfn like in your example here:
https://github.com/nodejs/node-addon-api/blob/master/doc/threadsafe_function.md#example

The main problem is my code calls Promise.Resolve() before tsfn.Release(); and in the JavaScript resolution function I call another add-on API which reassign tfsn and produces the following crash:
FATAL ERROR: ThreadSafeFunction::operator = You cannot assign a new TFSN because existing one is still alive.

So the example doesn't really work in real life code, unless I am missing something.

@KevinEady
Copy link
Contributor

@marco-ms Sorry, I'm just trying to understand your use-case... Why are you re-assigning tsfn?

@marco-ms
Copy link

marco-ms commented Sep 16, 2019

@marco-ms Sorry, I'm just trying to understand your use-case... Why are you re-assigning tsfn?
I have been following the example where a global tfsn is created and reused. However your example is pretty basic and just throws result every second for N times (N defined by JavaScript caller).

What I have been doing is exposing two functions: a and b.
Both a and b returns a promise with a result, they don't use the callback in the ThreadSafeFunction.
Both a and b uses global tsfn and redefine it, but if I chain a and b like this:

b().then(() => {
 // tsfn has called our callback which has resolved the promise
 a().then(() => {
    // it'll crash just calling a() since it'll reassign tsfn before b is done with it and call Release()
 });
)};

It'll crash since b calls blocking until JavaScript code is done, a's kicks in and will redefine tfsn and will crash with the error mentioned above.
So I tried to wrap ThreadSafeFunction like suggested by @extmchristensen here is the code:

#include "MyThreadSafeFunction.h"
    MyThreadSafeFunction::MyThreadSafeFunction(Napi::Env env) {
        // https://github.com/nodejs/node-addon-api/blob/master/doc/threadsafe_function.md
        // This API for thread safety requires N-API v4 which is supported only by the following Node:
        // v8.16.0, v11.8.0, v12.0.0 (or greater)
        // https://nodejs.org/api/n-api.html#n_api_n_api_version_matrix
        Napi::Function bogusCallback = Napi::Function::New(env, [](const Napi::CallbackInfo& cbinfo) { return cbinfo.Env().Undefined(); } );
        _tsfn = Napi::ThreadSafeFunction::New(
            env,
            bogusCallback,            // We use Promises, not Callbacks
            "Resource Name",          // Name
            0,                        // Unlimited queue
            2                         // Does it matter?
        );
    }

    MyThreadSafeFunction::~MyThreadSafeFunction() {
        _tsfn.Release();
    }

    napi_status MyThreadSafeFunction::BlockingCall(function<void(Napi::Env env, Napi::Function jsCallback)> callback) {
        return _tsfn.BlockingCall(callback);
    }

    napi_status MyThreadSafeFunction::NonBlockingCall(function<void(Napi::Env env, Napi::Function jsCallback)> callback) {
        return _tsfn.NonBlockingCall(callback);
    }

This way each time I call a method it'll have its own tsfn instance and will be released during dtor, however for some reason Electron is now crashing randomly with:

Electron(21589,0x1198705c0) malloc: Incorrect checksum for freed object 0x7fd68cb12460: probably modified after being freed.
Corrupt value: 0x0
Electron(21589,0x1198705c0) malloc: *** set a breakpoint in malloc_error_break to debug

And I haven't figured out why.

@KevinEady
Copy link
Contributor

Hi @gabrielschulhof ,

I'm working on the copy constructor, and I'm thinking back to something you said regarding the underlying napi_threadsafe_function, is that it itself acts as a "pointer" . If that is the case, could the ThreadSafeFunction class just have the concrete napi_threadsafe_function _tsfn instead of a pointer to one? Thereby, copying a ThreadSafeFunction would copy whole napi_threadsafe_function structure in the new object. Is this okay? How would it behave in the C world if we just copied napi_threadsafe_functions?

I'm concerned that if we're using pointers, we'll need to take special care of the memory, as ::New needs to create a new object (which eventually needs to be deleted), whereas the ThreadSafeFunction also has a constructor taking an existing napi_threadsafe_function, which shouldn't be deleted.

KevinEady added a commit to KevinEady/node-addon-api that referenced this issue Sep 18, 2019
Removes the `unique_ptr` from `ThreadSafeFunction`, thereby allowing
copies.

Ref: nodejs#524
mhdawson pushed a commit that referenced this issue Nov 1, 2019
* tsfn: Implement copy constructor

Refs: #524

PR-URL:  #546
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
@KevinEady
Copy link
Contributor

Is this issue still relevant? cc @mhdawson

@ghost
Copy link
Author

ghost commented Mar 24, 2020

Part 3 & 4 is still relevant - part1+2 solved

@KevinEady
Copy link
Contributor

  1. It is not clear what initialThreadCount should be set to

Take a look at this post. In all actuality, the thread count is just that: a counter. Even if you set it to 1, you could theoretically have 10 threads using it (but then you'd need to implement some lifecycle management yourself). As long as no call to Acquire() is made, this tsfn would destroy as soon as Release() (or Abort()) is called.

In the specific example, it is set to 1 because only one thread does use it: the thread started via std::thread. That thread is also the only one that calls Release(), which then decreases the count to 0, causing the finalization to begin.

  1. There is no way for the C++ code to wait for the queue to be empty

I feel like this can be solved programmatically by the developer...? If the last item you add to the queue performs some callback, you can perform some logic there. The underlying N-API does not have any functionality for this, so I don't think node-addon-api can provide this.

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@github-actions github-actions bot added the stale label Dec 17, 2020
@mhdawson
Copy link
Member

I'm going to close this since its been a long time since the last response. Please let us know if you think that was not the right thing to do.

kevindavies8 added a commit to kevindavies8/node-addon-api-Develop that referenced this issue Aug 24, 2022
* tsfn: Implement copy constructor

Refs: nodejs/node-addon-api#524

PR-URL:  nodejs/node-addon-api#546
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Marlyfleitas added a commit to Marlyfleitas/node-api-addon-Development that referenced this issue Aug 26, 2022
* tsfn: Implement copy constructor

Refs: nodejs/node-addon-api#524

PR-URL:  nodejs/node-addon-api#546
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
wroy7860 added a commit to wroy7860/addon-api-benchmark-node that referenced this issue Sep 19, 2022
* tsfn: Implement copy constructor

Refs: nodejs/node-addon-api#524

PR-URL:  nodejs/node-addon-api#546
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
johnfrench3 pushed a commit to johnfrench3/node-addon-api-git that referenced this issue Aug 11, 2023
* tsfn: Implement copy constructor

Refs: nodejs/node-addon-api#524

PR-URL:  nodejs/node-addon-api#546
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants