[BUG] Linkdef race condition during provider start #361

brooksmtownsend · 2022-03-03T19:50:36Z

Describe the bug

When a capability provider starts, there is a window between the start of initialization and the point where the provider sets up its NATS subscriptions. Any link definition published during this window is lost.

Background
Capability providers receive link definitions in two ways: by subscribing to link definitions while they are running, and by accepting a list of link definitions on the stdin stream that the host provides. In between the time the host opens the capability provider port, giving it a list of link definitions, and the time the subscription is set up on the linkdef.put topic, any link definition that is put is lost until the provider is restarted.
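To make the window concrete, here is a minimal sketch in Python. It is not wasmCloud or NATS code: `ToyBus` is a hypothetical in-memory stand-in for a pub/sub bus with no message replay, and the actor names are made up. It only illustrates the ordering problem described above.

```python
class ToyBus:
    """Hypothetical in-memory pub/sub bus with no replay of past messages."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # A message published while nobody is subscribed is simply dropped.
        for callback in self.subscribers.get(topic, []):
            callback(message)


bus = ToyBus()
host_linkdefs = ["actor-a"]  # links the host knows about before provider start

# 1. The host starts the provider and hands it a snapshot of the linkdefs.
snapshot = list(host_linkdefs)
provider_links = set(snapshot)

# 2. A linkdef is put while the provider is still initializing.
host_linkdefs.append("actor-b")
bus.publish("linkdef.put", "actor-b")  # no subscriber yet, so it is dropped

# 3. The provider finishes initializing and subscribes.
bus.subscribe("linkdef.put", provider_links.add)

# 4. Puts after this point are delivered normally.
host_linkdefs.append("actor-c")
bus.publish("linkdef.put", "actor-c")

missing = sorted(set(host_linkdefs) - provider_links)
print(missing)  # prints ['actor-b']
```

The link put in step 2 is too late for the snapshot and too early for the subscription, so the provider never learns about it.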

Workaround
This scenario has a very small time window and primarily occurs when starting actors, starting providers, and putting link definitions are scripted with some concurrency. The simple workaround is to publish link definitions before you start capability providers in your scripts, so that link definitions are always delivered on provider startup.

Ideal solution
An ideal solution to this problem may be to have the capability provider ask the host for its link definitions rather than receive them on stdin. That way the linkdef.put subscription can already be configured, and the list of linkdefs will be accurate.
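A rough sketch of that ordering in Python, under stated assumptions: `ToyBus`, `ToyHost`, and `get_links` are hypothetical names for illustration, not part of any wasmCloud API. The provider subscribes first, then asks the host for the current list, and merges both sources idempotently.

```python
class ToyBus:
    """Hypothetical in-memory pub/sub bus with no replay of past messages."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers.get(topic, []):
            callback(message)


class ToyHost:
    """Hypothetical host that stores linkdefs and announces each put."""

    def __init__(self, bus):
        self.bus = bus
        self.linkdefs = []

    def put_link(self, actor_id):
        self.linkdefs.append(actor_id)
        self.bus.publish("linkdef.put", actor_id)

    def get_links(self):
        # Assumed query endpoint: returns every linkdef known right now.
        return list(self.linkdefs)


bus = ToyBus()
host = ToyHost(bus)
host.put_link("actor-a")  # exists before the provider starts

provider_links = set()

# 1. Subscribe first, so nothing published from here on can be missed.
bus.subscribe("linkdef.put", provider_links.add)

# 2. A put that lands between subscribing and querying is still delivered.
host.put_link("actor-b")

# 3. Only then ask the host for the full list and merge it in.
provider_links.update(host.get_links())

host.put_link("actor-c")
in_sync = provider_links == set(host.get_links())
```

Because a link can now arrive both through the subscription and through the query, this approach assumes that applying the same link definition twice is harmless. The set used here gives that property for free; a real provider would need to deduplicate explicitly.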

stevelr · 2022-03-03T23:08:40Z

For some background info: when a capability provider receives a put-link, it might do processing such as connecting to a remote database. Only after that operation completes successfully is the link added to the provider-maintained list of linked actors. If an RPC is received before that processing has completed, the provider will report an unlinked actor error.

stevelr · 2022-03-03T23:17:31Z

One potential fix would be to have providers return a success or failure response to the linkdef put message. A host should not send a message to a provider unless it has received confirmation that the put link has completed successfully.
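As a minimal sketch of that idea in Python (all class and method names here are hypothetical, and the synchronous call stands in for what would be a request/reply exchange over the message bus):

```python
class ToyProvider:
    """Hypothetical provider that confirms each put-link explicitly."""

    def __init__(self, fail_for=()):
        self.linked = set()
        self.fail_for = set(fail_for)  # actors whose link setup should fail

    def handle_put_link(self, actor_id):
        # Link setup (e.g. connecting to a remote database) would happen here.
        if actor_id in self.fail_for:
            return False
        self.linked.add(actor_id)
        return True  # reply only after setup has completed

    def handle_rpc(self, actor_id, payload):
        if actor_id not in self.linked:
            raise RuntimeError("unlinked actor")
        return f"ok:{payload}"


class ToyHost:
    """Hypothetical host that forwards RPCs only for confirmed links."""

    def __init__(self, provider):
        self.provider = provider
        self.confirmed = set()

    def put_link(self, actor_id):
        ok = self.provider.handle_put_link(actor_id)
        if ok:
            self.confirmed.add(actor_id)
        return ok

    def invoke(self, actor_id, payload):
        if actor_id not in self.confirmed:
            return "rejected: link not confirmed"
        return self.provider.handle_rpc(actor_id, payload)


host = ToyHost(ToyProvider(fail_for={"actor-b"}))

put_a = host.put_link("actor-a")      # True: link confirmed
rpc_a = host.invoke("actor-a", "ping")
put_b = host.put_link("actor-b")      # False: setup failed
rpc_b = host.invoke("actor-b", "ping")
```

With this gating, the host turns the provider's unlinked actor error into an explicit rejection on its own side, and a failed link setup becomes visible to whoever issued the put.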

stevelr · 2022-03-03T23:22:59Z

The suggested workaround ..

publish link definitions before you start capability providers in your scripts in order to always deliver link definitions on provider startup

.. may not work because the provider still needs time to activate the links before it can receive RPCs. All the workaround does is move the indeterminate link activation time from the time of put-link to the time of provider start. The race condition would then move to the time between sending a start command and sending RPCs.

brooksmtownsend · 2022-03-04T14:13:22Z

@stevelr I think we're talking about slightly different race conditions, but correct me if I'm wrong. The two that I see:

1. (this issue) There is a non-zero amount of time between the provider executable starting its main process and when it sets up the NATS subscription to the linkdef.put topic. If a linkdef is put during this time, it's too late for the host to give it to the provider as a part of its HostData, and too early for the provider to receive the message on its own.
2. (the issue you mention) When a provider receives a linkdef.put, there is a non-zero amount of time between receiving that message and when the provider is truly ready to handle an invocation that corresponds with that linkdef. This can be exacerbated when the provider needs to establish connections, perform computations, etc.

stevelr · 2022-03-04T15:13:04Z

Thanks for the clarification. I was referring to condition 2, not 1.

stale · 2023-05-17T17:45:04Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this has been closed too eagerly, please feel free to tag a maintainer so we can keep working on the issue. Thank you for contributing to wasmCloud!

brooksmtownsend · 2023-05-17T18:38:49Z

I'm pinning this issue because it's a significant bug that we're going to need to address, and I don't want to lose it.

brooksmtownsend added the bug and help wanted labels Mar 3, 2022

brooksmtownsend mentioned this issue Mar 9, 2022

update wasmbus-rpc, interface-sqldb, +example/unions wasmCloud/examples#130

Merged

brooksmtownsend mentioned this issue May 9, 2022

[BUG] <Issue> Wash ctl apply manifest need to applied twice to link wasmCloud/wash#209

Closed

brooksmtownsend mentioned this issue Oct 11, 2022

[RFC] Overhaul and Upgrade Link Definition Management wasmCloud/wasmCloud#266

Closed

stale bot added the stale label May 17, 2023

brooksmtownsend added the pinned label May 17, 2023

stale bot removed the stale label May 17, 2023
