-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stub: support restart after stub stopped #91
Conversation
491312c
to
8af88db
Compare
My first question is that do we really want to initiate a reconnection from the stub itself ? Instead of letting some higher-level application logic decide if, and when, attempting a reconnection is both desirable and safe ? We have the |
I quite approve of the idea of leaving the reconnection work to the higher-level application to handle. However, in the actual attempt, I discovered the following problems that are difficult to solve by the higher-level application logic. Most of them come from the current internal logic of the stub:
To summarize the problems described above, most of the current connection handling logic is defined within the stub, and there are not enough exposed public methods to implement custom reconnection logic. If my understanding is incorrect, or if there are currently any proposals for optimizing the design of the plugin <-> runtime connection management, please let me know. I am willing to implement the code for these designs. But before that, if a convenient reconnection handling option can be provided to plugin developers for easy use, it might also be a good choice. |
Sorry, I was not specific enough. I did not mean to try reusing/reconnecting the already disconnected stub. I meant to decide/check whether and when re-establishing the connection is safe, then create a new stub and connect that one... |
oh i got you, In fact, currently we do achieve the purpose of reconnection and re-registration plugin by rebuilding the stub object. I think a rather crucial point here is what the long-term positioning of the |
Well, (be)for(e) answering that question there is a related question to consider: 'what is a plugin ?' or more precisely 'can the plugins lifecycle be different from the lifecycle of the connection of the plugin ?'. These might sound very philosophical at first, outright ridiculous even, before you start to consider things from the runtime's perspective. On the client side, sure, there is a clear difference between the lifecycles of the plugin and its connection. On the runtime side however, there isn't a clear distinction. Once a connection is closed/lost, the runtime considers the plugin gone forever. If the plugin was an internal one, we would need to restart it to get it connected, up, and running again. That creates a new connection, a new process, and a new struct for the plugin. Clearly a new instance from all aspects. If it was an external plugin, even if it does not exit but reconnects from within the same process, a new connection and a new internal struct is created in the runtime. Again, a new instance from the runtime's point of view. So from the runtime's perspective the lifecycles of the connection and the plugin are the same...
That is a fair and valid point. I'd still argue that we should make the stub restartable and reconnectable by the plugin, instead of trying to make the stub restart and reconnect itself. Considering that it is completely unpredictable in what context a 'connection lost' event comes and in what state the plugin itself is (as opposed to the stub) at that moment, only the application can know when it is safe to reconnect and, for instance, what kind of locking or other synchronization need to be taken care of before this can be attempted. So IMO, we should reset/clear enough of the internal state of the stub that it should be possible to safely restart/reconnect it. And maybe this should always be done explicitly... perhaps only by an explicit call to So then for a reconnectable plugin
|
Hi! Klihub, thank you very much for your explanation! After these discussion, I think I understand your concern, that is, it's okay to make the stub on the client side restartable, but it's unsafe to let it restart "automatically". At present, should I first modify the current patch to achieve “Stop() would cause the stub to reset enough of its internal state”, I think this is a good start. |
Yes, I also think that would be the best to start with. |
8af88db
to
2d6ea6f
Compare
I modified the implementation according to the conclusion of the previous communication. There are some key modifications:
Now, there are two ways in which the high level plugin logic uses the
But at present, I still have some things that I haven't thought through: During the execution of the |
@klihub can you give me more opinion, thanks. |
I don't think this is a good idea and it is not what we discussed earlier. The plugin should be considered running only when it is serving requests. Once it stops serving requests, it should not be considered running and |
Frankly speaking, this is also the aspect that I have spent a considerable amount of time on but still haven't figured out how to handle better. The positioning of Regarding the restart requirement I initially proposed, the goal can now be achieved by using |
d52bb52
to
e18aee5
Compare
@klihub How to optimize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaodiaoer This looks now good to me, apart from a bit unclear godoc comment about Run(). I gave this a brief practical test this with a modified template
plugin which internally Stop()
ped itself periodically and otherwise sat in a Run()
loop. It seemed to work fine, although I did not try to torture it much.
Could you still fix/update that unclear comment ?
sure, let me do that |
e18aee5
to
1e59c1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
kindly ping @fuweid |
@fuweid If you have some spare cycles, PTAL. |
1e59c1b
to
316daf5
Compare
@@ -401,6 +401,7 @@ func (stub *stub) Start(ctx context.Context) (retErr error) { | |||
|
|||
log.Infof(ctx, "Started plugin %s...", stub.Name()) | |||
|
|||
stub.started = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 398 is still working? We can fix it in the follow-up.
if err = <-stub.cfgErrC; err != nil {
return err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think before we need to let stub.Configure()
may been called more than once, and these code are must run after stub.register()
, we can let it remain as it is for the time being, the currently implementation of stub's restart is depend that stub.register()
is executed every time after restart .
c2ba51c
to
4e66aa2
Compare
In some scenarios such as runtime restart or the occurrence of nri request timeout, the ttrpc connections between the plugin and the runtime will be actively closed by the runtime, even the underlying network connection will be closed together. After this, the plugin must need to re-register to the adaptation side, but now the stub object cannot be reused for this; if the running plugin wants to reconnect to the runtime, the only way is to create a new stub for the plugin. This commit has split the lifecycle of the stub and the ttrpc connection to better support the development of the external type of plugins. The plugin developer can build stub once and use it to connect to adaptation side many times, just need re-call Start() function. Signed-off-by: Lei Liu <liulei.pt@bytedance.com>
4e66aa2
to
242661f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In some scenarios such as runtime restart or the occurrence of nri request timeout, the ttrpc connections between the plugin and the runtime will be actively closed by the runtime, even the underlying network connection will be closed together. After this, the plugin must need to re-register to the adaptation side, but now the stub object cannot be reused for this; if the running plugin wants to reconnect to the runtime, the only way is to create a new stub for the plugin.
This commit introduce a new stub option
WithAutoRestartDelay
. The plugin developer can use this to build a reuseable stub, which will wait for a confiurable time duration and restart automatically after the ttrpc connection is lost, then plugin will be auto-reregistered to the adaptation.