Allow JetStream Publish retries iff ErrNoResponders was returned. #930
Conversation
This test had 2 failures out of the 4 runs, so we may want to see if that can be improved.
Force-pushed from 26b58d2 to 15d1a42
js.go (outdated)
// To protect against small blips in leadership changes etc, if we get a no responders here retry.
time.Sleep(o.rwait)
if o.ttl > 0 {
	resp, err = js.nc.RequestMsg(m, time.Duration(o.ttl))
Note that the call below uses the context that is created at the beginning, so the deadline is likely the original one, whereas here each request is made with the original timeout. Maybe that timeout should be reduced by the time already spent? If not, there will be some difference in behavior between the context and timeout paths.
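For a rough sense of the difference (numbers are illustrative, not from the PR): with o.ttl = 2s, o.rwait = 250ms and two retries, the timeout path could block for up to roughly 2s + 250ms + 2s + 250ms + 2s ≈ 6.5s, whereas a 2s context deadline would cut the whole sequence off at about the 2s mark.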
I think this might be fine; I like how the context controls the max duration of the retry logic. For example:
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
_, err := js.Publish(subj, msg, nats.Context(ctx), nats.RetryWait(500*time.Millisecond), nats.RetryAttempts(30))
The above would mean attempting to publish for 10 seconds max; the retries alone could take up to ~15s, but the context still sets the deadline. It also means it should be possible to have infinite retries, retrying as needed until the context expires. (edit: infinite retries are actually not possible with the current logic, so I made a suggestion below.)
I will do a rough estimation here, subtracting out o.rwait and bailing if the local ttl in the loop goes negative. I will post a separate CL for ease of review but can squash before merging.
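A minimal sketch of that estimation, reusing the names from the diff above (o.ttl, o.rwait, o.rnum, m, resp, err) and assuming a pubOpts context field o.ctx for the context-based path; this illustrates the idea, not the final code:

// Sketch: shrink the remaining time budget by the wait on each retry
// and bail out once it goes negative.
ttl := o.ttl
for r := 0; err == ErrNoResponders && r < o.rnum; r++ {
	time.Sleep(o.rwait)
	if o.ttl > 0 {
		ttl -= o.rwait
		if ttl <= 0 {
			break // retry budget exhausted, keep the last error
		}
		resp, err = js.nc.RequestMsg(m, ttl)
	} else {
		// Context-based publish: the context deadline bounds the retries.
		resp, err = js.nc.RequestMsgWithContext(o.ctx, m)
	}
}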
js.go (outdated)
@@ -422,11 +433,23 @@ func (js *js) PublishMsg(m *Msg, opts ...PubOpt) (*PubAck, error) {
}

if err != nil {
if err == ErrNoResponders {
err = ErrNoStreamResponse
for r := 0; err == ErrNoResponders && r < o.rnum; r++ {
Suggested change:
-	for r := 0; err == ErrNoResponders && r < o.rnum; r++ {
+	for r := 0; err == ErrNoResponders && (r < o.rnum || o.rnum < 0); r++ {
That change would make it possible to allow infinite retries, so that retries happen as needed while the context is still active:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_, err := js.Publish(subj, msg, nats.Context(ctx), nats.RetryWait(500*time.Millisecond), nats.RetryAttempts(-1))
Should we make sure a ctx is present to do that one?
I will make the change, but we could get into an infinite loop if the stream is truly gone and there is no context provided to the Publish call.
Change made below.
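For reference, a guard along those lines could be as small as the sketch below (field names o.rnum and o.ctx follow the diff; the error message is illustrative and the merged change may differ):

// Sketch: refuse unlimited retries unless a context is present to bound them,
// otherwise a missing stream would keep the publish retrying forever.
if o.rnum < 0 && o.ctx == nil {
	return nil, errors.New("nats: RetryAttempts(-1) requires a Context to bound the retries")
}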
Force-pushed from bf7a7ad to b44711d
LGTM
LGTM!
Force-pushed from b44711d to 7ccdf20
This is to avoid small blips from leader changes surfacing to the end application. Signed-off-by: Derek Collison <derek@nats.io>
Force-pushed from 7ccdf20 to 2a5ee5f