
Avoid infinite reconnection retries #9

Closed
wants to merge 1 commit into from

Conversation

@purbon purbon commented Jun 23, 2015

Add a configuration option to set the number of times the output will try to reconnect after a failed connection. If reconnect_times is not defined, the former behaviour is kept: retry the connection forever.

Version 1.1.0 bump

-    retry
+    @reconnect_tries += 1
+    retry if retry_connection?
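A minimal sketch of the bounded-retry pattern this diff proposes. The `reconnect_times` option, `@reconnect_tries` counter, and `retry_connection?` predicate mirror the names in the diff; the `Sender` class and its failing `deliver` method are invented here purely for illustration, not the plugin's actual code:

```ruby
# Hypothetical illustration of the PR's bounded-retry logic.
class Sender
  def initialize(reconnect_times: nil)
    @reconnect_times = reconnect_times # nil => retry forever (former behaviour)
    @reconnect_tries = 0
  end

  # True while we are still allowed to retry.
  def retry_connection?
    @reconnect_times.nil? || @reconnect_tries < @reconnect_times
  end

  def send_event(event)
    deliver(event)
  rescue IOError
    @reconnect_tries += 1
    retry if retry_connection?
    # Retries exhausted: the event is silently dropped here,
    # which is exactly the concern raised in the review.
    nil
  end

  private

  # Stand-in for the real network write; always fails in this sketch.
  def deliver(_event)
    raise IOError, "connection down"
  end
end
```

With `reconnect_times: 3` the initial attempt plus retries stop once the counter reaches the limit; with the default `nil` the rescue clause retries indefinitely, matching the plugin's previous behaviour.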
Contributor

Doesn't this make it lose the event if redis is down?

Author

Yes, it does, but without modifying the pipeline logic I see no way to handle this. Since this is an optional feature, users can decide whether they want a finite or an infinite number of retries.

@colinsurprenant
Contributor

In this state I agree with @jordansissel that this proposed change is worrisome. First, there is no explicit mention of the fact that the event has been dropped. Second, I am not sure which is the "lesser evil": a stalled pipeline, or all input events being dropped to the logs when the reconnect_times option is set.

I think we should approach this from two angles:

  • the pipeline stall problem is not actually caused by a stalled output, but by a poor shutdown signalling implementation (mine) and the lack of persistence (in progress).
  • logstash outputs have been following this idea of continuous retries, for better or worse, and we should make sure we have a consistent strategy before changing this semantic.

@purbon
Author

purbon commented Jul 7, 2015

Hi,
I agree with you too. This PR introduces what was, for me, the most straightforward solution to this problem for now, but I agree the right solution is to fix the pipeline shutdown semantics.

Can you think of any solution that would help users facing this situation now?


@colinsurprenant
Contributor

@purbon as I said in my previous comment, my proposed solutions are: a) properly fix the shutdown signalling, which should happen for 1.6 per elastic/logstash#3451; b) move on with persistence; and c) discuss a "global" retry strategy for output plugins...

@purbon
Author

purbon commented Jul 8, 2015

Yes, this makes sense.


@purbon
Author

purbon commented Jul 8, 2015

Closing this, as this is not the intended solution and a proper shutdown sequence is going to be implemented soon. We can always reopen if necessary.

@purbon purbon closed this Jul 8, 2015