Yet another review on hooks #7645

Closed · jsirex opened this issue Apr 27, 2020 · 6 comments
Labels: Stale, Type: Bug (Issues that describe broken functionality)

jsirex (Contributor) commented Apr 27, 2020

Because of the new supervisor behavior, here is what we have:

  1. If anything under config/ changes, the reconfigure hook is triggered.
  2. If anything under hooks/ changes, an unconditional, uncontrollable restart is triggered.

A possible workaround is to extract all moving parts into config/, detect those changes (#7644) in the reconfigure hook, and trigger a restart from there if necessary (see the sketch below).
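For illustration, here is a minimal sketch of how a reconfigure hook could detect which rendered files actually changed, assuming a checksum state file kept under the service's var path; the restart trigger itself is exactly the part the supervisor does not expose today:

```bash
#!/bin/bash
# Sketch only: detect config changes inside a reconfigure hook.
# {{pkg.svc_config_path}} / {{pkg.svc_var_path}} are Habitat template
# variables; the checksum state file is an assumption of this sketch.
state="{{pkg.svc_var_path}}/config.sha256"

if [ -f "$state" ] && sha256sum --check --quiet "$state" 2>/dev/null; then
    echo "config unchanged, nothing to do"
else
    echo "config changed, a restart is needed"
    # Missing piece: there is no supported way to request a restart
    # from here, which is what this issue is asking for.
fi

sha256sum "{{pkg.svc_config_path}}"/* > "$state"
```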

I tried to gather all hooks and their current state in the following table:

| Hook name    | If failed then | If absent then |
| ------------ | -------------- | -------------- |
| init         | No docs        |                |
| run          | No docs        | No docs        |
| post-run     | Retry post-run |                |
| reconfigure  | No docs        | restart        |
| post-stop    | No docs        |                |
| file-updated | No docs        |                |
| suitability  | No docs        | No docs        |

New hooks we likely need:

| Hook name           | If failed then | If absent then |
| ------------------- | -------------- | -------------- |
| stop (missing)      | term-wait-kill | term-wait-kill |
| upgrade (missing)   | restart        | restart        |
| downgrade (missing) | restart        | restart        |

Suitability

The docs are ambiguous: "leader down" does not mean the leader service is down, but that the leader supervisor is down.

Stop

When a service is asked to stop, this hook would be used instead of a plain kill, and anything printed to stdout would reset the timeout counter.
This is an obvious but missing hook: with it I could control the kill signal, timeouts, retries, etc.
Currently, for some services I have to set a ridiculously huge timeout, because a service may finish its work in 5 seconds or in 5 minutes.
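A minimal sketch of what such a stop hook might look like under the proposed semantics (the hook itself, the stdout keepalive, and the pid file location are all assumptions of this proposal, not existing features):

```bash
#!/bin/bash
# Hypothetical stop hook -- proposed in this issue, not implemented.
pid=$(cat "{{pkg.svc_var_path}}/service.pid")   # assumed pid file

kill -TERM "$pid"
while kill -0 "$pid" 2>/dev/null; do
    # Under the proposed semantics, each line printed here would
    # reset the supervisor's shutdown timeout.
    echo "still draining work..."
    sleep 5
done
echo "service exited cleanly"
```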

Upgrade / Downgrade

A note on upgrade/downgrade hooks, because this topic is really complicated:
a user who wants to handle an upgrade/downgrade should probably think in terms of migrating from the current version to the new/old one, and an upgrade/downgrade may skip versions, too.
That's why when upgrading we should run the new package's files and hooks, but when downgrading the current package's. There should also be a way to get the new/old package info: alongside the existing {{pkg}} template data for the current package, add something like {{update_pkg}} for the package being updated to. For example:

```bash
# {{pkg}} is existing template data; {{update_pkg}} is the proposed
# addition for the package being updated to.
CURRENT_PKG_VERSION="{{pkg.version}}"      # e.g. 2.2.3
NEW_PKG_VERSION="{{update_pkg.version}}"   # e.g. 4.6.2

current_major="${CURRENT_PKG_VERSION%%.*}"
new_major="${NEW_PKG_VERSION%%.*}"

migrate_from_1_to_2() { ... }
migrate_from_2_to_3() { ... }
migrate_from_3_to_4() { ... }

# Apply each major-version migration step in order.
for version in $(seq "$current_major" $((new_major - 1))); do
    "migrate_from_${version}_to_$((version + 1))"
done
```

Issues

The following issues are likely related:

jsirex added the C-bug label Apr 27, 2020
krasnow commented Apr 29, 2020

Thanks for the fantastic feedback. As you noted, there are a ton of open issues related to hooks. We're going to gather these up into an epic so we can take a holistic look at which hooks need to be added and which issues need to be fixed.

jsirex (Contributor, Author) commented May 29, 2020

Actually, I am already blocked by these missing hooks; any implementation would be nice.
I have a really obvious use case: my service (a cluster) has an update strategy (rolling or at-once, it does not matter).

Before an upgrade I want to prepare the cluster for it. For example:

  1. Elasticsearch: pause shard balancing
  2. MongoDB: gracefully step down the primary
  3. Patroni / PostgreSQL: switch over or pause cluster management
  4. etc.

But currently I cannot control any of this, so such an upgrade will simply break everything.
A possible workaround is to override the shutdown behavior and always do a very careful shutdown, but the shutdown hook is missing too.
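To make the first item concrete, a hypothetical pre-upgrade hook for the Elasticsearch case might look like the sketch below; no such hook exists today, the node address is an assumption, and the calls are the standard Elasticsearch cluster settings and flush APIs:

```bash
#!/bin/bash
# Hypothetical pre-upgrade hook -- no such hook exists today.
# Stop the cluster from rebalancing shards while this node restarts.
ES_URL="http://127.0.0.1:9200"   # assumed local node address

curl -fsS -X PUT "${ES_URL}/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"transient":{"cluster.routing.allocation.enable":"primaries"}}'

# Flush so recovery after the restart is fast.
curl -fsS -X POST "${ES_URL}/_flush"
```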

stevendanna (Contributor) commented:

Also related: #5135

christophermaier added the Type: Bug (Issues that describe broken functionality) label and removed the C-bug label Jul 24, 2020
danielcbright commented:

I ended up writing a Habitat "wrapper" called cluster patcher that allows for some of the behavior you say you're blocked on. I'll clean it up and share it here if anyone is interested.

stale bot commented Sep 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.

stale bot added the Stale label Sep 13, 2021
stale bot commented Oct 20, 2021

This issue has been automatically closed after being stale for 400 days. We still value your input and contribution. Please re-open the issue if desired and leave a comment with details.

stale bot closed this as completed Oct 20, 2021