Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add startup lock support (general framework + tested etcd implementation) #6

Merged
merged 25 commits into from
May 20, 2017

Conversation

Gsantomaggio
Copy link
Member

Moved the PR aweber#98

Alexey Lebedeff and others added 6 commits October 11, 2016 11:38
This required completely reworking startup sequence in `autocluster.erl`
without making the existing dozen step sequence into an incomprehensible
mess. Now steps are more independent and are executed in order with
function that provides something like State + Either monads.

Boot process is now split into 3 stages:
- First stage roughly corresponds to original implementation
- Second step is responsible for registering node in the backend. Also
  TTL timers are now started there, instead of spreading
  `-rabbit_boot_steps` through backend modules. This makes
  `autocluster.erl` the single source of truth about what is actually
  happening during startup.
- Third stage is responsible for releasing startup lock. We need it
  because of `ignore` failure mode - we want the lock to be released
  even if some prior steps failed.

During this rework it became evident that explicit mnesia:reset/0 is not
needed (more thorough explanation is in comments for
`autocluster:maybe_cluster/1`).

Startup locking support should work the same way for every backend, but
for now the only working implementation is for `etcd`. It even has some
property-based tests =)
Other backend fall back to original random delay behaviour.

Also it looks like `dialyzer` ignores type specs in comments - because
some of types there were definitely wrong, and converting to them to
`-spec` caused a lot of barfing from `dialyzer`.
…rabbitmq-autocluster_startup-locking-support
@Gsantomaggio Gsantomaggio force-pushed the rabbitmq-autocluster_startup-locking-support branch from 6093cd4 to 4909873 Compare March 27, 2017 10:06
Gabriele Santomaggio added 2 commits April 12, 2017 11:05
The plugin has to work even if the Backend is not configured.
Add the unit tests for these cases
@Gsantomaggio Gsantomaggio changed the title DO NOT MERGE Add startup lock support (general framework + tested etcd implementation) Add startup lock support (general framework + tested etcd implementation) Apr 12, 2017
6. Build a [Docker image](https://github.com/rabbitmq/rabbitmq-autocluster/blob/master/Dockerfile)
```
git clone https://github.com/rabbitmq/rabbitmq-autocluster.git rabbitmq-autocluster
make dist
Copy link
Member Author

@Gsantomaggio Gsantomaggio Apr 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the users have to build the plugin from source.
We can remove:

git clone https://github.com/rabbitmq/rabbitmq-autocluster.git rabbitmq-autocluster
make dist

when we will put the docker image on docker hub

@michaelklishin
Copy link
Member

We will take a look after 0.7.0 GA is released.

@michaelklishin
Copy link
Member

@binarin @Gsantomaggio I don't see where in the code is augment_nodelist actually used beyond the test suite. Am I missing something or can be it removed? Alternative node representation records a pretty controversial thing for us to adopt into RabbitMQ master.

@michaelklishin
Copy link
Member

michaelklishin commented Apr 24, 2017

OK, so GitHub does not display the entire diff again. The "find best node to join" part can be a bit too opinionated for inclusion into RabbitMQ core.

@@ -371,19 +382,34 @@ startup_delay(Max) ->

-spec backend_register(#startup_state{}) -> ok | {error, iolist()}.
backend_register(#startup_state{backend_module = Mod}) ->
Mod:register().
case erlang:function_exported(Mod, register, 0) of

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply check if Mod is unconfigured ? All backends should support these callbacks, otherwise is a proper error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is unconfigured it raises the error undef


minikube service rabbitmq-deployment --namespace=test-rabbitmq

kubectl scale rabbitmq-deployment --replicas=4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails with:

$ kubectl scale rabbitmq-deployment --replicas=4
error: resource(s) were provided, but no name, label selector, or --all flag specified

This would work: kubectl scale --replicas=4 -f examples/k8s_minikube/rabbitmq.yaml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl scale deployment/rabbitmq-deployment --namespace=test-rabbitmq --replicas=6

Gabriele Santomaggio and others added 7 commits April 26, 2017 11:50
@michaelklishin michaelklishin added this to the 0.8.0 milestone May 20, 2017
@michaelklishin
Copy link
Member

Available as of 0.8.0.M1.

@binarin
Copy link

binarin commented May 22, 2017

Sadly This plugin will now pick a running reachable node that has the best uptime record from release notes is not true. While initially I though that this is a good idea, I wasn't able to come up with a solid (distributed) algorithm for this. So IIRC current implementation does the simplest thing that I can still reason about - it chooses first live node in alphabetical order.

@michaelklishin
Copy link
Member

@binarin thanks, I will edit out that line now.

@dumbbell dumbbell deleted the rabbitmq-autocluster_startup-locking-support branch January 2, 2018 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants