Remove explicit "init" step to start a cluster #4027
Conversation
Force-pushed from fe042a7 to c22ac62.
I didn't give this a completely thorough review, just enough to understand how it was working. Looks good.

Review status: 0 of 24 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

Comments from the review on Reviewable.io
Reviewed 24 of 24 files at r1.
This seems like it's going to require different run scripts for bootstrapped and non-bootstrapped nodes, just like the current state. What's the benefit?

Reviewed 24 of 24 files at r1.
I worry that this optimizes for ease of use for toy deployments but makes things worse for non-toy ones. Deployments that want to use the fancier resolvers are harder to serve this way. With gossip persistence, the fancier resolvers are less significant, so maybe many deployments can do without them. But even then, things are tricky. I think that getting rid of the init command is a good idea only if we can come up with a solution that has the following properties:
My suggestion is to add a

(This would've been a good candidate for an RFC, BTW, and it may be worth doing that before coding up another iteration of the idea.)
Lots of points here.
In the old model, in that last step, it would just try to read the existing data and then update that to the newly added stores, but the newly added stores have no data to read.
Force-pushed from c22ac62 to fccf43d.
Fixes #3909
Force-pushed from fccf43d to fabb223.
Force-pushed from fabb223 to d90d638.
Reviewed 8 of 8 files at r2, 2 of 2 files at r3.
First impressions and toy deployments are definitely very important; I'm just worried that this tips the scales too far towards first impressions in a way that will make things more difficult later on.

You may be able to set environment variables on a per-node basis when creating EC2 nodes by hand, but if you're using any sort of automation to set up your instances it may not be as easy. So while the env var may help in some cases, it's not always applicable. In your response to my points you say that "all nodes CAN be started with the same command line flags"; are you assuming the use of per-node environment variables? Because I'm not seeing another way to do that.

On point 3, I do insist that the system be reasonably robust at avoiding accidental re-bootstraps. I think I agree that

More alternatives:
For reference, here's the process for initializing a Cassandra cluster:

I won't hold up Cassandra as being the ideal to pursue, but you probably want to be at least as easy to set up as them, if not a lot easier. I would hold them up as the king of the hill to beat in the domain of fully-distributed horizontal-scaleout databases (LSM with SSTables and all of that, the BigTable heritage). They do use gossip coupled with designated seed nodes for bootstrapping.
I haven't given this a close look, but I tend to agree with @bdarnell's objections. On the other hand, I like the

I wouldn't add the
A quick Google search yields plenty of examples of people setting environment variables programmatically via automated AWS tools. Yes, I'm assuming per-node env variables. Or else you could set up one node without --join and then stop it, then start them all with --join set to DNS round robin or

@bdarnell, I'm not sure what you mean by

In terms of the TLS certificates, I agree broadly with your point, but it argues that getting a "real" production cluster deployed should not necessarily focus on insisting that each node be run exactly once with identical flags. I think it's OK to have some one-time setup steps when bootstrapping a cluster. What I'm really optimizing for with this change is making first-time manual installs more approachable.

The second alternative seems considerably more complicated than what we have now and what this change provides.

@JackKrupansky, that Cassandra setup is surprisingly complicated, so we're looking good by contrast.

@tschottdorf, the

server/node.go, line 270 [r1]: @petermattis it really felt like this needs to be very clearly communicated in the logs, and it usually just gets lost in the log noise.
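The per-node environment variable idea being debated here can be sketched as a wrapper script that every node runs identically, with only one variable differing per node. This is a minimal illustration, not taken from the actual change: the function name, the store path, and the addresses are all made up for the example.

```shell
# Hypothetical wrapper: the same script runs on every node; only the
# join list passed in differs. An empty list means "bootstrap a new
# cluster"; a non-empty list means "join the existing cluster".
build_start_cmd() {
  stores="--stores=ssd=/mnt/data"
  if [ -z "$1" ]; then
    # First node: no --join, so it bootstraps a brand-new cluster.
    echo "./cockroach start ${stores}"
  else
    # Later nodes: join via the supplied address list.
    echo "./cockroach start ${stores} --join=$1"
  fi
}

build_start_cmd ""                         # first node
build_start_cmd "node1:26257,node2:26257"  # every subsequent node
```

Under automation, the join list could come from an instance tag or user-data rather than a hand-set environment variable, which is the crux of the objection above.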
Force-pushed from d90d638 to 274f865.
I was referring to @bdarnell's comment about per-node variables. I agree, though, that there will be ways around it that are simple to handle in automated deploys (where one command more or less doesn't really matter). Cassandra is indeed very opaque with respect to bootstrapping (this is obvious from the docs, but I also remember a bunch of confusion around it from my ops experience), so I wouldn't take it as a benchmark here. RethinkDB, by comparison, works exactly as in this suggested change.
I called

Yes,
Just as I've removed the environment variable code pending an actual use case, I'll wait on the

@bdarnell Is this an LGTM?
Force-pushed from 274f865 to 5f94d5a.
I think the need for
cli/flags.go:

-	bootstrap. Each item in the list has an optional type:
+	"join": wrapText(`
+	A comma-separated list of addresses to use when a new node is joining
+	an existing cluster. Each address in the list has an optional type:
You may want to be more explicit about leaving it empty for the very first node. It's implied here, but emphasis wouldn't hurt.
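To make the "optional type" wording in the help text above concrete, here is a small sketch of splitting a single --join entry into a type prefix and an address. The "tcp" default and the "type=... addr=..." output format are assumptions for illustration, not the actual implementation in gossip/resolver.

```shell
# Split one --join entry of the form "[type=]address". Entries without
# a "type=" prefix fall back to an assumed default of "tcp".
parse_join_entry() {
  case "$1" in
    *=*) echo "type=${1%%=*} addr=${1#*=}" ;;
    *)   echo "type=tcp addr=$1" ;;
  esac
}

parse_join_entry "node1:26257"
parse_join_entry "http-lb=lb.example.com:26257"
```

The same splitting logic applies to each comma-separated entry in the list; only the set of recognized type names differs in the real resolver code.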
LGTM. I started converting my tools and scripts. I'm hoping to have it ready to merge by the time you're done, but even then, a broken nightly run or two won't hurt.
Force-pushed from 5f94d5a to dac13d8.
Previously, you had to create a cluster by init'ing and starting the first node:

./cockroach init --stores=...
./cockroach start --stores=... --gossip=self=

and add new nodes with:

./cockroach start --stores=... --gossip=<node[,node,...]>

Now, a cluster is created just by running start:

./cockroach start --stores=...

and new nodes are added with:

./cockroach start --stores=... --join=<node[,node,...]>

Also cleaned up the gossip address storage code. It had gotten out of whack somehow and wasn't storing addresses in the event that a node had no bootstrapped stores on start.
Force-pushed from dac13d8 to 783d21b.
Remove explicit "init" step to start a cluster