-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsqd: ability to dynamically reconfigure nsqlookupd addresses #601
Conversation
Also, this comes after a lengthy Twitter conversation with @mreiferson: https://twitter.com/paddyforan/status/619679434992447488 My initial investigations made this look like a lot of work and an involved feature. Matt says it's easier than I believed, which I will take his word for, because he probably has a better grasp on the codebase than I do. |
This would address the same use case as the suggestion above (ability to dynamically reconfigure a running
This can be tuned by changing the interval between
Yep. Lastly, none of the above actually talk about your DNS-based discovery suggestion, which is intriguing. It seems like a reasonably encapsulated change for |
This is, actually, my preferred implementation, as it is the most versatile when it comes to containers, and is the least-likely to require weird hacks to make it work. I also thought it'd be the one that would be hardest to implement, so I'm glad I'm super wrong. K8s has the concept of "sidecar containers". Basically, a k8s thinks in "pods", and a "pod" is a group of containers that all get deployed together, and k8s makes sure run together. I'd basically define an nsqd pod that consists of an nsqd container, and a "nsqlookupd-updater" container that's just a tiny little Go program that polls the k8s DNS, gets the lookupd addresses, and turns that into an nsqd update through the API we define. So one container just runs nsqd, without knowing anything about kubernetes, and the other container takes care of turning the kubernetes-specific information into updates to the nsqd config. They get deployed together, and kubernetes makes sure they're both running, and we neatly divide the responsibilities.
My concern is that this is semi-new and there seems to be some missing consensus around how to do this. Also, there are some minor, annoying variations between implementations. E.g., right now kubernetes uses A records while other things use SRV records, though they plan on updating later. By just providing the tool that says "update nsqlookupd addresses", we don't end up in a position where people want nsqd to know how to update based on:
I feel like providing the tools is a better approach than providing the solution in this case, just because there are a lot of possible configurations for the solution, and the amount of software it takes to wrap the tools to work with these configurations is pretty trivial. At the very least, I'd say provide the tools in nsqd itself, and then provide a few implementations as separate apps, much like nsq_to_file and nsq_to_http are implemented. |
Yes, I agree, and would align best with how everything has been designed and built so far. To play devil's advocate, this isn't necessarily the simplest approach from the user's perspective, though. We're talking about needing to bundle another app to sit between Curious if any other lurkers have any thoughts (cc @jehiah). Thanks for taking the time to chat through this @paddyforan! |
Two things about the devil's advocate position:
But I dunno. Even if we get to a point where one is supported, I can always just use a convoluted wrapper or a light fork to make it fit my use case (which is what I was investigating before I opened this issue). |
Cool, let's prototype what an HTTP API might look like for this. My initial thought is that it should be more general than just
|
I dig a HTTP config query/update API endpoint (as mentioned before it clarifies what values are dynamic vs a HUP reloading of the config file). That seems to be the base needed to make it possible to tackle any sort of new discovery mechanism. I like the general idea/usability of discovery via DNS. Handling multiple DNS A/AAAA records seems like a straightforward general way of providing some externally managed cluster state (ie: it feels like something i would begin to use); I feel i need to read up on rfc6763 and other common implementations to have a better opinion there. |
My only complaint is the POST. I'd, personally, prefer PATCH. But I'm nit-picking here. I'd be ecstatic to get this, even with POST. |
@@ -596,3 +600,41 @@ func (s *httpServer) printStats(stats []TopicStats, health string, startTime tim | |||
} | |||
return buf.Bytes() | |||
} | |||
|
|||
type allowedOpts struct { | |||
NSQLookupdTCPAddresses []string `json:"nsqlookupd_tcp_addresses"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be nice to support these as atomic add/remove or overwrite. The way I've done this in the past, which isn't too onerous, is
type allowedOpts struct {
NSQLookupdTCPAddresses []string `json:"nsqlookupd_tcp_addresses"`
NSQLookupdTCPAddressesAdd []string `json:"nsqlookupd_tcp_addresses_add"`
NSQLookupdTCPAddressesRemove []string `json:"nsqlookupd_tcp_addresses_remove"`
}
That way I have the option of overwriting the values completely, or I can just add a new address and remove a different one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why that would be valuable but the API feels wrong. The requirements really argue for specific nsqlookupd
config endpoints, which I was hoping to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, where I'm using this elsewhere, I just continue adding pointers to allowedOpts. The only difference is nsqlookupdtcpaddresses gets 3 properties, instead of 1, and a use specifies at most 2 in any request.
However, I'm nitpicking. This is a Useful Thing because it avoids weird race conditions and provides atomicity, which is Always Useful once you start distributing things. (Look at me, explaining distributing things to you. ) In practice, I feel like it would be an edge case where this would be an issue. If you feel it's not worth the API bloat, that's totally cool. I thought I'd bring it up while I had the chance :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm happy to discuss alternative APIs because now is the time to do it. It's not that I disagree with the benefits, it's that I think we should try to come up with an even better API if we want those semantics.
What about HTTP verbs? What about individual endpoints? If being more specific helps improve the API then I'm all for it...
Implementation wise, the HTTP side of this PR is the trivial stuff, most of the noise here is cleanup and proper handling of (now dynamic) state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verbs:
- PATCH uses the add/remove properties
- PUT uses the replace properties
- GET returns, as it does now
- DELETE is unused, because that would be silly
I'm :-/ about the semantics because it's not really REST and is more RPC in spirit. However, the REST way to do it would be to do
POST /config/nsqlookupdtcpaddresses/
to add an addressPUT /config/nsqlookupdtcpaddresses/
to replace the entire listDELETE /config/nsqlookupdtcpaddresses/{address}
to remove an address
But that seems excessive, and as I'm thinking about it, I'm probably overcomplicating matters. Let's step back for a second:
Can we think of a single service discovery mechanism that would give us an atomic view of the world? If we know about a change, we're almost certainly going to want to update the state of the world (in DNS and etcd, at the very least).
As long as nsqd is intelligent about diffing updates (e.g., if it's connected to A & B, and you update the addresses to be A, B, and C, it just connects to C. it doesn't disconnect from A & B) then I'm having trouble thinking of a scenario where I'd actually use this.
Sorry for the knee jerk here, I may be off base about whether this is actually useful.
TL;DR: If we can be declarative instead of imperative, that's probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a discussion offline, what if we just implemented PUT /config/<option>
for now (the body is the JSON encoded value)? This leaves room for some of these future decisions.
The <option>
would match the equivalent key in the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would fit my use case. 👍
3775cde
to
8d804c0
Compare
RFR @jehiah @paddyforan I did a ton of cleanup and the overall diff is noisy. The individual commits are useful... |
opts := *s.ctx.nsqd.getOpts() | ||
switch opt { | ||
case "nsqlookupd_tcp_addresses": | ||
var addrs []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this would be cleaner to Unmarshal directly to the expected type inside this switch (ie: all the type checks go away)
var addrs []string
if err := json.Unmarshal(body, &addrs); err != nil {
return nil, http_api.Err{400, "INVALID_VALUE"}
}
opts.NSQLookupdTCPAddresses = addrs
any reason not to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually i think we can Unmarshal directly to json.Unmarshal(body, &opts.NSQLookupdTCPAddresses)
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah, that is much simpler 👍
LGTM aside from comments. |
Looks super great to me. |
@@ -129,6 +140,13 @@ func (n *NSQD) lookupLoop() { | |||
break | |||
} | |||
} | |||
case <-n.optsNotificationChan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be reading this wrong, but it looks like on every config change, the lookupds will be disconnected. So if I want to change the verbosity, that triggers a lookupd reconnection?
Would it be possible to diff the current lookupPeers and the list in opts to only add/remove the peers necessary? That would at least keep us from triggering a total reconnect each and every time any config value changes, which seems like (if this use case gets expanded upon) something that may happen more and more frequently over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, good catch, I'll fix
PTAL @jehiah @paddyforan |
My only concern was addressed 👍 |
🔨 ? |
795dafe
to
f387012
Compare
ready |
nsqd: ability to dynamically reconfigure nsqlookupd addresses
The only problem I see with having an API only update approach and not supporting DNS is that you still have to have an outside force reconfigure each nsqd instance vs nsqd being able to reconfigure itself for multiple lookupd instances based on DNS. Just my two cents. |
I'd love it if NSQ could support DNS-based discovery for nsqlookupd instances. This would be a light layer on top of the already-proven architecture that would allow ops people to abstract away some of the configuration required to get NSQ working.
Basically, rather than hardcoding nsqlookupd addresses, I'd much rather just say "use all the A records returned by this DNS name". There's also an RFC for using DNS for service discovery. This supports things like Prometheus and makes deployment a snap on Kubernetes. It basically makes it dead simple to use NSQ in a cluster environment, where you don't know what IP address or machine any of your services will be available at, but know you can reach them through a DNS lookup.
More docs on the relevant portion of Kubernetes.
Here's the desired user experience:
Most of this (the DNS querying part) can be achieved simply with wrappers/helpers. The big missing piece is some sort of runtime live-reloading/editing of configuration. Any of the following would work beautifully:
Or I'm sure there are myriad better ways to accomplish this. Basically, the important bit is "here are the new nsqlookupd addresses, use them instead, but don't stop running/serving requests, and please don't catch on fire".
I'm making the following assumptions when imagining how this would be run: