topic capacity #307

Closed
tj opened this issue Feb 10, 2014 · 9 comments

@tj
Contributor

tj commented Feb 10, 2014

hey! just a general question since I'm not super familiar with the internals yet: would you say it would be bad practice to use hundreds of thousands of topics? I was planning on using one per client. In the past we've just hashed, but that obviously has visibility and fair-queue cons, and I can't see things scaling too well if we go with per-project topic mapping either haha. really enjoying nsq so far! thanks :D

@mediafinance

+1 for the question.

Would love to hear an updated answer on this from the great minds behind nsq, as I posed a similar one late last year.

https://groups.google.com/forum/#!topic/nsq-users/uR_-vjQkTbs

@mreiferson
Member

@mediafinance yea, I really need to get around to writing something longer form as this question comes up a lot. Another recent conversation on the mailing list, for example:

https://groups.google.com/forum/#!topic/nsq-users/AeoiVMt37eE

@visionmedia The answer is really context specific.

It sounds like you're trying to route messages on the publishing side to specific consumers (perhaps external clients?). We didn't explicitly design NSQ for this use case. In fact we did quite the opposite - we try really hard to decouple publishers from consumers.

This doesn't mean that you can't architect a system where things work harmoniously, though.

First, to answer your question, nsqd has no built-in limits on # of topics/channels/clients. However, at some point you will reach the resource limits of your environment in terms of memory/CPU overhead per topic/channel/client. I'd have to (and probably should) dig around to provide the exact # of bytes of memory overhead for each of those structs (and goroutines at runtime). We did some work in #236 to reduce CPU overhead when you hit 1000's of clients per nsqd - this ultimately had to do with having many high-resolution timers by default. You can now configure your clients to disable output buffering to completely eliminate this overhead (and prioritize low latency).
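For concreteness, here's a minimal go-nsq consumer sketch with output buffering disabled. This is illustrative only: the topic/channel names and address are placeholders, and the -1 "disable" values follow the nsqd IDENTIFY convention, so check the accepted ranges in your client version.

```go
package main

import (
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()
	// Assumption: -1 is the "disable" value for output buffering per the
	// nsqd IDENTIFY options; this trades throughput for lower latency and
	// avoids the per-client buffering timers discussed above.
	cfg.OutputBufferSize = -1
	cfg.OutputBufferTimeout = -1

	consumer, err := nsq.NewConsumer("events", "worker", cfg) // placeholder names
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		// process m.Body ...
		return nil
	}))
	if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}
```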

But, one topic per client isn't the only way that this can be structured. As I had suggested in the link above, you can introduce multiplexing components that would be better suited to routing/targeting specific clients.

I whipped up a quick visual of what this topology might look like:

[diagram: producers publishing to co-located nsqd on a single topic, with a mux tier consuming that topic and routing messages to specific clients]

I think there are a lot of benefits to structuring the problem like this...

How does this match up with your requirements? Can you provide a little more information as to what you're building?

Hope this helps.

@tj
Contributor Author

tj commented Feb 10, 2014

thanks for the reply! yeah we can totally service the jobs that way, that's similar to what we're doing now through rabbitmq - hashing to 100 different topics and pulling from those in an attempt to handle busy producers more fairly. We could always bump that up to 1000 or so to help mitigate, but it would be killer if nsq supported it.

@mreiferson
Member

The primary and most important difference between this proposal and your rabbitmq setup would be that this would not require any hashing.

The nsqd are "partitioned" naturally based on what published messages they receive. In this diagram they are co-located next to the producer (the # producers you deploy is equal to the number of nsqd responsible for that topic).

Also notice there's only one topic.

Each mux application connects on a separate channel for said topic, so each mux application gets the full stream (and can therefore service any client). It would discard any message routed to a client that it doesn't know about.
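For illustration, a rough Go sketch of what one mux instance could look like, assuming messages carry a client_id routing key and that actual delivery to end clients (websocket, TCP, etc.) happens elsewhere. The envelope format, topic/channel names, and addresses are made up for this sketch, not part of NSQ:

```go
// mux: consumes the full topic on its own channel and forwards each message
// to the end client it is addressed to, dropping messages for clients that
// aren't connected to this mux instance.
package main

import (
	"encoding/json"
	"log"
	"sync"

	nsq "github.com/nsqio/go-nsq"
)

type envelope struct {
	ClientID string          `json:"client_id"` // illustrative routing key
	Payload  json.RawMessage `json:"payload"`
}

type mux struct {
	mu      sync.RWMutex
	clients map[string]chan<- []byte // clientID -> delivery channel
}

func (m *mux) HandleMessage(msg *nsq.Message) error {
	var e envelope
	if err := json.Unmarshal(msg.Body, &e); err != nil {
		return nil // malformed; finish it rather than requeue forever
	}
	m.mu.RLock()
	out, ok := m.clients[e.ClientID]
	m.mu.RUnlock()
	if ok {
		out <- e.Payload
	}
	// not connected here: discard (another mux instance serves that client)
	return nil
}

func main() {
	m := &mux{clients: make(map[string]chan<- []byte)}
	cfg := nsq.NewConfig()
	// each mux instance uses its own channel name so it receives the full stream
	consumer, err := nsq.NewConsumer("events", "mux-1#ephemeral", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(m)
	if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}
```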

There is certainly a bit of redundant work being performed by mux, but the benefits are clear:

  1. you no longer have to hash manually to distribute load
  2. a client can connect to any mux (LB in front of mux?)
  3. run as many mux as you want for redundancy in that tier
  4. you do not have to manage 1000s of topics

@tj
Contributor Author

tj commented Feb 11, 2014

re-read your suggestion, sounds good to me, but wouldn't there still be a fairness issue? For example if producer A sent 15,000 messages at once, and then B / C sent 100 each, we'd be stuck processing A for a while if it's all in one topic (unless our maxInFlight chews through them fast enough)

@jehiah
Member

jehiah commented Feb 11, 2014

jumping in on this thread.

NSQ message flow control is driven by the clients; this means it's up to the client to distribute its RDY count amongst its connections. In your example, if you had maxInFlight > 3 then you would have at least a concurrent RDY 1 on the connection to the nsqd on each of A, B and C, and you'd work off the backlog evenly until finishing the 100 from B and C.

If however you had a single consumer and maxInFlight of 1, you'd be connected to the nsqd on A, B, and C but would request a message from one source at a time [1]. In go-nsq, there is a timer that redistributes the RDY to another server if it hasn't gotten a message in 5 seconds. It's expected that clients are generally distributed as well, so if you had a larger number of clients (say 10 or 20) in total consuming the same topic, you'd expect that in aggregate they would pull evenly from A, B, and C even with maxInFlight of 1, such that each was individually consuming from a single source. (Generally at bitly, we set maxInFlight to the range of expected source nsqd's, unless we expect individual message processing to be excessively long and we want to carefully avoid queued work for the consumer, or higher in cases where we expect higher-latency connections or longer async client message processing.)

[1]: this is client dependent; I'm speaking mostly of go-nsq behavior. Semantics might be slightly different amongst clients in other languages.
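As a concrete go-nsq illustration of the maxInFlight behavior described above, with placeholder topic/channel names and addresses standing in for the nsqd co-located with A, B and C:

```go
package main

import (
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()
	// With 3 upstream nsqd, MaxInFlight >= 3 lets the client keep at least
	// RDY 1 on each connection, so B and C aren't starved behind A's backlog.
	cfg.MaxInFlight = 3

	consumer, err := nsq.NewConsumer("events", "worker", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		// process m.Body ...
		return nil
	}))
	// placeholder addresses for the nsqd co-located with producers A, B and C
	if err := consumer.ConnectToNSQDs([]string{
		"10.0.0.1:4150", "10.0.0.2:4150", "10.0.0.3:4150",
	}); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}
```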

@tj
Contributor Author

tj commented Feb 11, 2014

I should clarify some more: all the terms come off sounding like a client/producer node haha. What I really meant by "client" was one of our customers. They send us data (we have 20,000+ of them), and we need a mechanism to mitigate the issue of one customer sending a ton of data at once while the rest are stuck waiting in line behind it.

I can see the mux technique working, but with our number of customers growing every day I imagine all those copies might become a decent burden on the system?

I was thinking maybe as an alternative (since we know beforehand) we could group all the small customers that don't produce much into one topic, and then the larger enterprise customers could each have their own, but it's hard to say how large that will get as well. Currently when one gets saturated in rabbit, all of the others that are hashed into that queue are left waiting, which is definitely not ideal; delivery time definitely matters in our case.

@mreiferson
Member

@visionmedia according to your description, it doesn't sound like you even need the mux approach that I had suggested. It's only relevant if you need messages to be routed to specific consumers and it seems like I misunderstood your initial question (you're partitioning for fairness).

I suspect some of your current issues aren't as relevant in the suggested NSQ setup (everything but the mux part), assuming you have "round-robin" style load balancing in front of your applications that receive these data payloads. A strict round-robin approach wouldn't treat any client differently (high or low volume), which means all nsqd co-located with those applications would be expected to get a reasonably even incoming volume. Effectively, you don't have the problem of one client being permanently sharded to a specific queue (in rabbit parlance) - said client would have no way of head-of-line blocking the others.

Now, this is all predicated on the fact that your consumer layer has sufficient resources to handle varying aggregate volume. At least, in this situation, either everything backs up or nothing, which I believe is desirable.

Specifically, I'm suggesting:

  1. a single topic (per data stream ala typical NSQ setup)
  2. nsqd co-located with applications receiving data payloads (this replaces your "sharding")
  3. round-robin or similar blind fairness load balancing in front of (2)
  4. no intermediate consumer (no mux)
  5. a sufficiently provisioned consumer layer
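To make (2) and (3) concrete, here's a minimal sketch of the publishing side, where each app instance behind the round-robin LB writes to the nsqd on its own host. The HTTP endpoint, topic name, and addresses are illustrative only:

```go
package main

import (
	"io"
	"log"
	"net/http"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	// publish to the nsqd co-located with this app instance
	producer, err := nsq.NewProducer("127.0.0.1:4150", nsq.NewConfig())
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/ingest", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil || len(body) == 0 {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if err := producer.Publish("events", body); err != nil {
			http.Error(w, "queue unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})

	// a round-robin LB in front of many of these instances spreads customer
	// traffic across the co-located nsqd, giving the "natural partitioning"
	// described above.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```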

@tj
Contributor Author

tj commented Feb 12, 2014

cool, thanks for the feedback! I'll do some more testing and see how things behave
