topic capacity #307

Closed
tj opened this issue Feb 10, 2014 · 9 comments

@tj
Contributor

tj commented Feb 10, 2014

hey! just a general question since I'm not super familiar with the internals yet: would you say it would be bad practice to use hundreds of thousands of topics? I was planning on using one per client. In the past we've just hashed, but that obviously has visibility and fair-queue cons, and I can't see things scaling too well if we go with per-project topic mapping either haha. really enjoying nsq so far! thanks :D

@mediafinance

+1 for the question.

Would love to hear an updated answer on this from the great minds behind nsq, as I posed a similar one late last year.

https://groups.google.com/forum/#!topic/nsq-users/uR_-vjQkTbs

@mreiferson
Member

@mediafinance yea, I really need to get around to writing something longer form as this question comes up a lot. Another recent conversation on the mailing list, for example:

https://groups.google.com/forum/#!topic/nsq-users/AeoiVMt37eE

@visionmedia The answer is really context specific.

It sounds like you're trying to route messages on the publishing side to specific consumers (perhaps external clients?). We didn't explicitly design NSQ for this use case. In fact we did quite the opposite - we try really hard to decouple publishers from consumers.

This doesn't mean that you can't architect a system where things work harmoniously, though.

First, to answer your question, nsqd has no built-in limits on # of topics/channels/clients. However, at some point you will reach the resource limits of your environment in terms of memory/CPU overhead per topic/channel/client. I'd have to (and probably should) dig around to provide the exact # of bytes of memory overhead for each of those structs (and goroutines at runtime). We did some work in #236 to reduce CPU overhead when you hit 1000's of clients per nsqd - this ultimately had to do with having many high-resolution timers by default. You can now configure your clients to disable output buffering to completely eliminate this overhead (and prioritize low latency).
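For concreteness, here's a minimal go-nsq consumer sketch with output buffering disabled. This is illustrative only: the topic/channel names and address are placeholders, and the -1 "disable" values follow the nsqd IDENTIFY convention, so check the accepted ranges in your client version.

```go
package main

import (
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()
	// Assumption: -1 is the "disable" value for output buffering per the
	// nsqd IDENTIFY options; this trades throughput for lower latency and
	// avoids the per-client buffering timers discussed above.
	cfg.OutputBufferSize = -1
	cfg.OutputBufferTimeout = -1

	consumer, err := nsq.NewConsumer("events", "worker", cfg) // placeholder names
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		// process m.Body ...
		return nil
	}))
	if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}
```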

But, one topic per client isn't the only way that this can be structured. As I had suggested in the link above, you can introduce multiplexing components that would be better suited to routing/targeting specific clients.

I whipped up a quick visual of what this topology might look like:

[diagram: producers publishing to co-located nsqd on a single topic, with a mux tier consuming that topic and routing messages to specific clients]

I think there are a lot of benefits to structuring the problem like this...

How does this match up with your requirements? Can you provide a little more information as to what you're building?

Hope this helps.

@tj
Contributor Author

tj commented Feb 10, 2014

thanks for the reply! yeah we can totally service the jobs that way, that's similar to what we're doing now through rabbitmq - hashing to 100 different topics and pulling from those in an attempt to handle busy producers more fairly. We could always bump that up to 1000 or so to help mitigate, but it would be killer if nsq supported it.

@mreiferson
Member

The primary and most important difference between this proposal and your rabbitmq setup would be that this would not require any hashing.

The nsqd are "partitioned" naturally based on what published messages they receive. In this diagram they are co-located next to the producer (the # producers you deploy is equal to the number of nsqd responsible for that topic).

Also notice there's only one topic.

Each mux application connects on a separate channel for said topic, so each mux application gets the full stream (and can therefore service any client). It would discard any message routed to a client that it doesn't know about.
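For illustration, a rough Go sketch of what one mux instance could look like, assuming messages carry a client_id routing key and that actual delivery to end clients (websocket, TCP, etc.) happens elsewhere. The envelope format, topic/channel names, and addresses are made up for this sketch, not part of NSQ:

```go
// mux: consumes the full topic on its own channel and forwards each message
// to the end client it is addressed to, dropping messages for clients that
// aren't connected to this mux instance.
package main

import (
	"encoding/json"
	"log"
	"sync"

	nsq "github.com/nsqio/go-nsq"
)

type envelope struct {
	ClientID string          `json:"client_id"` // illustrative routing key
	Payload  json.RawMessage `json:"payload"`
}

type mux struct {
	mu      sync.RWMutex
	clients map[string]chan<- []byte // clientID -> delivery channel
}

func (m *mux) HandleMessage(msg *nsq.Message) error {
	var e envelope
	if err := json.Unmarshal(msg.Body, &e); err != nil {
		return nil // malformed; finish it rather than requeue forever
	}
	m.mu.RLock()
	out, ok := m.clients[e.ClientID]
	m.mu.RUnlock()
	if ok {
		out <- e.Payload
	}
	// not connected here: discard (another mux instance serves that client)
	return nil
}

func main() {
	m := &mux{clients: make(map[string]chan<- []byte)}
	cfg := nsq.NewConfig()
	// each mux instance uses its own channel name so it receives the full stream
	consumer, err := nsq.NewConsumer("events", "mux-1#ephemeral", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(m)
	if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}
```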

There is certainly a bit of redundant work being performed by mux, but the benefits are clear:

  1. you no longer have to hash manually to distribute load
  2. a client can connect to any mux (LB in front of mux?)
  3. run as many mux as you want for redundancy in that tier
  4. you do not have to manage 1000s of topics

@tj
Contributor Author

tj commented Feb 11, 2014

re-read your suggestion, sounds good to me, but wouldn't there still be a fairness issue? For example if producer A sent 15,000 messages at once, and then B / C sent 100 each, we'd be stuck processing A for a while if it's all in one topic (unless our maxInFlight chews through them fast enough)

@jehiah
Member

jehiah commented Feb 11, 2014

jumping in on this thread.

NSQ message flow control is driven by the clients; this means it's up to the client to distribute its RDY count amongst its connections. In your example, if you had maxInFlight > 3 then you would have at least a concurrent RDY 1 on the connection to the nsqd on each of A, B and C, and you'd work off the backlog evenly until finishing the 100 from B and C.

If however you had a single consumer and maxInFlight of 1, you'd be connected to the nsqd on A, B, and C but would request a message from one source at a time [1]. In go-nsq, there is a timer that redistributes the RDY to another server if it hasn't gotten a message in 5 seconds. It's expected that clients are generally distributed as well, so if you had a larger number of clients (say 10 or 20) in total consuming the same topic, you'd expect that in aggregate they would pull evenly from A, B, and C even with maxInFlight of 1, such that each was individually consuming from a single source. (Generally at bitly, we set maxInFlight to the range of expected source nsqd's, unless we expect individual message processing to be excessively long and we want to carefully avoid queued work for the consumer, or higher in cases where we expect higher-latency connections or longer async client message processing.)

[1]: this is client dependent; I'm speaking mostly of go-nsq behavior. Semantics might be slightly different amongst clients in other languages.
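As a concrete go-nsq illustration of the maxInFlight behavior described above, with placeholder topic/channel names and addresses standing in for the nsqd co-located with A, B and C:

```go
package main

import (
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()
	// With 3 upstream nsqd, MaxInFlight >= 3 lets the client keep at least
	// RDY 1 on each connection, so B and C aren't starved behind A's backlog.
	cfg.MaxInFlight = 3

	consumer, err := nsq.NewConsumer("events", "worker", cfg)
	if err != nil {
		log.Fatal(err)
	}
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		// process m.Body ...
		return nil
	}))
	// placeholder addresses for the nsqd co-located with producers A, B and C
	if err := consumer.ConnectToNSQDs([]string{
		"10.0.0.1:4150", "10.0.0.2:4150", "10.0.0.3:4150",
	}); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}
```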

@tj
Contributor Author

tj commented Feb 11, 2014

I should clarify some more: all the terms come off sounding like a client/producer node haha. What I really meant by "client" was one of our customers. They send us data (we have 20,000+ of them), and we need a mechanism to mitigate the issue of one customer sending a ton of data at once while the rest are stuck waiting in line behind it.

I can see the mux technique working, but with our number of customers growing every day I imagine all those copies might become a decent burden on the system?

I was thinking maybe as an alternative (since we know beforehand) we could group all the small customers that don't produce much into one topic, and then the larger enterprise customers could each have their own, but it's hard to say how large that will get as well. Currently when one gets saturated in rabbit, all of the others that are hashed into that queue are left waiting, which is definitely not ideal; delivery time definitely matters in our case.

@mreiferson
Member

@visionmedia according to your description, it doesn't sound like you even need the mux approach that I had suggested. It's only relevant if you need messages to be routed to specific consumers and it seems like I misunderstood your initial question (you're partitioning for fairness).

I suspect some of your current issues aren't as relevant in the suggested NSQ setup (everything but the mux part), assuming you have "round-robin" style load balancing in front of your applications that receive these data payloads. A strict round-robin approach wouldn't treat any client differently (high or low volume), which means all nsqd co-located with those applications would be expected to get a reasonably even incoming volume. Effectively, you don't have the problem of one client being permanently sharded to a specific queue (in rabbit parlance) - said client would have no way of head-of-line blocking the others.

Now, this is all predicated on the fact that your consumer layer has sufficient resources to handle varying aggregate volume. At least, in this situation, either everything backs up or nothing, which I believe is desirable.

Specifically, I'm suggesting:

  1. a single topic (per data stream ala typical NSQ setup)
  2. nsqd co-located with applications receiving data payloads (this replaces your "sharding")
  3. round-robin or similar blind fairness load balancing in front of (2)
  4. no intermediate consumer (no mux)
  5. a sufficiently provisioned consumer layer
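To make (2) and (3) concrete, here's a minimal sketch of the publishing side, where each app instance behind the round-robin LB writes to the nsqd on its own host. The HTTP endpoint, topic name, and addresses are illustrative only:

```go
package main

import (
	"io"
	"log"
	"net/http"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	// publish to the nsqd co-located with this app instance
	producer, err := nsq.NewProducer("127.0.0.1:4150", nsq.NewConfig())
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/ingest", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil || len(body) == 0 {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		if err := producer.Publish("events", body); err != nil {
			http.Error(w, "queue unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})

	// a round-robin LB in front of many of these instances spreads customer
	// traffic across the co-located nsqd, giving the "natural partitioning"
	// described above.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```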

@tj
Contributor Author

tj commented Feb 12, 2014

cool, thanks for the feedback! I'll do some more testing and see how things behave
