nsqd: message TTL #302

Open

paddycarver opened this issue Feb 6, 2014 · 14 comments

@paddycarver
Contributor

One of the great things about NSQ is that it can be used as a message bus; e.g., event X happened, so everyone that is interested in event X gets notified. This is super cool.

thumbs up

One of the things that I'm really interested in about NSQ is that it helps me decouple my event producers and consumers. So the process getting information from NSQ doesn't even know the process putting the information there exists.

There's a problem with this, though (I think). From what I can tell, should I start publishing to a topic that has no consumers, that topic queue will build up forever until I run out of memory assigned to NSQ. Then it will start transparently persisting to disk (hooray!). Then, given enough time, the disk will fill up. At which point, I imagine, catastrophic failure.

boom

So here's my question/proposal: is there some way to set a "decay" on messages on a topic? After a certain amount of time, that data is no longer worth processing, so I'd like NSQ to just delete it.

Is there any support for this? If not, is there any plan to add support for something like this? Or am I missing a design decision somewhere?

@ploxiln
Member

ploxiln commented Feb 6, 2014

There wasn't any plan to support this... but in some cases, what you want is simply to empty a topic or channel completely, throwing out a huge backlog of obsolete messages. That feature does exist.

@mreiferson
Member

mreiferson commented Feb 6, 2014

@paddyforan thanks for your question.

It's an idea that has been discussed before, but no decision has been made on whether it belongs in the core.

What you're describing is essentially a TTL on messages. It's somewhat related to #293 (and would require similar structural changes if implemented).

Taking a step back, we don't actually do a good job of documenting (or providing scripts for) an approach to monitoring NSQ, but here are the highlights (a rough sketch of one such check follows the list):

  1. Monitor the /ping endpoint of all cluster participants (nsqd, nsqadmin, nsqlookupd)

    This ensures that, at the lowest level, the process in question is alive and responding.

  2. Monitor the depth of every topic/channel in aggregate.

    This is the obvious check to ensure that messages are being processed.

    It works like this:

    1. Query nsqlookupd for all registered topics/channels
    2. Query /stats for all producers
    3. Sum
    4. Check against warning/critical thresholds
  3. Monitor that the nsqlookupd cluster has registrations for all nsqd (and their topics and channels).

    This ensures that nsqlookupd is able to provide accurate results for lookup queries from consumers and that all nsqd are properly phoning in and registering their metadata.
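
To make check 2 concrete, here is a rough sketch in Go with some simplifying assumptions: it polls a fixed list of nsqd HTTP addresses instead of discovering producers through nsqlookupd, the addresses and warning threshold are placeholders, and the struct fields follow the /stats?format=json response shape of recent nsqd releases (older releases wrap the same fields in a "data" envelope):

```go
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// Subset of nsqd's /stats?format=json response needed for depth checks.
type statsResponse struct {
    Topics []struct {
        TopicName    string `json:"topic_name"`
        Depth        int64  `json:"depth"`
        BackendDepth int64  `json:"backend_depth"`
        Channels     []struct {
            ChannelName  string `json:"channel_name"`
            Depth        int64  `json:"depth"`
            BackendDepth int64  `json:"backend_depth"`
        } `json:"channels"`
    } `json:"topics"`
}

func main() {
    // Placeholder nsqd HTTP addresses; a fuller version would discover these
    // via nsqlookupd, as described in the steps above.
    nsqds := []string{"http://127.0.0.1:4151"}
    const warnDepth = 10000 // example threshold

    for _, addr := range nsqds {
        // Check 1: liveness.
        ping, err := http.Get(addr + "/ping")
        if err != nil || ping.StatusCode != 200 {
            log.Printf("CRITICAL: %s /ping failed: %v", addr, err)
            continue
        }
        ping.Body.Close()

        // Check 2: aggregate depth per topic (topic + channels, memory + disk).
        resp, err := http.Get(addr + "/stats?format=json")
        if err != nil {
            log.Printf("CRITICAL: %s /stats failed: %v", addr, err)
            continue
        }
        var stats statsResponse
        decodeErr := json.NewDecoder(resp.Body).Decode(&stats)
        resp.Body.Close()
        if decodeErr != nil {
            log.Printf("CRITICAL: %s bad /stats response: %v", addr, decodeErr)
            continue
        }

        for _, t := range stats.Topics {
            total := t.Depth + t.BackendDepth
            for _, c := range t.Channels {
                total += c.Depth + c.BackendDepth
            }
            status := "OK"
            if total > warnDepth {
                status = "WARNING"
            }
            fmt.Printf("%s topic %q aggregate depth %d (%s)\n", addr, t.TopicName, total, status)
        }
    }
}
```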

So, back to your question. As I see it, I'm not sure that NSQ should be able to solve the problem of a backlogged topic with no consumers (ever). It seems like the kind of edge case that falls into the administrator's hands (monitor!).

Now, being able to set a TTL on message publication is still a very interesting idea, one whose implications and tradeoffs we should explore, but I would still argue that you would want to know about topics backing up regardless! For example, what if the volume of messages within your TTL window were enough to cause degradation?

@paddycarver
Contributor Author

@ploxiln Yeah, I saw that; I was just curious whether there was a more automated way to do this.

@mreiferson thanks for the detailed response!

For point 2, assuming NSQ's graphite hook is being utilized, I shouldn't even have to query nsqlookupd and sum, right? I should be able to just see a queue growing out of control in Graphite.

As for your resolution: I had a feeling that would be the answer, and that's totally an acceptable answer. I just wanted to make sure there wasn't something built-in that was better suited.

Essentially, what I'm hearing is that events that you don't have a subscriber for today should not be pushed through NSQ. While that would be a nice thing to support (then new consumers could be created without needing to modify producers), I understand that pushing messages is not free, and this kind of "push everything you may want to consume later" mindset is not something that falls within NSQ's scope.

@ploxiln
Member

ploxiln commented Feb 6, 2014

I might rephrase that: nsq is designed for data you don't want to lose easily. Its design assumes that "throwing out" messages should be an exceptional and intentional case.

What we do in some cases is create one consumer, using the "nsq_to_file" tool, that just archives all messages to rolling hourly logs, and then sometime later we attach another reader, on a new channel, that actually does the work. That way we still have archives of the messages that the new reader never saw, for batch processing or replaying (or just throwing out).

@mreiferson
Member

You certainly could use the data in graphite for checks, but since the checks I described don't necessarily require historical context, I would suggest you just query the instances directly and eliminate the middle-man (I wouldn't want my monitoring of the NSQ cluster to depend on graphite).

(it seems @ploxiln beat me to it, but since I already wrote this next paragraph, here it is anyway)

There is another tool worth mentioning: nsq_to_file. This might be a decent middle ground. If you have streams of data that you want to eventually consume, we always recommend that one of the channels for that topic be dedicated to archiving the stream. If, by convention, all of your topics have at least one such channel (consumed by one or more nsq_to_file instances), then there is no backlog and you have a permanent archive.
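
As a rough illustration of that convention (nsq_to_file is the real tool here, with rolling log files built in; this is just a bare-bones equivalent using the go-nsq client, with the topic, channel, output file, and nsqlookupd address chosen for illustration):

```go
package main

import (
    "log"
    "os"

    nsq "github.com/nsqio/go-nsq"
)

func main() {
    // A dedicated "archive" channel on the "events" topic; as long as this
    // channel exists, the topic never accumulates an unconsumed backlog.
    consumer, err := nsq.NewConsumer("events", "archive", nsq.NewConfig())
    if err != nil {
        log.Fatal(err)
    }

    out, err := os.OpenFile("events.archive.log", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
    if err != nil {
        log.Fatal(err)
    }
    defer out.Close()

    consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
        // One message body per line; returning an error requeues the message.
        if _, err := out.Write(m.Body); err != nil {
            return err
        }
        _, err := out.Write([]byte("\n"))
        return err
    }))

    // Discover the nsqd instances carrying this topic via nsqlookupd.
    if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
        log.Fatal(err)
    }

    <-consumer.StopChan
}
```

Because the archive channel always exists, messages never sit unconsumed on the topic itself, which is exactly the property this convention is after.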

@paddycarver
Contributor Author

@ploxiln yeah, that's what I thought. But I need ops sign-off to get an NSQ cluster running internally, and they want to know what happens in weird edge cases (like someone tossing data on a queue, then nobody reading it), so I'm trying to come up with Good Answers to their questions. :)

Thanks for the reminder about nsq_to_file. I had forgotten that was a thing.

So to make sure I understand this right: If I have topic A with no channels, and it's sent messages 1, 2, 3, 4, and 5:

  • 1, 2, 3, 4, and 5 (plus any new messages) will persist indefinitely, until there's at least one channel.
  • When Channel B is created by a consumer, it starts by reading 1. At this point, the queue will empty normally, as quickly as Channel B can process it; i.e., Channel B will eventually receive 1, 2, 3, 4, and 5, and assuming the rate of messages is low enough for the consumers on Channel B to keep up, the queue won't build up anymore.
  • When Channel C is created, it will receive only the messages that Channel B has not yet processed, as the messages Channel B processes are deleted once there are no more channels that want them. So if Channel B reads message 1 off Topic A before Channel C joins, Channel B will be the only channel to receive message 1, but Channel B and Channel C will both receive message 2.

Is my understanding correct?

Thanks for taking the time to explain the subtleties, by the way. Appreciate it.

@dmarkham
Contributor

dmarkham commented Feb 7, 2014

I wish I were able to create a topic (topic#ephemeral) that acted like an ephemeral channel: it would send to any active clients and, with no active clients, just drop the messages on the floor. Currently I have to run a channel to /dev/null so things don't back up and then burst if an ephemeral channel is offline for a bit.

@ploxiln
Member

ploxiln commented Feb 7, 2014

@paddyforan you're pretty much right - messages will wait in a topic until any channel exists, and then when one channel comes into existence and gets the messages, any channels that don't exist yet will never get them. But to be a bit more specific, the messages will "drain" from the topic into Channel B very quickly, regardless of how fast consumers on Channel B process them, so all 5 messages will almost instantly show up in Channel B and queue there, and they won't go to Channel C if it is created just a couple seconds later.

Since we usually don't want a channel to "miss" messages, we either:

  • carefully pre-create topics/channels before messages are put to that nsqd (see the sketch after this list)
  • have a cluster of nsqd and nsqlookupd instances such that nsqlookupd already knows what channels should be on that topic, and nsqd asks nsqlookupd at topic creation time what channels should exist, before messages start to flow
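
A minimal sketch of the first approach, pre-creating the topic and a channel through nsqd's HTTP API before anything is published (the address, topic, and channel names are placeholders; the endpoint paths have varied across nsqd versions and are /topic/create and /channel/create in current releases):

```go
package main

import (
    "log"
    "net/http"
    "net/url"
)

func main() {
    nsqdHTTP := "http://127.0.0.1:4151" // placeholder nsqd HTTP address
    topic := url.QueryEscape("events")

    // Create the topic and an "archive" channel before any publisher starts,
    // so that channel never misses the earliest messages.
    for _, endpoint := range []string{
        "/topic/create?topic=" + topic,
        "/channel/create?topic=" + topic + "&channel=archive",
    } {
        resp, err := http.Post(nsqdHTTP+endpoint, "", nil)
        if err != nil {
            log.Fatalf("POST %s: %v", endpoint, err)
        }
        resp.Body.Close()
        log.Printf("POST %s -> %s", endpoint, resp.Status)
    }
}
```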

@dmarkham maybe you should just have your apps broadcast udp messages to the LAN or something 👅

@paddycarver
Contributor Author

@ploxiln So appreciate. Such helpful. Very information. Wow.

thank you

@mreiferson
Member

@dmarkham interesting. Can you open up a separate issue to discuss the use case and pros/cons/alternatives?

@cespare
Contributor

cespare commented Feb 25, 2015

See also #549.

@renskiy

renskiy commented Nov 13, 2019

Any idea when this feature will be added? Like @dmarkham, I also need messages to be dropped if an ephemeral topic has no active channels. Message TTL = 0 would help with this, I suppose. Or maybe nsqd could have an option to drop such messages.

@ploxiln
Member

ploxiln commented Nov 13, 2019

Since the discussion here, ephemeral topics were added (back in 2015). Ephemeral topics are automatically deleted when the last channel is deleted (and this works with ephemeral channels that are automatically deleted when the last consumer disconnects).

The only catch is: if a message is published to an ephemeral topic with no consumers, and no consumers come, the topic remains. The messages will queue on the ephemeral topic in this case, but ephemeral topics also have the disk-queue disabled, so when the mem-queue overflows, those messages are lost. You could set a small --mem-queue-size (500?).
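
For what it's worth, here is a minimal sketch of that setup with the go-nsq client; the topic name and nsqd address are placeholders, and the nsqd itself would be started with a small in-memory queue as suggested above (e.g. nsqd --mem-queue-size=500):

```go
package main

import (
    "log"

    nsq "github.com/nsqio/go-nsq"
)

func main() {
    cfg := nsq.NewConfig()

    // Connect to a local nsqd (placeholder address).
    producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
    if err != nil {
        log.Fatal(err)
    }
    defer producer.Stop()

    // The #ephemeral suffix keeps the topic off the disk-queue entirely; with
    // no consumers, anything beyond --mem-queue-size is silently dropped.
    if err := producer.Publish("events#ephemeral", []byte("hello")); err != nil {
        log.Fatal(err)
    }
}
```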

@renskiy

renskiy commented Nov 13, 2019

@ploxiln thanks for the reply. I've hit exactly that catch: I have a huge number of ephemeral topics, each holding a single undelivered message. Even with a small mem-queue-size, /stats is very hard to read and parse.
