nsqd: questions for large number of topics/channels #577
Conversation
https://github.com/bitly/nsq/blob/v0.3.2/nsqd/channel.go#L18
Can you supply a CPU profile dump from
Nothing that I can think of other than CPU/memory. Are you running
Grabbed a profile with 1000 topics, each with a single channel, no clients. As @xiaost discovered, it's clearly the channel (and diskqueue) timers. An easy fix is to add configuration to extend this delay; a better fix is to come up with a better scheduling strategy.
Thoughts?
Yes. GOMAXPROCS is 4
After I built a temporary nsqd with it set to 1000ms, CPU usage dropped from 40-50% to 7%. IMHO, a signaling mechanism seems like a good idea. I will let you know if I run into more problems.
@allenlz are you going to take a pass at submitting a PR for this?
@mreiferson looks good! 👍 why not use shuffle to pick channels instead of
@xiaost the difference is in my approach we generate a constant sized random
@mreiferson what about:

```go
if num < len(channels) { // need shuffle
	for i := 0; i < num; i++ {
		j := rand.Intn(len(channels) - i)
		channels[i], channels[i+j] = channels[i+j], channels[i]
	}
} else {
	num = len(channels)
}
```

I suppose it is identical in probability.
I'm not sure I see the benefit. I am going to add a pool of goroutines (that grows according to the number of channels, with a configurable max) to actually do the work. This way, for large numbers of channels, it can be parallelized (like it currently is). |
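A pool that grows with the number of channels up to a configurable max could be sketched as follows. The 1-worker-per-4-channels ratio and all names here are assumptions for illustration, not nsqd's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// poolSize sizes the worker pool as a fraction of the number of channels,
// capped at a configurable max (the 1:4 ratio is an assumption).
func poolSize(numChannels, max int) int {
	size := numChannels / 4
	if size < 1 {
		size = 1
	} else if size > max {
		size = max
	}
	return size
}

func main() {
	workCh := make(chan int)
	var wg sync.WaitGroup
	n := poolSize(1000, 4) // 1000 channels with a QueueGCPoolMax-style cap of 4
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range workCh {
				// each worker would process one channel's queues here
			}
		}()
	}
	for c := 0; c < 10; c++ {
		workCh <- c // the scan loop pushes selected channels to the pool
	}
	close(workCh)
	wg.Wait()
	fmt.Println(n)
}
```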
```go
if float64(numDirty)/float64(num) > n.opts.QueueGCDirtyPercent {
```
What does this mean for a case where you have 1 hot channel (with a large deferred queue or something) and lots of quiet/idle channels? Should we loop on the same channels to drain them instead of picking different ones each loop?
That would mean we would be actively draining priority queues (25% of them at a time, up to QueueGCPoolMax) and only go back to random selection once the one we were draining is empty. This would keep QueueGCInterval from being an accidental cap on how fast we can drain priority queues (QueueGCPoolMax per QueueGCInterval) in that edge case.
Each iteration should "drain" the deferred queue in the sense that there is no more work to do right now; there might still be messages in the queue that have not yet timed out. Does this clarify anything? Do you think we still need to somehow prioritize busier queues?
ahh, the loop inside processInFlightQueue is what I was missing. It works as I was suggesting then. (I mistakenly thought it only popped one message each time that channel got pushed into workCh.)
@jehiah updated names, did a bit of re-org, added comments, and fixed up the exit paths. PTAL
LGTM. you ready for this to land?
Yes, I'll 🔨
This moves to a single goroutine to process in-flight and deferred priority queues. It manages a pool of workers (configurable max) that process channels concurrently. It is a copy of Redis's probabilistic algorithm: it wakes up every 100ms to select 20 random channels from a locally cached list (refreshed every 5s). If either of the queues had work to do, the channel is considered "dirty". If 25% of the selected channels were dirty, the loop continues without sleep. For 1000 topics and channels with no clients connected, idle CPU usage dropped to ~8%.
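One iteration of the loop described above can be sketched as follows. `scanOnce` and the `isDirty` predicate are hypothetical stand-ins (the real code dispatches each selected channel to the worker pool rather than calling a predicate inline):

```go
package main

import (
	"fmt"
	"math/rand"
)

// scanOnce models one iteration of the Redis-style scan: sample up to 20
// channels, count how many were "dirty", and report whether the caller
// should loop again immediately (dirty fraction above 25%) instead of
// sleeping until the next 100ms tick.
func scanOnce(channels []int, isDirty func(int) bool) bool {
	num := 20
	if num > len(channels) {
		num = len(channels)
	}
	numDirty := 0
	for _, i := range rand.Perm(len(channels))[:num] {
		if isDirty(channels[i]) {
			numDirty++
		}
	}
	return float64(numDirty)/float64(num) > 0.25
}

func main() {
	channels := make([]int, 1000)
	for i := range channels {
		channels[i] = i
	}
	// every channel dirty: the loop should continue without sleeping
	fmt.Println(scanOnce(channels, func(int) bool { return true }))
	// no channel dirty: the loop should sleep until the next interval
	fmt.Println(scanOnce(channels, func(int) bool { return false }))
}
```

The payoff is that idle cost no longer scales with the total channel count: a mostly idle deployment does 20 cheap checks per 100ms regardless of how many thousands of channels exist, while a busy one keeps re-scanning without sleeping.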
🔥 👊
Today we hit another problem with a large number of topics/channels: we found nsqlookupd consumes a lot of CPU. Please refer to #584.
Background: We need to dispatch user messages to specific topics, where every consumer subscribes only to messages for a fixed range of users. Since nsqd doesn't support partitioning by key, we created thousands of topics/channels, i.e. user_topic_[0-999], one topic per range of users.
Suppose there will be 10K topics and channels; we want to know whether nsqd will work well in this situation.
`strace` shows lots of futex syscalls. @xiaost helped troubleshoot: it may be caused by the 100ms ticker for every channel, and maybe we need to make it configurable?
Thanks