A slow cluster should not affect other clusters #172
Yes, this sounds like #154. Do you know if you're running a version including that fix?
It seems not. My version is:
But my version seems to include some fix. It does not seem to fix the issue completely; carbon-c-relay reports
I think you want to increase your queuesize in that case. It can be argued that the stalling should be configurable, which is ok, but be aware that stalling is the only way to inform an upstream writer that it should slow down.
Your version is ANCIENT (v1.5). It doesn't include the fix I mentioned.
Yeah, we have now updated to match master and made influxdb faster, which temporarily solves the problem. As for the stalling, we really do not want carbon-c-relay to stall the client (it is a statsd server); the stats are generated every
so in that case the metric would just be dropped in the relay, instead of in statsd.
Of course it would be possible to create some flag to disable stalling, for situations that demand that behaviour. Again, I don't know what your queuesize is, but you may want to increase it. If influx cannot keep up with the inbound flow at all, then from a relay point of view something could perhaps be devised, but the application itself is useless of course.
For some scenarios, stalling may be undesirable, or desirable only in a different amount than the hardwired default. Therefore, allow controlling the number of stalls before metrics are dropped. A setting of 0 is allowed and disables stalls.
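To make the idea of "a number of stalls before dropping" concrete, here is a minimal Python sketch. It is not carbon-c-relay's code (the relay is written in C), and the names `StallingQueue`, `offer`, `take`, and `max_stalls` are made up for illustration. The assumption is that a full queue first asks the writer to pause up to `max_stalls` times, and only then starts dropping; with `max_stalls=0` it never stalls.

```python
from collections import deque


class StallingQueue:
    """Hypothetical bounded queue: stall the writer a few times, then drop."""

    def __init__(self, size, max_stalls):
        self.items = deque()
        self.size = size              # maximum number of queued metrics
        self.max_stalls = max_stalls  # 0 disables stalling entirely
        self.stalls = 0               # stalls since the last successful enqueue
        self.dropped = 0              # metrics thrown away so far

    def offer(self, metric):
        """Return 'queued', 'stall' (writer should pause and retry) or 'dropped'."""
        if len(self.items) < self.size:
            self.items.append(metric)
            self.stalls = 0           # room again, so the stall budget resets
            return "queued"
        if self.stalls < self.max_stalls:
            self.stalls += 1
            return "stall"
        # Stall budget used up (or stalling disabled): drop instead of blocking.
        self.dropped += 1
        return "dropped"

    def take(self):
        """Remove and return the oldest metric, or None if the queue is empty."""
        return self.items.popleft() if self.items else None
```

With `StallingQueue(size=1, max_stalls=0)`, the second `offer` on a full queue is dropped right away, which is the behaviour asked for in this issue: the writer is never slowed down, at the cost of losing metrics for the slow destination.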
I've added an option for you to disable the stalling, use
We have two clusters: one is graphite, the other is influxdb 0.8. Most of our graphs use graphite. Once in a while, however, we see one or two points missing in a graph. Looking at the carbon-c-relay stats, we found that the influxdb cluster's queue was full and its stall count was high, while the graphite part was working normally. If carbon-c-relay did not stall the client, the graphite part would keep working. carbon-c-relay should not do traffic control by stalling the client; it should just buffer the data for influxdb and, if the buffer overflows, throw the data away. That way the good consumer (graphite) keeps working even when influxdb cannot keep up.
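The behaviour requested here can be sketched in a few lines of Python. This is only an illustration of the idea, not how carbon-c-relay is implemented; `Destination`, `enqueue`, and `route` are hypothetical names. Each destination gets its own bounded buffer, and a full buffer drops the metric for that destination only, without ever blocking the sender.

```python
from collections import deque


class Destination:
    """Hypothetical per-cluster buffer that drops on overflow instead of blocking."""

    def __init__(self, name, queue_size):
        self.name = name
        self.queue = deque()
        self.queue_size = queue_size
        self.dropped = 0

    def enqueue(self, metric):
        """Buffer the metric if there is room; otherwise count it as dropped."""
        if len(self.queue) >= self.queue_size:
            self.dropped += 1
            return False
        self.queue.append(metric)
        return True


def route(metric, destinations):
    """Offer the metric to every destination; a full one never delays the others."""
    return {d.name: d.enqueue(metric) for d in destinations}


graphite = Destination("graphite", queue_size=100)
influxdb = Destination("influxdb", queue_size=2)  # stands in for the slow cluster

# Nothing drains influxdb here, so its small buffer fills up and overflows.
for i in range(5):
    route(f"stats.requests {i}", [graphite, influxdb])
```

After the loop, graphite holds all five metrics while influxdb holds two and has dropped three. The trade-off is the one raised in the discussion: without stalling, the upstream writer gets no signal to slow down, so the loss for the slow destination is silent unless the drop counter is monitored.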