-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit count of HTTP channels with tracked stats #77303
Limit count of HTTP channels with tracked stats #77303
Conversation
Today we expire the client stats for HTTP channels 5 minutes after they close. It's possible to open a very large number of HTTP channels in 5 minutes, possibly inadvertently, and the stats for those channels can be overwhelming. This commit introduces a limit on the number of channels tracked by each node which applies in addition to the age limit, and makes these limits configurable via static settings. It drops the pruning of old stats when starting to track a new channel and instead uses a queue to expire the oldest stats when each channel closes if necessary to respect the count limit; it only performs age-based expiry when retrieving the stats, since the count limit now bounds the memory needed. Finally, it tightents up some missing synchronization and makes sure that we expose only immutable objects to the stats subsystem.
Pinging @elastic/es-data-management (Team:Data Management) |
Failure is #77297; @elasticmachine please run elasticsearch-ci/part-2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaveCTurner, thanks for improving the collection of these stats. It seems advantageous to provide limits for the number of closed connections that are tracked even if the default of 10k is unlikely to be reached in normal workloads. I don't love the idea of silently truncating the stats for closed connections if the limit is reached, though I suppose a user can deduce that occurred with high probability if (# closed connection results returned) == (SETTING_HTTP_CLIENT_STATS_MAX_CLOSED_CHANNEL_COUNT).
Thanks Dan, yes I agree that it's not great to lose fidelity here but it's a tricky balance between that and runaway resource usage. Note that we still accurately count the number of HTTP connections that have ever been opened, so you can also tell if you're missing something by comparing the specific clients you see over time against that counter. |
Today we expire the client stats for HTTP channels 5 minutes after they close. It's possible to open a very large number of HTTP channels in 5 minutes, possibly inadvertently, and the stats for those channels can be overwhelming. This commit introduces a limit on the number of channels tracked by each node which applies in addition to the age limit, and makes these limits configurable via static settings. It drops the pruning of old stats when starting to track a new channel and instead uses a queue to expire the oldest stats when each channel closes if necessary to respect the count limit; it only performs age-based expiry when retrieving the stats, since the count limit now bounds the memory needed. Finally, it tightents up some missing synchronization and makes sure that we expose only immutable objects to the stats subsystem.
Today we expire the client stats for HTTP channels 5 minutes after they
close. It's possible to open a very large number of HTTP channels in 5
minutes, possibly inadvertently, and the stats for those channels can be
overwhelming.
This commit introduces a limit on the number of channels tracked by each
node which applies in addition to the age limit, and makes these limits
configurable via static settings. It drops the pruning of old stats when
starting to track a new channel and instead uses a queue to expire the
oldest stats when each channel closes if necessary to respect the count
limit; it only performs age-based expiry when retrieving the stats,
since the count limit now bounds the memory needed. Finally, it
tightents up some missing synchronization and makes sure that we expose
only immutable objects to the stats subsystem.