Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit count of HTTP channels with tracked stats #77303

Merged

Conversation

DaveCTurner
Copy link
Contributor

Today we expire the client stats for HTTP channels 5 minutes after they
close. It's possible to open a very large number of HTTP channels in 5
minutes, possibly inadvertently, and the stats for those channels can be
overwhelming.

This commit introduces a limit on the number of channels tracked by each
node which applies in addition to the age limit, and makes these limits
configurable via static settings. It drops the pruning of old stats when
starting to track a new channel and instead uses a queue to expire the
oldest stats when each channel closes if necessary to respect the count
limit; it only performs age-based expiry when retrieving the stats,
since the count limit now bounds the memory needed. Finally, it
tightents up some missing synchronization and makes sure that we expose
only immutable objects to the stats subsystem.

Today we expire the client stats for HTTP channels 5 minutes after they
close. It's possible to open a very large number of HTTP channels in 5
minutes, possibly inadvertently, and the stats for those channels can be
overwhelming.

This commit introduces a limit on the number of channels tracked by each
node which applies in addition to the age limit, and makes these limits
configurable via static settings. It drops the pruning of old stats when
starting to track a new channel and instead uses a queue to expire the
oldest stats when each channel closes if necessary to respect the count
limit; it only performs age-based expiry when retrieving the stats,
since the count limit now bounds the memory needed. Finally, it
tightents up some missing synchronization and makes sure that we expose
only immutable objects to the stats subsystem.
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Sep 6, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@DaveCTurner
Copy link
Contributor Author

Failure is #77297; @elasticmachine please run elasticsearch-ci/part-2

Copy link
Contributor

@danhermann danhermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaveCTurner, thanks for improving the collection of these stats. It seems advantageous to provide limits for the number of closed connections that are tracked even if the default of 10k is unlikely to be reached in normal workloads. I don't love the idea of silently truncating the stats for closed connections if the limit is reached, though I suppose a user can deduce that occurred with high probability if (# closed connection results returned) == (SETTING_HTTP_CLIENT_STATS_MAX_CLOSED_CHANNEL_COUNT).

@DaveCTurner DaveCTurner merged commit 1045abe into elastic:master Sep 8, 2021
@DaveCTurner DaveCTurner deleted the 2021-09-06-http-client-stats branch September 8, 2021 06:26
@DaveCTurner
Copy link
Contributor Author

Thanks Dan, yes I agree that it's not great to lose fidelity here but it's a tricky balance between that and runaway resource usage. Note that we still accurately count the number of HTTP connections that have ever been opened, so you can also tell if you're missing something by comparing the specific clients you see over time against that counter.

DaveCTurner added a commit that referenced this pull request Sep 8, 2021
Today we expire the client stats for HTTP channels 5 minutes after they
close. It's possible to open a very large number of HTTP channels in 5
minutes, possibly inadvertently, and the stats for those channels can be
overwhelming.

This commit introduces a limit on the number of channels tracked by each
node which applies in addition to the age limit, and makes these limits
configurable via static settings. It drops the pruning of old stats when
starting to track a new channel and instead uses a queue to expire the
oldest stats when each channel closes if necessary to respect the count
limit; it only performs age-based expiry when retrieving the stats,
since the count limit now bounds the memory needed. Finally, it
tightents up some missing synchronization and makes sure that we expose
only immutable objects to the stats subsystem.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >enhancement Team:Data Management Meta label for data/management team v7.16.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants