MeterRegistry.remove() blocks a thread for a long time #5743
Comments
For more context, our server has 30k+
@ValueSource(ints = { 100, 1_000, 10_000, 30_000, 100_000, 1_000_000 })
@ParameterizedTest
void testMeterRemove(int size) {
    final SimpleMeterRegistry meter = new SimpleMeterRegistry();
    final List<Counter> counters = new ArrayList<>();
    // Register `size` distinct counters, each with its own name and tag values.
    for (int i = 0; i < size; i++) {
        final Counter counter = meter.counter("test" + i, "tag1", "value" + i, "tag2", "value" + i);
        counters.add(counter);
        counter.increment();
    }
    // Time the removal of a single counter (the last one registered).
    final Stopwatch stopwatch = Stopwatch.createStarted();
    meter.remove(counters.get(size - 1));
    final long elapsed = stopwatch.elapsed(TimeUnit.MILLISECONDS);
    logger.info("Removing a counter from a registry with {} counters took {} ms", size, elapsed);
}
@ikhoon thank you for all the details and for reporting this. Sorry for the issues this caused. I believe this is a duplicate of #5466. We still need to figure out how best to fix that, but it sounds like most people badly affected by this want a feature that automatically removes old/inactive meters. I wonder whether, with that feature, we could do the removal in a more performant way than removing a generic meter.
Duplicate of #5466
It would be useful if Micrometer offered an API that removes inactive meters under certain conditions, similar to other cache implementations. That being said, many people don't realize that the
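As a rough illustration of the "remove inactive meters periodically" idea raised in these comments, here is a minimal JDK-only sketch. It is not a Micrometer API: the class InactiveSweeper, its methods, and the removal callback are hypothetical names chosen for this example. The assumption is that the hot path only records a timestamp, and a single background thread performs the actual removal (for instance by calling MeterRegistry.remove() inside the callback).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of Micrometer or Armeria.
final class InactiveSweeper<T> {
    // Last time (in nanoseconds) each tracked item was reported as active.
    private final Map<T, Long> lastActiveNanos = new ConcurrentHashMap<>();
    private final long idleNanos;
    // Callback that performs the real removal, e.g. registry.remove(meter).
    private final Consumer<T> remover;

    InactiveSweeper(long idleTimeout, TimeUnit unit, Consumer<T> remover) {
        this.idleNanos = unit.toNanos(idleTimeout);
        this.remover = remover;
    }

    // Hot path: a cheap map write, with no registry lock involved.
    void markActive(T item, long nowNanos) {
        lastActiveNanos.put(item, nowNanos);
    }

    // Removes every item that has been idle for at least the timeout.
    List<T> sweep(long nowNanos) {
        List<T> removed = new ArrayList<>();
        for (Map.Entry<T, Long> entry : lastActiveNanos.entrySet()) {
            Long seen = entry.getValue();
            // remove(key, value) only succeeds if the item was not re-marked meanwhile.
            if (nowNanos - seen >= idleNanos && lastActiveNanos.remove(entry.getKey(), seen)) {
                remover.accept(entry.getKey());
                removed.add(entry.getKey());
            }
        }
        return removed;
    }

    // Runs sweep() on one background thread so callers never block on removal.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> sweep(System.nanoTime()), period, period, unit);
        return executor;
    }
}
```

With this shape, closing many connections at once costs nothing extra on the event loops; the slow removals are serialized on the sweeper thread, at the price of meters lingering for up to one idle timeout plus one sweep period.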
Describe the bug
We have a giant monolithic server that handles LINE messages. The server produces tons of metrics so a small delay could be critical.
MeterRegistry.remove() was not a problem when only a small number of connections were closed, but it became one when 100+ connections were closed simultaneously. Many event loops were blocked while acquiring a lock to call MeterRegistry.remove(), resulting in an outage.
https://github.com/line/armeria/blob/0b204554fedde24c79e001a1bc153bee24a011cc/core/src/main/java/com/linecorp/armeria/client/ConnectionPoolMetrics.java#L79-L91
We released a hotfix to not call MeterRegistry.remove() directly. A scheduled job was added to remove inactive Meters periodically instead.
The issue has been resolved, but I would like to improve the performance of MeterRegistry.remove() to avoid similar issues in the future.
Environment
Expected behavior
Unfortunately, there is no profile showing which part of MeterRegistry.remove() was taking so long. When I checked the code, the iteration over preFilterIdToMeterMap may take a long time.
micrometer/micrometer-core/src/main/java/io/micrometer/core/instrument/MeterRegistry.java
Lines 777 to 784 in 3c53a7c
So I suggest adding a reverse map to avoid iterating over preFilterIdToMeterMap.
Additional context
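The reverse-map suggestion above can be sketched with a simplified, JDK-only model. This is not Micrometer's actual code: ReverseIndexedRegistry and its field and method names are hypothetical, and generic keys and values stand in for pre-filter ids and meters. The point is only to show how a second map from value to keys lets removal skip the scan over the whole forward map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model; not Micrometer's MeterRegistry.
final class ReverseIndexedRegistry<K, V> {
    // Forward map: pre-filter key -> registered value.
    private final Map<K, V> preFilterKeyToValue = new HashMap<>();
    // Reverse map: registered value -> all pre-filter keys that resolve to it.
    // Assumes V has a suitable equals/hashCode; an identity-based map is another option.
    private final Map<V, Set<K>> valueToPreFilterKeys = new HashMap<>();

    synchronized void put(K key, V value) {
        V previous = preFilterKeyToValue.put(key, value);
        if (previous != null) {
            // Keep the reverse map consistent when a key is re-pointed.
            Set<K> oldKeys = valueToPreFilterKeys.get(previous);
            if (oldKeys != null) {
                oldKeys.remove(key);
                if (oldKeys.isEmpty()) {
                    valueToPreFilterKeys.remove(previous);
                }
            }
        }
        valueToPreFilterKeys.computeIfAbsent(value, v -> new HashSet<>()).add(key);
    }

    synchronized V get(K key) {
        return preFilterKeyToValue.get(key);
    }

    // Cost is proportional to the number of keys mapped to this value,
    // not to the total number of entries in the forward map.
    synchronized boolean remove(V value) {
        Set<K> keys = valueToPreFilterKeys.remove(value);
        if (keys == null) {
            return false;
        }
        for (K key : keys) {
            preFilterKeyToValue.remove(key);
        }
        return true;
    }

    synchronized int size() {
        return preFilterKeyToValue.size();
    }
}
```

The trade-off in this sketch is extra memory and one more structure to keep in sync on every registration, in exchange for removals that no longer grow with the total number of registered entries.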