Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MeterRegistry.remove() blocks a thread for a long time #5743

Closed
ikhoon opened this issue Dec 13, 2024 · 4 comments
Closed

MeterRegistry.remove() blocks a thread for a long time #5743

ikhoon opened this issue Dec 13, 2024 · 4 comments
Labels
duplicate A duplicate of another issue

Comments

@ikhoon
Copy link
Contributor

ikhoon commented Dec 13, 2024

Describe the bug

We have a giant monolithic server that handles LINE messages. The server produces tons of metrics so a small delay could be critical.

MeterRegistry.remove() was not a problem when a small number of connections were closed but when 100+ connections were closed simultaneously. Many event loops were blocked to acquire a lock to call MeterRegistry.remove(), resulting in an outage.
https://github.com/line/armeria/blob/0b204554fedde24c79e001a1bc153bee24a011cc/core/src/main/java/com/linecorp/armeria/client/ConnectionPoolMetrics.java#L79-L91

We released a hotfix to not call MeterRegistry.remove() directly. A scheduled job was added to remove inactive Meter periodically instead.The issue has been resolved, but I would like to improve the performance of MeterRegistry.remove() to avoid similar issues in the future.

Environment

  • Micrometer version 1.13.6
  • OS: Linux
  • Java version: 11

Expected behavior

Unfortunately, there is no profile of what part inside MeterRegistry.remove() was taking so long. When I checked the code, the iteration of preFilterIdToMeterMap may take long.

Iterator<Map.Entry<Id, Meter>> iterator = preFilterIdToMeterMap.entrySet().iterator();
while (iterator.hasNext()) {
Map.Entry<Id, Meter> nextEntry = iterator.next();
if (nextEntry.getValue().equals(removedMeter)) {
stalePreFilterIds.remove(nextEntry.getKey());
iterator.remove();
}
}

So I suggest adding a reverse map to avoid iterating preFilterIdToMeterMap.

Additional context

@ikhoon
Copy link
Contributor Author

ikhoon commented Dec 13, 2024

For more context, our server has 30k+ Meters. It takes 15 ms to remove a Meter from MeterRegistry.

@ValueSource(ints = { 100, 1_000, 10_000, 30_000, 100_000, 1_000_000 })
@ParameterizedTest
void testMeterRemove(int size) {
    final SimpleMeterRegistry meter = new SimpleMeterRegistry();
    final List<Counter> counters = new ArrayList<>();
    for (int i = 0; i < size; i++) {
        final Counter counter = meter.counter("test" + i, "tag1", "value" + i, "tag2", "value" + i);
        counters.add(counter);
        counter.increment();
    }

    final Stopwatch stopwatch = Stopwatch.createStarted();
    meter.remove(counters.get(size - 1));
    final long elapsed = stopwatch.elapsed(TimeUnit.MILLISECONDS);
    logger.info("Removing a counter from a registry with {} counters took {} ms", size, elapsed);
}

@shakuzen
Copy link
Member

@ikhoon thank you for all the details and reporting this. Sorry for the issues this caused. I believe this is a duplicate of #5466. We still need to figure out how best to fix that, but it sounds like most people badly affected by this are wanting the feature of automatically removing old/inactive meters. I wonder if we had that feature if we could do it in a more performant way than removing a generic meter.

@shakuzen
Copy link
Member

Duplicate of #5466

@shakuzen shakuzen marked this as a duplicate of #5466 Dec 16, 2024
@shakuzen shakuzen closed this as not planned Won't fix, can't repro, duplicate, stale Dec 16, 2024
@shakuzen shakuzen added duplicate A duplicate of another issue and removed waiting-for-triage labels Dec 16, 2024
@ikhoon
Copy link
Contributor Author

ikhoon commented Dec 17, 2024

It would be useful if Micrometer offered an API that removes inactive meters in certain conditions, similar to other cache implementations.

That being said, many people don't realize that the MeterRegistry.remove() operation is not scalable. I think the most important thing is to improve the MeterRegistry.remove() operation so that it has performance close to HashMap's remove, O(1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate A duplicate of another issue
Projects
None yet
Development

No branches or pull requests

2 participants