Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in MeterRegistry#remove with many meters #5466

Closed
joshb1050 opened this issue Sep 5, 2024 · 8 comments
Closed

Performance regression in MeterRegistry#remove with many meters #5466

joshb1050 opened this issue Sep 5, 2024 · 8 comments
Labels
bug A general bug module: micrometer-core An issue that is related to our core module performance Issues related to general performance
Milestone

Comments

@joshb1050
Copy link

joshb1050 commented Sep 5, 2024

Describe the bug
A clear and concise description of what the bug is.

We noticed a performance regression related to #4857. Specifically, with a lot of meters, this preFilterIdToMeterMap map could be hundreds of thousands of elements or more, and we linearly iterate through them.

Environment

  • Micrometer version [e.g. 1.7.1]
  • Micrometer registry [e.g. prometheus]
  • OS: [e.g. macOS]
  • Java version: [e.g. output of java -version]

Micrometer 1.13.2, able to produce on macOS Sonoma 14.2.

This was discovered after it was bumped in ActiveMQ Artemis.

openjdk 17.0.11 2024-04-16 LTS
OpenJDK Runtime Environment Zulu17.50+19-CA (build 17.0.11+9-LTS)
OpenJDK 64-Bit Server VM Zulu17.50+19-CA (build 17.0.11+9-LTS, mixed mode, sharing)

To Reproduce
How to reproduce the bug:

Unfortunately don't have a MRE to share, however this can likely be seen if creating many meters and gauges and then trying to delete them.

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here, e.g. related issues.

@shakuzen shakuzen added the performance Issues related to general performance label Sep 6, 2024
@shakuzen shakuzen added this to the 1.13.x milestone Sep 6, 2024
@shakuzen shakuzen added bug A general bug module: micrometer-core An issue that is related to our core module and removed waiting-for-triage labels Sep 6, 2024
@shakuzen
Copy link
Member

shakuzen commented Sep 6, 2024

Thank you for the report and sorry about the performance regression. I'm not sure what will be the best solution for this. I think we were somewhat aware this would not be very performant for remove calls but took the assumption those should be far fewer than register calls, and remove calls should generally not be on the hot path for applications. I suspect that's still true, but in cases where remove is called more often, wasting a lot of CPU cycles and locking the meter map still hurts overall performance.
/cc @lenin-jaganathan

@shakuzen shakuzen changed the title Performance regression related to #4857 Performance regression in MeterRegistry#remove with many meters Sep 6, 2024
@jonatan-ivanov
Copy link
Member

Could you please share your use case? Especially why you have hundreds of thousands meters that you remove?

@jbertram
Copy link

jbertram commented Sep 6, 2024

ActiveMQ Artemis is a message broker and we register & remove meters as part of a queue's lifecycle (among other things). Currently each & every queue has 17 corresponding meters. In certain use-cases (e.g. heavy publish/subscribe) queues can come and go a lot and there can be a lot of them. In order to maintain consistency for internal data structures we have some concurrency controls for these processes and registering & removing meters is part of that. This means that slow meter removal is impacting performance (e.g. blocking other threads from removing or adding new queues and meters). We can certainly investigate moving meter removal outside of these concurrency controls, but it would be awesome if performance was improved within Micrometer so that wasn't necessary.

@jbertram
Copy link

jbertram commented Sep 9, 2024

I just wanted to confirm no further explanation is needed in terms of the use-case.

@shakuzen
Copy link
Member

The use case makes sense. Thank you for the explanation. We have a similar thing happening with Kafka metrics as well. We'll have to see what we can do to improve this.

@ikhoon
Copy link
Contributor

ikhoon commented Dec 17, 2024

I don't know much about Mircometer's internal implementation, but this problem doesn't seem difficult to solve. I think it could be solved by adding just a reverse map, but should such a workaround be avoided?

@shakuzen
Copy link
Member

I think there were reasons why such an approach would not have worked before (e.g. cases where multiple pre-filter IDs map to the same post-filter ID), but we changed things so we only store one of those pre-filter IDs in the cache. Given that, it looks like it should work. I've opened #5750.

@shakuzen shakuzen modified the milestones: 1.13.x, 1.13.10 Dec 18, 2024
@shakuzen
Copy link
Member

For anyone affected by this, please try out 1.13.10-SNAPSHOT or 1.14.3-SNAPSHOT and let us know if the performance regression is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug module: micrometer-core An issue that is related to our core module performance Issues related to general performance
Projects
None yet
Development

No branches or pull requests

5 participants