Increased build-time RSS usage in micrometer-prometheus integration test after moving Kubernetes to @ConfigMapping #44216
/cc @brunobat (micrometer), @ebullient (micrometer), @geoand (kubernetes), @iocanel (kubernetes), @radcortez (config)
#44220 was closed as a duplicate, but our application cannot be built into a native executable at all, while this issue only tracks the increase. Are you sure closing it is an acceptable solution? If I can no longer build the application on my workstation, which is a very powerful machine, is that an appropriate outcome?
From what I can see, you have an Xmx build limit set:
Try to raise that and see how it goes. I'm not saying it's not an issue that needs fixing, or at least diagnosing. @zakkak, do you have any idea how we could understand what's going on?
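For anyone hitting the same wall, the native image build heap can be raised through the quarkus.native.native-image-xmx property; a minimal sketch (the 8g value is only an example, adjust it to your build):

# application.properties, or pass -Dquarkus.native.native-image-xmx=8g on the command line
quarkus.native.native-image-xmx=8g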
Setting it to |
Setting |
One would need to attach a Java profiler to |
I wonder if we were close to the limit and the change tipped it over. Mappings do add an additional class per root and group.
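For context on what a mapping adds, here is a minimal hypothetical example (the prefix and names are invented for illustration): one root interface plus one nested group interface, each of which gets its own generated implementation class.

import io.smallrye.config.ConfigMapping;

// Root mapping: one implementation class is generated for this interface.
@ConfigMapping(prefix = "example.container")
public interface ExampleContainerConfig {

    // Simple leaf value: no extra class needed.
    String network();

    // Group: a second implementation class is generated for the nested interface.
    Resources resources();

    interface Resources {
        String limitsMemory();
        String requestsMemory();
    }
}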
I could have misunderstood this, but in the case of the QE TS |
Are all the modules using the |
Yes, all of them use |
Experiment on my MBP M3 with Xmx set to 8g: Quarkus 3.16, 3.15, and 3.14 can handle the "many extensions" module from https://github.com/quarkus-qe/beefy-scenarios with peak RSS ~5 GB. Quarkus main (7a6b0eea9b8) utilizes all available memory and spends a massive amount of time on GC activity.
@maxandersen @cescoffier, could you please assign somebody to investigate in more detail what causes such memory demand when building a native image? Set of defined extensions: https://github.com/quarkus-qe/beefy-scenarios/blob/main/003-quarkus-many-extensions/pom.xml. Side note: I also experimented with different versions of Mandrel (23.1.4.r21-mandrel, 23.1.5.r21-mandrel, 24.0.2.r22-mandrel, 24.1.1.r23-mandrel). I noticed a difference between 23.1.4.r21-mandrel and 23.1.5.r21-mandrel, where the .4 version needed more memory and more GC cycles.
The thing to investigate more deeply is the kubernetes extension, as hinted in the comments above. Peak RSS drops to the ~5 GB level with 999-SNAPSHOT when the kubernetes extension is removed from 003-quarkus-many-extensions/pom.xml.
Something to consider is that we never noticed such behavior with other extensions using mappings, and we have been using them for some time already. So, it may be something related only to the Kubernetes configuration. I'm going to try to investigate, but I agree with @gsmet that it would be helpful if someone from Mandrel also helped.
@radcortez I am happy to work with you on this. I have it on my TODO list anyway but need to find some free time for it. Let's sync on Zulip. Note: anyone else willing to jump in is very much welcome on my side :)
Sounds good :)
@rsvoboda |
https://github.com/quarkus-qe/beefy-scenarios is test coverage that we built during the RHBQ 1.x era (together with https://github.com/quarkus-qe/quarkus-openshift-test-suite). With RHBQ 2+ we moved to https://github.com/quarkus-qe/quarkus-test-suite (with https://github.com/quarkus-qe/quarkus-test-framework/), which helped us simplify test development for both bare-metal and OpenShift target environments. https://github.com/quarkus-qe/beefy-scenarios is kept around for continuous community testing with Quarkus main; it's also part of the ecosystem CI / https://status.quarkus.io.
Ok, I think I've got something. First, one thing to consider is that changing the config classes is not an issue directly. In fact, in this particular case, the config classes are available only at build time and are not even present during the native image build. So, what is the issue? In our generated startup class, the old config implementation would generate a method similar to this, which either ignores keys or populates the config object:

private static void siParseKey_quarkus_test_container/* $FF was: siParseKey:quarkus:test:container*/(SmallRyeConfig var0, NameIterator var1) {
if (!var1.hasNext()) {
String var2 = var1.getName();
((HashSet)unknown).add(var2);
} else if (!var1.nextSegmentEquals("additional-exposed-ports")) {
if (!var1.nextSegmentEquals("labels")) {
if (!var1.nextSegmentEquals("network")) {
if (!var1.nextSegmentEquals("volume-mounts")) {
String var3 = var1.getName();
((HashSet)unknown).add(var3);
} else {
var1.next();
siParseKey:quarkus:test:container:volume-mounts(var0, var1);
}
} else {
var1.next();
emptyParseKey(var0, var1);
}
} else {
var1.next();
siParseKey:quarkus:test:container:labels(var0, var1);
}
} else {
var1.next();
siParseKey:quarkus:test:container:additional-exposed-ports(var0, var1);
}
}

The new implementation adds the names to a KeyMap:

KeyMap var0 = new KeyMap();
var0.findOrAdd("quarkus.knative.rbac.roles.*.policy-rules.*.resource-names").putRootValue(true);
var0.findOrAdd("quarkus.knative.init-containers.*.env-vars.*.value").putRootValue(true);
var0.findOrAdd("quarkus.rest-client.proxy-address").putRootValue(true);
var0.findOrAdd("quarkus.kubernetes.init-containers.*.mounts.*.name").putRootValue(true);
var0.findOrAdd("quarkus.kubernetes.app-secret").putRootValue(true);
var0.findOrAdd("quarkus.kubernetes.prometheus.port").putRootValue(true);
...

So, when config is moved from the old implementation to the new one, the population method is deleted, and new entries are added to the KeyMap instead. I was already aware of this, and I already had a more lightweight replacement implemented in https://github.com/smallrye/smallrye-config/blob/3edebd4b38f85aaaf3c1872845a2add89b6374ad/implementation/src/main/java/io/smallrye/config/PropertyName.java, so I replaced the KeyMap with it. I was able to observe the memory increase by building the native image with #44216, and I do see the memory decrease with #44448:

Current: 17.1s (13.0% of total time) in 178 GCs | Peak RSS: 8.05GB | CPU load: 10.12
With #44448: 9.3s (8.6% of total time) in 156 GCs | Peak RSS: 6.62GB | CPU load: 10.20

@zakkak @michalvavrik, can you try that and see if you notice any improvement? Thanks!
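As a rough illustration of why the two structures differ in footprint, here is a simplified sketch. It is not the actual SmallRye Config code; it only assumes that a KeyMap-style structure keeps one nested node per dotted segment, while a flat set keeps one entry per full name.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FootprintSketch {

    // KeyMap-like trie: every dotted segment of every registered name becomes
    // its own node, so thousands of long wildcarded names expand into many
    // nested maps and entries that all live in memory during the build.
    static class SegmentTrie {
        final Map<String, SegmentTrie> children = new HashMap<>();

        void add(String dottedName) {
            SegmentTrie node = this;
            for (String segment : dottedName.split("\\.")) {
                node = node.children.computeIfAbsent(segment, k -> new SegmentTrie());
            }
        }
    }

    public static void main(String[] args) {
        SegmentTrie trie = new SegmentTrie();
        trie.add("quarkus.kubernetes.init-containers.*.mounts.*.name"); // seven nodes

        // Flat set of names: a single entry per property name.
        Set<String> flat = new HashSet<>();
        flat.add("quarkus.kubernetes.init-containers.*.mounts.*.name"); // one entry
    }
}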
Very nice @radcortez!
Great work @radcortez! I confirm that the memory usage has decreased significantly. Before / After (with #44448):
Excellent. I was now trying to compare with the state before #44216 to see if we are back to the same level or if there is still some degradation. I do get some minor differences, but maybe that is only noise. BTW, for additional reference, the implementation of #44448 is something like:

private static Set generateMappedProperties() {
HashSet var0 = new HashSet();
PropertyName var1 = new PropertyName("quarkus.config-tracking.file");
var0.add(var1);
PropertyName var2 = new PropertyName("quarkus.tls.*.trust-store.pem.certs");
var0.add(var2);
PropertyName var3 = new PropertyName("quarkus.log.socket.filter");
var0.add(var3);
PropertyName var4 = new PropertyName("quarkus.vertx.resolver.rotate-servers");
var0.add(var4);
PropertyName var5 = new PropertyName("quarkus.vertx.cluster.public-port");
var0.add(var5);
PropertyName var6 = new PropertyName("quarkus.log.file.async");
var0.add(var6);
PropertyName var7 = new PropertyName("quarkus.log.handler.syslog.*.syslog-type");
...

The check just calls a |
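Presumably the check is then just a contains() against that set, with an equality that treats * segments as wildcards. A hypothetical, simplified version of such a name type (not the real io.smallrye.config.PropertyName) could look like this:

import java.util.Objects;

// Hypothetical simplification: a property name whose '*' segments match any segment.
public final class WildcardName {

    private final String[] segments;

    public WildcardName(String dotted) {
        this.segments = dotted.split("\\.");
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof WildcardName)) {
            return false;
        }
        WildcardName other = (WildcardName) o;
        if (segments.length != other.segments.length) {
            return false;
        }
        for (int i = 0; i < segments.length; i++) {
            String a = segments[i];
            String b = other.segments[i];
            if (!a.equals(b) && !a.equals("*") && !b.equals("*")) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Deliberately coarse so that names differing only in wildcard positions
        // hash to the same bucket; a real implementation would be smarter.
        return Objects.hash(segments.length);
    }
}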
And the |
It is stored because we need it for both phases. I'm still unsure if this is the best approach (I'm still playing with it), because after we start and validate, we don't need it anymore.
Just for the record, |
That's fantastic. Thanks everyone!
Description
Since #43862 the build-time RSS usage of native-image builds has increased in the micrometer-prometheus integration test (note that I haven't observed similar changes in other tests). The change didn't have any negative effect on the native executable size or the reachable types/methods; on the contrary, it decreased the total image size by ~400KB.
The purpose of this issue is to raise awareness of this increase and to keep track of it.
Note
quarkusio/quarkus-images#282 was also merged on the 24th of October, but I believe the issue is related to a Quarkus change, since it popped up in the upstream oracle/graal nightly builds that do not use the quarkus builder images.
Note from @gsmet: here is another report, posted by @mocenas, that comes with a reproducer: #44436
Compiling a Quarkus app into a native executable using the docker image requires a lot more memory since Quarkus started moving to @ConfigMapping. We have an example where our testing app used to build natively with xmx=5g a month ago, but since #43862 and #44140 it requires at least 9g to compile successfully (otherwise compilation fails with an OutOfMemory error). Almost doubling the memory requirement just by moving to @ConfigMapping is alarming. At this moment it is somewhat manageable, but with more moves planned in #44079 it is hard to say how much memory it will require in the end.
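As a stopgap while this is investigated, the in-container native-image heap can be raised the same way as for local builds; a minimal sketch using the 9g figure from this report (values are only illustrative):

# application.properties
quarkus.native.container-build=true
quarkus.native.native-image-xmx=9g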