-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid type pollution in StringCollectionDeserializer #4848
Conversation
This patch fixes a type pollution issue when deserializing a `List<String>` property, by adding explicit type checks for common subclasses (ArrayList, HashSet). The type pollution issue is a Hotspot performance problem prior to OpenJDK 23, where repeated type checks of the same concrete type against multiple interface types can lead to significant performance problems in multi-core systems. The issue is described in detail [in this blog post](https://redhatperf.github.io/post/type-check-scalability-issue/) from the finders at RedHat, and I've also written up a basic description [in the micronaut-test documentation](https://micronaut-projects.github.io/micronaut-test/latest/guide/#typePollution). In jackson-databind, this issue can manifest when deserializing a `List<String>` property. The property will be deserialized as ArrayList. StringCollectionDeserializer will then cast this ArrayList to Collection, causing a type check. Later on, the BeanPropertyWriter will set this field (or call the constructor, or call the setter method), causing a second type check against the declared type of the property, List. This back-and-forth can trigger the JDK bug. Replacing the Collection cast by an ArrayList cast when possible fixes this issue. Attributing a production performance issue to type pollution is tricky, so the RedHat folks developed [a Java agent](https://github.com/RedHatPerf/type-pollution-agent) that instruments all the bytecode of an application, tracks type checks, and reports any potential type pollution. In micronaut-test, I've adapted this approach to make it workable in unit tests. micronaut-test-type-pollution does not require use of other parts of the micronaut framework. In this PR, on top of the actual fix, I've included a test based on micronaut-test-type-pollution. This test runs as a separate test execution because it does not play well with other agents (in particular jacoco). I use junit 5 test suites for this. This test protects us against regressions, since they can happen when modifying seemingly unrelated code, and also gives others the option to test more complex structures for similar issues. Another option is to just not test this, which would remove the need for the build changes.
@franz1981 PTAL |
Many thanks @yawkat !! ❤️ |
@yawkat I don't have access to a laptop this afternoon. Do you know if the JDK teams will backport their improvements? |
see @pjfanning https://mail.openjdk.org/pipermail/jdk-updates-dev/2024-October/038145.html (I've answered there) |
This PR gives me mixed feelings. It's good to see the research and issues found. For me, this is 100% a Java runtime bug. I don't think it is good for libs like jackson-databind to workaround it. Let users upgrade their Java runtimes when the fix is ready. We are now going to get people complaining that they want Jackson releases while they refuse to upgrade their Java runtimes. Not everyone can upgrade Jackson. Much of the Big Data world is stuck on Jackson 2.12 because Hadoop needs that version. I can try to root out the Maven stats but it's usually shocking to see how people stick to the old versions. Many of the companies most impacted by this have the financial resources to fork jackson-databind and add this change for themselves. |
@pjfanning I think the build parts in this PR are arguable. I agree it makes the build more complicated for a single issue. We can leave out the tests if necessary. But the performance issue itself is significant. I've not hit this jackson case in production (yet), but I've seen cases where there is a 20x (!) performance improvement from fixing type pollution issues. When it is a problem, it can really hit hard and be super difficult to track down. It's this difficulty of debugging why we don't see many reports of this. It likely affects many large scale deployments, but very few notice and can figure out why. |
Thanks @yawkat. I understand that there can be a performance drag. It is still true that this is an old issue that noone noticed. The fix is already in Java 23 and will likely be backfit to Java 21 and maybe other Java versions (openjdk/jdk21u-dev#1090). |
Quick note: I don't think this should go in But I do like testing aspects since it'd be hard to validate this problem. I'll take some more time to digest this before commenting any more. |
https://youtu.be/PxcO3WHqmng?si=QwhuG9Rfwipa1K-q this is to add some more material to help digesting it :) and yes, I agree with both. |
@cowtowncoder imo the actual change to the production code is minor enough for 2.18. The build change, maybe not so much. |
Agreeing with @pjfanning that this issue would open up doors to more wrapper types supporting same behavior. But then if JDK folks don't plan on backporting to even earlier versions like version 11, then I guess it's up to users (like us?) to choose to find ways to optimize things. And according to @franz1981 's email discussion, not-backporting seems to be the direction headed. |
It's actually not that common. In micronaut we found less than a dozen code sites that had to be modified so far. I don't think this patch would lead to many more similar changes throughout jackson. |
imo it's not really different from other performance optimizations. We optimize for characteristics of the JVM, or even the CPU, all the time. We build code that is CPU-pipelining-friendly for current CPUs even if future CPUs might improve the pipelining logic and those optimizations might not be necessary anymore. This PR is a fairly minor change in comparison. The fact that this is fixed in 23, and will probably be backported at least to 21, does not matter in my opinion. We support all the way until 8, and 8 will never receive a backport. The timeline for other LTS backports is uncertain at best. The only major difference to other performance issues is that this one is much more difficult to discover due to its relation to concurrent memory access. |
If we were to make the change, I'd prefer just a small patch without the build changes. Is there also a good reason not support some more List and Set types (LinkedList, TreeSet, etc)? I understand that more type workarounds means lower performance but do we have evidence that ArrayList and HashSet are so dominant that we can ignore the rest? |
The reason for ArrayList and HashSet is that they are the default types for List and Set deserialization. If you give a concrete type in your model like LinkedList, then the issue does not materialize, because there is no interface type check when the field is set (since the field type is not an interface). |
Quick note: I will be out of town until next tuesday so no updates here -- but I consider this an important fix and need to give it full attention when I come back. |
Ok: I'd be ready to merge this, but I think that it'd be best to split this in 2 PRs:
this would make it easier to see if and how to merge (2) in 3.0/master. Actually I can probably create separate subset PR first. |
So this PR moves to 2.19 right? Makes sense. |
…bind into collection-pollution
…maller diff, no need to duplicate
Ok: I have trimmed this, reorganized bits, and once I change to be based on 2.19 can merge. |
Thanks! |
This patch fixes a type pollution issue when deserializing a
List<String>
property, by adding explicit type checks for common subclasses (ArrayList, HashSet).The type pollution issue is a Hotspot performance problem prior to OpenJDK 23, where repeated type checks of the same concrete type against multiple interface types can lead to significant performance problems in multi-core systems. The issue is described in detail in this blog post from the finders at RedHat, and I've also written up a basic description in the micronaut-test documentation.
In jackson-databind, this issue can manifest when deserializing a
List<String>
property. The property will be deserialized as ArrayList. StringCollectionDeserializer will then cast this ArrayList to Collection, causing a type check. Later on, the BeanPropertyWriter will set this field (or call the constructor, or call the setter method), causing a second type check against the declared type of the property, List. This back-and-forth can trigger the JDK bug. Replacing the Collection cast by an ArrayList cast when possible fixes this issue.Attributing a production performance issue to type pollution is tricky, so the RedHat folks developed a Java agent that instruments all the bytecode of an application, tracks type checks, and reports any potential type pollution. In micronaut-test, I've adapted this approach to make it workable in unit tests. micronaut-test-type-pollution does not require use of other parts of the micronaut framework.
In this PR, on top of the actual fix, I've included a test based on micronaut-test-type-pollution. This test runs as a separate test execution because it does not play well with other agents (in particular jacoco). I use junit 5 test suites for this. This test protects us against regressions, since they can happen when modifying seemingly unrelated code, and also gives others the option to test more complex structures for similar issues.
Another option is to just not test this, which would remove the need for the build changes.