Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid type pollution in StringCollectionDeserializer #4848

Merged
merged 14 commits into from
Dec 21, 2024

Conversation

yawkat
Copy link
Member

@yawkat yawkat commented Dec 10, 2024

This patch fixes a type pollution issue when deserializing a List<String> property, by adding explicit type checks for common subclasses (ArrayList, HashSet).

The type pollution issue is a Hotspot performance problem prior to OpenJDK 23, where repeated type checks of the same concrete type against multiple interface types can lead to significant performance problems in multi-core systems. The issue is described in detail in this blog post from the finders at RedHat, and I've also written up a basic description in the micronaut-test documentation.

In jackson-databind, this issue can manifest when deserializing a List<String> property. The property will be deserialized as ArrayList. StringCollectionDeserializer will then cast this ArrayList to Collection, causing a type check. Later on, the BeanPropertyWriter will set this field (or call the constructor, or call the setter method), causing a second type check against the declared type of the property, List. This back-and-forth can trigger the JDK bug. Replacing the Collection cast by an ArrayList cast when possible fixes this issue.

Attributing a production performance issue to type pollution is tricky, so the RedHat folks developed a Java agent that instruments all the bytecode of an application, tracks type checks, and reports any potential type pollution. In micronaut-test, I've adapted this approach to make it workable in unit tests. micronaut-test-type-pollution does not require use of other parts of the micronaut framework.

In this PR, on top of the actual fix, I've included a test based on micronaut-test-type-pollution. This test runs as a separate test execution because it does not play well with other agents (in particular jacoco). I use junit 5 test suites for this. This test protects us against regressions, since they can happen when modifying seemingly unrelated code, and also gives others the option to test more complex structures for similar issues.

Another option is to just not test this, which would remove the need for the build changes.

This patch fixes a type pollution issue when deserializing a `List<String>` property, by adding explicit type checks for common subclasses (ArrayList, HashSet).

The type pollution issue is a Hotspot performance problem prior to OpenJDK 23, where repeated type checks of the same concrete type against multiple interface types can lead to significant performance problems in multi-core systems. The issue is described in detail [in this blog post](https://redhatperf.github.io/post/type-check-scalability-issue/) from the finders at RedHat, and I've also written up a basic description [in the micronaut-test documentation](https://micronaut-projects.github.io/micronaut-test/latest/guide/#typePollution).

In jackson-databind, this issue can manifest when deserializing a `List<String>` property. The property will be deserialized as ArrayList. StringCollectionDeserializer will then cast this ArrayList to Collection, causing a type check. Later on, the BeanPropertyWriter will set this field (or call the constructor, or call the setter method), causing a second type check against the declared type of the property, List. This back-and-forth can trigger the JDK bug. Replacing the Collection cast by an ArrayList cast when possible fixes this issue.

Attributing a production performance issue to type pollution is tricky, so the RedHat folks developed [a Java agent](https://github.com/RedHatPerf/type-pollution-agent) that instruments all the bytecode of an application, tracks type checks, and reports any potential type pollution. In micronaut-test, I've adapted this approach to make it workable in unit tests. micronaut-test-type-pollution does not require use of other parts of the micronaut framework.

In this PR, on top of the actual fix, I've included a test based on micronaut-test-type-pollution. This test runs as a separate test execution because it does not play well with other agents (in particular jacoco). I use junit 5 test suites for this. This test protects us against regressions, since they can happen when modifying seemingly unrelated code, and also gives others the option to test more complex structures for similar issues.

Another option is to just not test this, which would remove the need for the build changes.
@yawkat
Copy link
Member Author

yawkat commented Dec 10, 2024

@franz1981 PTAL

@franz1981
Copy link

Many thanks @yawkat !! ❤️

@pjfanning
Copy link
Member

@yawkat I don't have access to a laptop this afternoon. Do you know if the JDK teams will backport their improvements?

@franz1981
Copy link

see @pjfanning https://mail.openjdk.org/pipermail/jdk-updates-dev/2024-October/038145.html (I've answered there)

@pjfanning
Copy link
Member

pjfanning commented Dec 10, 2024

This PR gives me mixed feelings. It's good to see the research and issues found. For me, this is 100% a Java runtime bug. I don't think it is good for libs like jackson-databind to workaround it. Let users upgrade their Java runtimes when the fix is ready. We are now going to get people complaining that they want Jackson releases while they refuse to upgrade their Java runtimes.
This change complicates the Jackson build, its code and its tests. This issue is so important that it has affected everyone for multiple years and they never noticed.
Merge this and we'll have people come up with cases where they need TreeSet special case added or LinkedList, Vector, Stack, etc. Maybe j.u.Optional deserializer is affected. The issue could affect any deserializer that handles W<T> where W is a wrapper type like a Collection or some other Monad-like type.

Not everyone can upgrade Jackson. Much of the Big Data world is stuck on Jackson 2.12 because Hadoop needs that version. I can try to root out the Maven stats but it's usually shocking to see how people stick to the old versions. Many of the companies most impacted by this have the financial resources to fork jackson-databind and add this change for themselves.

@yawkat
Copy link
Member Author

yawkat commented Dec 10, 2024

@pjfanning I think the build parts in this PR are arguable. I agree it makes the build more complicated for a single issue. We can leave out the tests if necessary.

But the performance issue itself is significant. I've not hit this jackson case in production (yet), but I've seen cases where there is a 20x (!) performance improvement from fixing type pollution issues. When it is a problem, it can really hit hard and be super difficult to track down.

It's this difficulty of debugging why we don't see many reports of this. It likely affects many large scale deployments, but very few notice and can figure out why.

@pjfanning
Copy link
Member

pjfanning commented Dec 10, 2024

@pjfanning I think the build parts in this PR are arguable. I agree it makes the build more complicated for a single issue. We can leave out the tests if necessary.

But the performance issue itself is significant. I've not hit this jackson case in production (yet), but I've seen cases where there is a 20x (!) performance improvement from fixing type pollution issues. When it is a problem, it can really hit hard and be super difficult to track down.

It's this difficulty of debugging why we don't see many reports of this. It likely affects many large scale deployments, but very few notice and can figure out why.

Thanks @yawkat. I understand that there can be a performance drag. It is still true that this is an old issue that noone noticed. The fix is already in Java 23 and will likely be backfit to Java 21 and maybe other Java versions (openjdk/jdk21u-dev#1090).

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 10, 2024

Quick note: I don't think this should go in 2.18 branch but 2.19. But my first instinct is same as @pjfanning's .... whoa, is all this really necessary?

But I do like testing aspects since it'd be hard to validate this problem.

I'll take some more time to digest this before commenting any more.

@franz1981
Copy link

https://youtu.be/PxcO3WHqmng?si=QwhuG9Rfwipa1K-q this is to add some more material to help digesting it :) and yes, I agree with both.
It shouldn't happen, and should be fixed, but still..is relevant enough based on the minimum baseline JDK version of users, that cannot be ignored.
In hibernate we actually had a 2X on a row improvement in a test we were thinking to be I/O bound, because of this...and in Netty, 3X, in http encoding paths...

@yawkat
Copy link
Member Author

yawkat commented Dec 11, 2024

@cowtowncoder imo the actual change to the production code is minor enough for 2.18. The build change, maybe not so much.

@pjfanning pjfanning mentioned this pull request Dec 12, 2024
@JooHyukKim
Copy link
Member

Agreeing with @pjfanning that this issue would open up doors to more wrapper types supporting same behavior.

But then if JDK folks don't plan on backporting to even earlier versions like version 11, then I guess it's up to users (like us?) to choose to find ways to optimize things. And according to @franz1981 's email discussion, not-backporting seems to be the direction headed.

@yawkat
Copy link
Member Author

yawkat commented Dec 12, 2024

It's actually not that common. In micronaut we found less than a dozen code sites that had to be modified so far. I don't think this patch would lead to many more similar changes throughout jackson.

@yawkat
Copy link
Member Author

yawkat commented Dec 12, 2024

imo it's not really different from other performance optimizations. We optimize for characteristics of the JVM, or even the CPU, all the time. We build code that is CPU-pipelining-friendly for current CPUs even if future CPUs might improve the pipelining logic and those optimizations might not be necessary anymore. This PR is a fairly minor change in comparison.

The fact that this is fixed in 23, and will probably be backported at least to 21, does not matter in my opinion. We support all the way until 8, and 8 will never receive a backport. The timeline for other LTS backports is uncertain at best.

The only major difference to other performance issues is that this one is much more difficult to discover due to its relation to concurrent memory access.

@pjfanning
Copy link
Member

It's actually not that common. In micronaut we found less than a dozen code sites that had to be modified so far. I don't think this patch would lead to many more similar changes throughout jackson.

If we were to make the change, I'd prefer just a small patch without the build changes.

Is there also a good reason not support some more List and Set types (LinkedList, TreeSet, etc)? I understand that more type workarounds means lower performance but do we have evidence that ArrayList and HashSet are so dominant that we can ignore the rest?

@yawkat
Copy link
Member Author

yawkat commented Dec 12, 2024

The reason for ArrayList and HashSet is that they are the default types for List and Set deserialization. If you give a concrete type in your model like LinkedList, then the issue does not materialize, because there is no interface type check when the field is set (since the field type is not an interface).

@cowtowncoder
Copy link
Member

Quick note: I will be out of town until next tuesday so no updates here -- but I consider this an important fix and need to give it full attention when I come back.

@cowtowncoder
Copy link
Member

Ok: I'd be ready to merge this, but I think that it'd be best to split this in 2 PRs:

  1. Actual fix (in StringCollectionDeserializer) for 2.18 branch
  2. Build changes in 2.19 (I think there is some value in testing)

this would make it easier to see if and how to merge (2) in 3.0/master.

Actually I can probably create separate subset PR first.

@JooHyukKim
Copy link
Member

JooHyukKim commented Dec 21, 2024

So this PR moves to 2.19 right? Makes sense.

@cowtowncoder
Copy link
Member

Ok: I have trimmed this, reorganized bits, and once I change to be based on 2.19 can merge.

@cowtowncoder cowtowncoder changed the base branch from 2.18 to 2.19 December 21, 2024 04:06
@cowtowncoder cowtowncoder merged commit 628587e into FasterXML:2.19 Dec 21, 2024
8 checks passed
@yawkat
Copy link
Member Author

yawkat commented Dec 30, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants