Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Reduced garbage on InsightsReport's serialization #110

Closed

Conversation

franz1981
Copy link
Contributor

@franz1981 franz1981 commented Jun 15, 2023

Got a couple of questions before proceeding (adding tests and/or other changes).

It appears that in the multi client case each client serialize the same report many times, but they have the option (for each one) to decorate it:
There is a way to safely assume it won't be changed (by decoration?) and just serialize it once?

I refer to this

code path.
It would be nice to pass through the previously serialized report, during the iteration, if we an assume it to not be changed.

@franz1981
Copy link
Contributor Author

@ehsavoie too

@franz1981 franz1981 marked this pull request as ready for review June 28, 2023 16:29
@franz1981
Copy link
Contributor Author

franz1981 commented Jun 28, 2023

Another thing that could be fixed in the original code is to address the double copy (from heap ByteBuffer to direct one) hidden in the file writes, hitting https://github.com/openjdk/jdk/blob/9f98136c3a00ca24d59ffefd58308603b58110c7/src/java.base/share/classes/sun/nio/ch/IOUtil.java#L94

which cause unwanted and unpleasant creation of pooled (ie a proper leak) direct Buffer which would impact on EAP off-heap/direct memory footprint, see https://github.com/openjdk/jdk/blob/33011ea19bb29e88ce18a138a8fa8b34f8c97407/src/java.base/share/classes/sun/nio/ch/Util.java#L77 which show that by default, such temporary direct buffers are (thread-local) cached without any limits, per buffer.

In this regard, RandomAccessFile which can perform stack allocation or a temporary off-heap allocation (via new/free, really), seems a much better choice, to save any impact on long running applications like EAP: see https://github.com/openjdk/jdk/blob/33011ea19bb29e88ce18a138a8fa8b34f8c97407/src/java.base/share/native/libjava/io_util.c#L167
TLDR for reports which are > 8192, RandomAccessFile woudl create a new buffer (via malloc/free) to hold the array content and perform the actual file write (freeing up it immediately).
For reports smaller than that, instead, it would create a stack allocated buffer, without interacting with the native allocator.
@kittylyst wdyt?

@franz1981 franz1981 force-pushed the less_garbage_serialize branch 2 times, most recently from 65d8435 to f1d3230 Compare June 30, 2023 14:14
@kittylyst kittylyst self-requested a review June 30, 2023 14:29
@franz1981
Copy link
Contributor Author

franz1981 commented Jun 30, 2023

@kittylyst I've send an additional (optional) commit to address the comment at #110 (comment)

InsightsFileWritingClientTest::testWriteToTmp has become invalid, given that
it doesn't use serialize to produce an intermediate byte[].
What's the best way to test with mocks this ?

@kittylyst
Copy link
Collaborator

Hi folks. Thanks for the detailed work on this PR.

I am quite lost in all the comments and so forth, and as a result I'm not clear on the design intent here. Can we put a call in next week to discuss @franz1981 ? Also - can you link to the test cases that indicate the issue that we're trying to solve by this change?

@franz1981
Copy link
Contributor Author

franz1981 commented Jun 30, 2023

I am quite lost in all the comments and so forth, and as a result I'm not clear on the design intent here.

Yep, let me summarize the main points here:

  1. the current usage of Jackson is prone to creating multiple copies of the same serialized data, which could be improved by using the right API (which make uses of an internal byte[] own by Jackson itself): as a positive consequence of this, I have removed the need to encode again a String into its UTF-8 byte[] encoded form, saving others additional copy
  2. using Files::write hides (for OpenJDK, at least) a subtle off-heap direct leak due to using FileChannel::write which stores a thread local direct buffer to speed up subsequent writes performed by the same Thread. This doesn't work well with long running applications like Wildfly/EAP (and depends by the frequency by which reports are produced vs up time and user limits for direct memory allocations)
  3. The previous 2 changes has allowed me to use the Jackson API to directly stream the JSON over the RandomAccessFile provided, saving another additional copy and (String + byte[]) allocations.

We can go through these with some profiling data during the call plus the other topics that @ehsavoie want to talk about, thanks!

@franz1981 franz1981 force-pushed the less_garbage_serialize branch from 3b1fe4b to 430ead9 Compare July 3, 2023 12:44
@franz1981 franz1981 changed the title Reduced garbage on InsightsReport's serialization fix: Reduced garbage on InsightsReport's serialization Jul 3, 2023
@jponge
Copy link
Collaborator

jponge commented Jul 3, 2023

Nevermind the failure on the latest conventional commit checks, it's because the original PR didn't have a matching title (happened to me before)

@jponge
Copy link
Collaborator

jponge commented Sep 11, 2023

Superseded by #117

@jponge jponge closed this Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants