
Allow AccessibleByteArrayOutputStream to shrink #346

Open · wants to merge 2 commits into base: main

Conversation

@olivergillespie (Author)

Note: this is a draft PR to get feedback on the idea in general. I can add tests etc. if the approach is approved. Let me know if you'd prefer me to create an issue to discuss.

AccessibleByteArrayOutputStream currently can only grow or stay the same size. This could be a problem if it is used for a rare large item, but then used only for small items - the buffer would stay large forever, consuming memory. Since an application may hold many of these buffers via their use particularly in Cipher, the memory impact could be significant.

This simple shrinkage algorithm shrinks the buffer if it has been 'too large' N times in a row. N starts at 1, and grows exponentially up to a limit of 1024 as re-growths are triggered. N starts low to enable reacting quickly, but as more growths are observed it increases to avoid churn. It is capped to avoid degrading into 'never-shrink' behaviour over the long term.

Other ideas were considered, like keeping track of the last N used sizes and making decisions based on those values, but this algorithm has a tiny footprint (2 ints, could be shorts) and low computation overhead.

Here are some example progressions. Each column shows the state of the algorithm/buffer before that input is processed, and the action, if any, taken in response to it: g(128) means grow the buffer to 128, and s(64) means shrink it to 64.

// 1 large, many small
       Input size:   100 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |    1 |
      Buffer size:     0 |  128 |  128 |  128 |  128 |   64 |   64 |   64 |   64 |   32 |   32 |   32 |   32 |   16 |
 Times over-sized:     0 |    0 |    1 |    2 |    3 |    0 |    1 |    2 |    3 |    0 |    1 |    2 |    3 |    0 |
        Threshold:     1 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |    2 |
          Action: g(128) |    - |    - |    - |s(64) |    - |    - |    - |s(32) |    - |    - |    - |s(16) |    - |
          
// large/small oscillation
       Input size:     1 |    2 |    4 |    8 |    2 |    8 |    2 |    8 |    2 |
      Buffer size:     0 |    1 |    2 |    4 |    8 |    8 |    8 |    8 |    8 |
 Times over-sized:     0 |    0 |    0 |    0 |    0 |    1 |    0 |    1 |    0 |
        Threshold:     1 |    2 |    4 |    8 |   16 |   16 |   16 |   16 |   16 |
           Action:  g(1) | g(2) | g(4) | g(8) |    - |    - |    - |    - |    - |
 
// 6 small 1 large repeat
       Input size:     1 |    1 |    1 |    1 |    1 |    1 |    5 |    1 |    1 |    1 |    1 |    1 |    1 |    5 |    1 |    1 |    1 |    1 |    1 |    1 |    5 |    1 |    1 |    1 |    1 |    1 |    1 |    5 |
      Buffer size:     0 |    1 |    1 |    1 |    1 |    1 |    1 |    8 |    8 |    8 |    8 |    8 |    8 |    4 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |
 Times over-sized:     0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    1 |    2 |    3 |    4 |    5 |    0 |    0 |    1 |    2 |    3 |    4 |    5 |    6 |    0 |    1 |    2 |    3 |    4 |    5 |    6 |
        Threshold:     1 |    2 |    2 |    2 |    2 |    2 |    2 |    4 |    4 |    4 |    4 |    4 |    4 |    4 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |    8 |
           Action:  g(1) |    - |    - |    - |    - |    - | g(8) |    - |    - |    - |    - |    - | s(4) | g(8) |    - |    - |    - |    - |    - |    - |    - |    - |    - |    - |    - |    - |    - |    - |
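The progressions above can be reproduced with a small standalone sketch. This is my reconstruction of the heuristic, inferred from the description and the tables (threshold starts at 1 and is capped at 1024, per the discussion below); the class and method names are illustrative, not the PR's actual code.

```java
// Reconstruction of the proposed shrink heuristic, inferred from the tables
// above. Names are illustrative, not the PR's actual code.
final class ShrinkHeuristic {
    static final int MAX_OVERSIZED_THRESHOLD = 1024;

    int capacity = 0;       // current buffer capacity
    int timesOversized = 0; // consecutive uses that needed at most half the buffer
    int threshold = 1;      // oversized-streak length that triggers a shrink

    /** Record one use of the buffer for a payload of `size` bytes. */
    void use(int size) {
        if (size > capacity) {
            // Grow to the next power of two, and make future shrinks harder.
            int cap = Math.max(1, capacity);
            while (cap < size) {
                cap <<= 1;
            }
            capacity = cap;
            threshold = Math.min(MAX_OVERSIZED_THRESHOLD, threshold * 2);
            timesOversized = 0;
        } else if (capacity / 2 >= size) {
            // Half the buffer would have sufficed: the buffer is oversized.
            if (timesOversized > threshold) {
                capacity /= 2; // shrink
                timesOversized = 0;
            } else {
                timesOversized++;
            }
        } else {
            // Snug fit: reset the streak.
            timesOversized = 0;
        }
    }
}
```

Feeding it the first progression (one 100-byte use, then 1-byte uses) grows the buffer to 128 and then halves it every fourth small use, matching the g(128)/s(64)/s(32)/s(16) columns above.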

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@olivergillespie requested a review from a team as a code owner, November 3, 2023 15:32
@geedo0 (Contributor) left a comment

I like the idea of supporting shrinkage. I think we should iterate more on the algorithm. Was there impetus behind your decision to look into this? If there was profiling data for us to examine that would help immensely in proving out the assumptions around how this data structure gets used in production.

Comment on lines 11 to 12
private int timesOversized;
private int timesOversizedThreshold;
Contributor

What would be the initial value of the threshold? These need to be initialized somewhere.

Author

Here it would start at 0, which is fine by me, but also any small value would work (1, 2, 4).

Author

Actually 0 is a mistake here, since I'm doubling it to grow it. It must have been set to 1 in my testing and somehow I messed it up when committing. I'll fix it if we decide to use it.

@@ -122,5 +137,8 @@ private void growCapacity(final int newCapacity, final boolean doNotAllocateMore
final byte[] toZeroize = buf;
buf = tmp;
Arrays.fill(toZeroize, 0, count, (byte) 0);

// Every time we need to grow, make it harder to shrink in the future (up to a limit).
timesOversizedThreshold = Math.min(MAX_OVERSIZED_THRESHOLD, timesOversized * 2);
Contributor

I'm not following the logic here on how this threshold adapts to our workload. Seems like it only ever gets harder over time. That can make sense depending on the call pattern, but for something like ACCP we have to be extremely general purpose and assume nothing about those patterns.

Author

It's true that it only gets harder over time. So really the only difference between workloads is how fast it tends to the limit, and how far it gets. My expectation is that some workloads will never reach it, some will reach it slowly and some will reach it so quickly that they might as well have started there. I agree it needs a bit more exploration, I'll add some thoughts to your top-level comment.

Author

There's a bug here too - timesOversized should be timesOversizedThreshold. I had that fixed locally, I must have missed committing my last few changes, sorry.

@olivergillespie (Author)

Thanks @geedo0 for looking at it.

Was there impetus behind your decision to look into this?

A service I work with switched to re-using their Cipher instances as ThreadLocals. They have a few thousand threads which all occasionally touch crypto, so they now have a few thousand Cipher instances. They saw memory issues because every one of these held onto a 1MB buffer. I haven't confirmed it, but from our knowledge of the service this 1MB buffer is only needed for the largest customer requests, and the typical payload is much smaller. Then I saw the 'TODO: implement shrinkage' in this file and figured why not.

If there was profiling data for us to examine that would help immensely in proving out the assumptions around how this data structure gets used in production.

I can look into collecting the payload size distribution in the service I mentioned.

Continuing on from the other comment on the algorithm itself - a simpler algorithm ('shrink1') would be a fixed threshold. So if a buffer is too large 1000 times in a row, we shrink it. That should handle the worst case for the status quo which is a single huge payload growing the buffer forever. I don't mind using this if we find it easier to understand and/or the benefits of the more complex algorithm are dubious.
My algorithm here ('shrink2') is only a small tweak on that, which I think improves responsiveness in some cases and tends to the behaviour of shrink1 in others. It's possible that it would be further improved with some way to limit or reverse threshold growth.
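As a rough illustration, 'shrink1' could look like the sketch below. This is my own sketch under stated assumptions (a fixed threshold of 1000 and power-of-two sizing), not code from the PR.

```java
// Hypothetical sketch of 'shrink1': shrink only after a fixed number of
// consecutive oversized uses. Names and the power-of-two growth are assumptions.
final class FixedThresholdHeuristic {
    static final int SHRINK_THRESHOLD = 1000;

    int capacity = 0;
    private int timesOversized = 0;

    void use(int size) {
        if (size > capacity) {
            int cap = Math.max(1, capacity);
            while (cap < size) {
                cap <<= 1;
            }
            capacity = cap;
            timesOversized = 0;
        } else if (capacity / 2 >= size) {
            if (++timesOversized >= SHRINK_THRESHOLD) {
                capacity /= 2; // too large 1000 times in a row: shrink
                timesOversized = 0;
            }
        } else {
            timesOversized = 0;
        }
    }
}
```

The appeal of this variant is that its worst-case behaviour is trivial to state: a buffer inflated by a single huge payload is guaranteed to start shrinking after 1000 consecutive small uses.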

@amirhosv (Contributor) left a comment

Overall, I like the idea. Having said that, since the "shrinking" policy is use-case dependent, it would be nice to introduce a system property with which a user can control this behavior.
Also, the shrinking policy may not have to be as sophisticated as the one proposed in this PR: if a team has metrics around their payload sizes, the shrinking could be as simple as resetting the buffer to some fixed size.
There might be customers who are impacted once we change this approach, so having a way to tune and control the behavior would be great.

@amirhosv (Contributor) commented Nov 8, 2023


Laying down the necessary plumbing so that we can add different shrinking policies would be awesome.

@olivergillespie (Author)

I have updated to allow alternative shrink strategies. I have not implemented anything user-selectable yet, though it should be easy to add. While I understand the desire to allow configuration, I think that by far the most important choice is the default - I expect almost no users to ever give this enough thought to want to change it.
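For illustration, pluggable strategies might be shaped roughly like this; a hypothetical sketch only, since the PR's actual interface and names may differ.

```java
// Hypothetical shape for pluggable shrink strategies; the PR's actual
// interface and names may differ.
interface ShrinkPolicy {
    /** Capacity the buffer should keep after a use that needed bytesUsed bytes. */
    int nextCapacity(int currentCapacity, int bytesUsed);

    /** Status-quo strategy: never give memory back. */
    ShrinkPolicy NEVER_SHRINK = (currentCapacity, bytesUsed) -> currentCapacity;
}
```

A default constant like `NEVER_SHRINK` would preserve today's behaviour for anyone who opts out of shrinking.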

@geedo0 (Contributor) commented Nov 14, 2023

I like it so far and I'm warm to merging this in as long as we can all agree that it's a net positive with minimal risk of regression. I do have two hesitations though:

  1. The initial values of these parameters are undoubtedly not ideal and we have no means to dial them to the correct values once this goes in. We have no immediate metrics, benchmarks, or profiling data to measure improvement other than your use case.
  2. Regression risk for usage patterns that antagonize this shrink strategy and trigger some thrashing behavior that results in increased GC pressure from buffer shrinks.

That said, I think these two are small risks but a lack of imagination has been the root of many nasty bugs.

@olivergillespie (Author)

Thanks @geedo0. With a fixed threshold, naturally there is no 'ideal' value, but it's true that we could use more data.

It's worth noting that as N increases, we get closer and closer to the existing 'never shrink' strategy. This is partly why I did not include any possibility to reduce the threshold in the adaptive strategy - it's simple to reason about a threshold which never decreases and we can be sure of the worst case. A way to decrease the threshold can make the strategy more responsive, but maybe harder to analyze and may have a worse worst case. I suppose though with a way to reduce the threshold, we could also then allow a higher maximum threshold, since we know that it will adapt as necessary. The fixed value is a compromise between responsiveness and worst-case overhead. I could try a two-way adaptive policy and see how scrutable it is.

Risk of thrashing - yes, this is the key concern. The only extra cost users pay compared to before (beyond the hopefully negligible bookkeeping overhead) is when we shrink and then have to re-grow. The maximum rate at which this can happen, regardless of workload, is once every N uses, where N is currently 1024.
The worst case is a user who encrypts 1024 small payloads, followed by one large one, on repeat: they will see a shrink and a re-grow for every large payload. Intuitively this feels like a very reasonable amortised cost, but of course it does add some GC pressure.
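The amortised-cost argument can be checked with a small simulation. This reuses my reconstruction of the heuristic from the PR description; the cycle shape (1100 one-byte writes, then one 100-byte write) is an arbitrary illustration chosen to exceed the 1024 cap.

```java
// Simulation of the worst-case pattern: many small payloads, then one large,
// on repeat. The heuristic below is my reconstruction, not the PR's code.
final class ThrashSim {
    static final int MAX_OVERSIZED_THRESHOLD = 1024;

    int capacity = 0, timesOversized = 0, threshold = 1;
    int grows = 0, shrinks = 0;

    void use(int size) {
        if (size > capacity) {
            int cap = Math.max(1, capacity);
            while (cap < size) {
                cap <<= 1;
            }
            capacity = cap;
            threshold = Math.min(MAX_OVERSIZED_THRESHOLD, threshold * 2);
            timesOversized = 0;
            grows++;
        } else if (capacity / 2 >= size) {
            if (timesOversized > threshold) {
                capacity /= 2;
                timesOversized = 0;
                shrinks++;
            } else {
                timesOversized++;
            }
        } else {
            timesOversized = 0;
        }
    }

    /** One cycle: 1100 small writes, then one large write. */
    static void cycle(ThrashSim sim) {
        for (int i = 0; i < 1100; i++) {
            sim.use(1);
        }
        sim.use(100);
    }

    public static void main(String[] args) {
        ThrashSim sim = new ThrashSim();
        for (int c = 0; c < 15; c++) {
            cycle(sim); // warm up until the threshold is pinned at its cap
        }
        int g = sim.grows, s = sim.shrinks;
        for (int c = 0; c < 5; c++) {
            cycle(sim);
        }
        // Steady state settles into exactly one shrink and one re-grow per cycle.
        System.out.println("grows=" + (sim.grows - g) + " shrinks=" + (sim.shrinks - s));
    }
}
```

Once the threshold is pinned at 1024, each cycle pays one shrink (128 to 64 after the long run of small writes) and one re-grow (back to 128 for the large write), supporting the one-shrink-and-one-grow-per-large-payload claim.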

@geedo0 (Contributor) commented Nov 15, 2023

It's times like this I wish we could add instrumentation to ACCP, but that's a heated topic and not on the table for the moment. I'll try to garner more input from the team as I don't want to act unilaterally here. In the meantime, if you want to get this closer to merge ready you'll need to run ./gradlew spotlessApply to get the CI passing.
