Allow AccessibleByteArrayOutputStream to shrink #346
base: main
Conversation
AccessibleByteArrayOutputStream currently can only grow or stay the same size. This could be a problem if it is used for a rare large item, but then used only for small items - the buffer would stay large forever, consuming memory. Since an application may hold many of these buffers via their use particularly in Cipher, the memory impact could be significant.

This simple shrinkage algorithm shrinks the buffer if it has been 'too large' N times in a row. N starts at 1, and grows exponentially up to a limit of 1024 as re-growths are triggered. N starts low to enable reacting quickly, but as more growths are observed it increases to avoid churn. It is capped to avoid degrading into 'never-shrink' behaviour over the long term.

Other ideas were considered, like keeping track of the last N used sizes and making decisions based on those values, but this algorithm has a tiny footprint (2 ints, could be shorts) and low computation overhead.
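The consecutive-oversize heuristic described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the class name, the "too large" test (needing less than half the capacity), and the reset-on-normal-use behaviour are all assumptions layered on the description.

```java
// Hypothetical sketch of the adaptive shrink heuristic; names and the
// "too large" definition are assumptions, not the PR's implementation.
final class AdaptiveShrinkSketch {
    private static final int MAX_OVERSIZED_THRESHOLD = 1024;

    private int timesOversized = 0;
    private int timesOversizedThreshold = 1; // starts low to react quickly

    // Called after each use with the size actually needed.
    // Returns true if the caller should shrink the buffer.
    boolean recordUse(final int neededSize, final int currentCapacity) {
        if (neededSize > currentCapacity) {
            // Growth: make future shrinking harder, up to a cap.
            timesOversizedThreshold =
                Math.min(MAX_OVERSIZED_THRESHOLD, timesOversizedThreshold * 2);
            timesOversized = 0;
            return false;
        }
        if (neededSize < currentCapacity / 2) { // "too large" heuristic (assumed)
            timesOversized++;
            if (timesOversized >= timesOversizedThreshold) {
                timesOversized = 0;
                return true; // shrink
            }
        } else {
            timesOversized = 0; // a right-sized use breaks the streak
        }
        return false;
    }

    int threshold() {
        return timesOversizedThreshold;
    }
}
```

A workload that grows the buffer once and then issues small requests would trigger a shrink after the threshold is met, while repeated growths push the threshold toward 1024 and make shrinking progressively rarer.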
I like the idea of supporting shrinkage. I think we should iterate more on the algorithm. Was there impetus behind your decision to look into this? If there was profiling data for us to examine that would help immensely in proving out the assumptions around how this data structure gets used in production.
src/com/amazon/corretto/crypto/provider/AccessibleByteArrayOutputStream.java
private int timesOversized;
private int timesOversizedThreshold;
What would be the initial value of the threshold? These need to be initialized somewhere.
Here it would start at 0, which is fine by me, but also any small value would work (1, 2, 4).
Actually 0 is a mistake here, since I'm doubling it to grow it. It must have been set to 1 in my testing and somehow I messed it up when committing. I'll fix it if we decide to use it.
@@ -122,5 +137,8 @@ private void growCapacity(final int newCapacity, final boolean doNotAllocateMore
    final byte[] toZeroize = buf;
    buf = tmp;
    Arrays.fill(toZeroize, 0, count, (byte) 0);

    // Every time we need to grow, make it harder to shrink in the future (up to a limit).
    timesOversizedThreshold = Math.min(MAX_OVERSIZED_THRESHOLD, timesOversized * 2);
I'm not following the logic here on how this threshold adapts to our workload. Seems like it only ever gets harder over time. That can make sense depending on the call pattern, but for something like ACCP we have to be extremely general purpose and assume nothing about those patterns.
It's true that it only gets harder over time. So really the only difference between workloads is how fast it tends to the limit, and how far it gets. My expectation is that some workloads will never reach it, some will reach it slowly and some will reach it so quickly that they might as well have started there. I agree it needs a bit more exploration, I'll add some thoughts to your top-level comment.
There's a bug here too - timesOversized should be timesOversizedThreshold. I had that fixed locally, I must have missed committing my last few changes, sorry.
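For clarity, the corrected growth-time update (doubling the threshold itself rather than the counter) might look like this. This is a hypothetical sketch, not the committed fix; the `Math.max(1, ...)` guard against the 0-initialization mentioned earlier is an assumption about how both bugs would be resolved together.

```java
// Hypothetical sketch of the corrected update; the max(1, ...) guard
// against a 0 starting value is an assumption, not the committed code.
final class ThresholdUpdate {
    static final int MAX_OVERSIZED_THRESHOLD = 1024;

    // Called on every buffer growth: double the shrink threshold, capped,
    // so that repeated growths make future shrinking progressively harder.
    static int onGrow(final int timesOversizedThreshold) {
        return Math.min(MAX_OVERSIZED_THRESHOLD,
                        Math.max(1, timesOversizedThreshold) * 2);
    }
}
```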
Thanks @geedo0 for looking at it.
A service I work with switched to re-using their Cipher instances as ThreadLocals. They have a few thousand threads which all occasionally touch crypto, so they now have a few thousand Cipher instances. They saw memory issues because every one of these held onto a 1MB buffer. I haven't confirmed it, but from our knowledge of the service this 1MB buffer is only needed for the largest customer requests, and the typical payload is much smaller. Then I saw the 'TODO: implement shrinkage' in this file and figured why not.
I can look into collecting the payload size distribution in the service I mentioned. Continuing on from the other comment on the algorithm itself - a simpler algorithm ('shrink1') would be a fixed threshold. So if a buffer is too large 1000 times in a row, we shrink it. That should handle the worst case for the status quo, which is a single huge payload growing the buffer forever. I don't mind using this if we find it easier to understand and/or the benefits of the more complex algorithm are dubious.
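The 'shrink1' fixed-threshold variant is simple enough to sketch in a few lines. The class name and the 1000 value are illustrative, matching the example in the comment rather than any committed code:

```java
// Illustrative sketch of the fixed-threshold 'shrink1' idea: shrink once the
// buffer has been oversized SHRINK_THRESHOLD consecutive uses. The threshold
// value of 1000 is taken from the comment above, not from committed code.
final class FixedThresholdShrinkSketch {
    static final int SHRINK_THRESHOLD = 1000;

    private int consecutiveOversized = 0;

    // Called after each use; returns true when the buffer should shrink.
    boolean recordUse(final boolean wasOversized) {
        if (!wasOversized) {
            consecutiveOversized = 0; // any right-sized use breaks the streak
            return false;
        }
        if (++consecutiveOversized >= SHRINK_THRESHOLD) {
            consecutiveOversized = 0;
            return true;
        }
        return false;
    }
}
```

Compared with the adaptive version, this has no moving parts to reason about: the worst case is one shrink per 1000 uses, at the cost of reacting 1000 uses late to a genuinely shrunk workload.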
Overall, I like the idea. Having said that, since the "shrinking" policy is use-case dependent, it would be nice to introduce a system property with which a user can control this behavior.
Also, the shrinking policy may not have to be as sophisticated as the one proposed in this PR: if a team has metrics around their payload sizes, then shrinking could be as simple as setting the buffer to some fixed size.
Having a way to tune and control this behavior would be great, since some customers might be impacted once we change this approach.
Laying down the necessary plumbing so that we can add different shrinking policies would be awesome.
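A system-property knob could look roughly like this. The property name and default value here are invented for illustration and are not part of ACCP:

```java
// Illustrative only: the property name and the 16384-byte default are
// assumptions for this sketch, not an ACCP configuration surface.
final class ShrinkConfig {
    private static final int DEFAULT_MAX_RETAINED_BYTES = 16384;

    // Reads the (hypothetical) override, falling back to the default when
    // the property is absent or malformed.
    static int maxRetainedCapacity() {
        final String prop =
            System.getProperty("com.example.accp.maxRetainedBufferBytes");
        if (prop == null) {
            return DEFAULT_MAX_RETAINED_BYTES;
        }
        try {
            return Integer.parseInt(prop);
        } catch (final NumberFormatException e) {
            return DEFAULT_MAX_RETAINED_BYTES;
        }
    }
}
```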
I have updated to allow alternative shrink strategies. I have not implemented anything user-selectable yet, though it should be easy to add. While I understand the desire to allow configuration, I think that by far the most important choice is the default - I expect almost no users to ever give this enough thought to want to change it.
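One possible shape for the pluggable-strategy seam is sketched below. The interface name and method signature are hypothetical and not necessarily what the updated PR uses:

```java
// Hypothetical strategy seam; the real PR's interface may differ.
interface ShrinkStrategy {
    // Called after each use. Returns the capacity the buffer should keep;
    // returning currentCapacity means "do not shrink".
    int nextCapacity(int currentCapacity, int usedSize);
}

// The pre-PR status quo expressed as a strategy: never shrink.
final class NeverShrink implements ShrinkStrategy {
    @Override
    public int nextCapacity(final int currentCapacity, final int usedSize) {
        return currentCapacity;
    }
}
```

Expressing the existing behavior as the trivial strategy keeps the default risk-free while leaving room for the adaptive or fixed-threshold policies behind the same interface.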
I like it so far and I'm warm to merging this in as long as we can all agree that it's a net positive with minimal risk of regression. I do have two hesitations though:
1. There's no obviously 'ideal' threshold value, and we don't yet have data to validate the one chosen.
2. There's a risk of thrashing, where a workload repeatedly shrinks and re-grows the buffer, paying allocation and zeroization costs it didn't pay before.
That said, I think these two are small risks, but a lack of imagination has been the root of many nasty bugs.
Thanks @geedo0. With a fixed threshold, naturally there is no 'ideal' value, but it's true that we could use more data. It's worth noting that as N increases, we get closer and closer to the existing 'never shrink' strategy. This is partly why I did not include any possibility to reduce the threshold in the adaptive strategy - it's simple to reason about a threshold which never decreases, and we can be sure of the worst case. A way to decrease the threshold could make the strategy more responsive, but also harder to analyze, and it may have a worse worst case. I suppose, though, that with a way to reduce the threshold we could also allow a higher maximum threshold, since we know it would adapt as necessary. The fixed value is a compromise between responsiveness and worst-case overhead. I could try a two-way adaptive policy and see how scrutable it is.

Risk of thrashing - yes, this is the key concern. The only extra cost users pay compared to before (aside from the hopefully negligible logic overhead) is when we shrink and re-grow. The maximum rate this can happen, regardless of workload, is once every N uses, where N is 1024 right now.
It's times like this I wish we could add instrumentation to ACCP, but that's a heated topic and not on the table for the moment. I'll try to garner more input from the team as I don't want to act unilaterally here. In the meantime, if you want to get this closer to merge ready you'll need to run
Note: this is a draft PR to get feedback on the idea in general. I can add tests etc. if the approach is approved. Let me know if you'd prefer me to create an issue to discuss.
Here are some example progressions; each column shows the state of the algorithm/buffer at that step, and the action, if any, taken after that input (e.g. g(128) means grow the buffer to 128; s means shrink).
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.