
sync bugfix in JDK 8281631, about HashMap allocation capacity. #5965

Closed
wants to merge 2 commits into from

Conversation

XenoAmess
Contributor

@google-cla

google-cla bot commented Mar 16, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@eamonnmcmanus
Member

Thanks for taking the trouble to look into this!

I'm not sure we want to make this change, or at least not unconditionally. In the short to mid term, people will be running this code on JDK versions that don't have your change. So then Maps.newHashMapWithExpectedSize will be choosing an initialCapacity that will actually cause a resize if the expected size is one of the problematic values like 12, right? Whereas if we don't make this change and someone has a JDK version that does have your change, we'll sometimes choose an initialCapacity that leads to an unnecessarily-large HashMap. Is it worse to have a resize that we could have avoided or to have a too-large size? Bearing in mind that the size is only slightly too large, in that if the number of entries had been 1 more then we would have chosen that larger size anyway.

Perhaps we could check System.getProperty("java.specification.version") and do the new thing if it is ≥19?
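For concreteness, here is a sketch of the two computations being compared. This is not Guava's or the JDK's actual source; the method names are mine, and the default 0.75 load factor is assumed.

```java
public class CapacityDemo {
    // Pre-JDK-8281631 style, as in older HashMap.putAll:
    // s / loadFactor + 1.0f, truncated to int.
    public static int oldCapacity(int expectedSize) {
        return (int) ((float) expectedSize / 0.75f + 1.0f);
    }

    // Post-fix style: ceil(s / loadFactor), without the extra +1 padding.
    public static int newCapacity(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    public static void main(String[] args) {
        // Expected size 12 is one of the problematic values:
        System.out.println(oldCapacity(12)); // 17 -> HashMap rounds this up to table size 32
        System.out.println(newCapacity(12)); // 16 -> table size stays 16
    }
}
```

The two formulas agree for most inputs; they diverge exactly when `expectedSize / 0.75` is a whole number, which is when the old `+ 1.0f` pushes the capacity just past a power of two.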

@XenoAmess
Contributor Author

XenoAmess commented Mar 19, 2022

@eamonnmcmanus

I wrote a test for you to make it clearer.

You can run the test I provided against the original version of Maps, without my changes, on JDK 8, to see how it fails and why I want to fix it here.

> So then Maps.newHashMapWithExpectedSize will be choosing an initialCapacity that will actually cause a resize if the expected size is one of the problematic values like 12, right?

Actually, no.

Look at the original code here: it uses the same calculation as HashMap.putAll, which is buggy.

But resize in OpenJDK is fine; 8281631 only fixed the calculation in HashMap.putAll.

So I think it should be fixed here too, since the earlier incorrect calculation comes from HashMap.putAll.

And because the resize function itself is correct, this will not trigger a resize.

@eamonnmcmanus
Member

eamonnmcmanus commented Mar 20, 2022

Thanks for the detailed explanation and the test!

Here's my understanding now. Let's look at one of the problematic sizes, like 12. Currently, Maps.newHashMapWithExpectedSize(12) will allocate a HashMap with new HashMap<>(17) and that will set threshold = 32 whether or not your JDK fix is present. The table array is not allocated immediately but only at the first write operation. Then, the resize() method will allocate an array of size 32.

With the change here, Maps.newHashMapWithExpectedSize(12) will do new HashMap<>(16) which will set threshold = 16. Then, if the first write operation is for example put, the resize() method will keep the size of 16 for the table array. If there are 11 other put operations (for a total of the expected 12) then the array will keep this size. If, instead, the first write operation is putAll(m) with all 12 expected entries in m, then a JDK without your fix will recompute threshold = 32 and that is the size that will be allocated. A JDK with your fix will preserve the size of 16. So, at worst, with this PR we will still end up with the array of size 32 that we would have had anyway without the PR, and usually we will have the better array size 16. I say "usually" because if you're going to be doing putAll(m) where m has all the values you want, then you might as well have done new HashMap<>(m).

The important thing I missed was the delayed allocation of table, which means that we will never see a smaller table being allocated and then reallocated as a larger one.
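The power-of-two rounding described above can be sketched as follows. This replicates the logic of HashMap's private tableSizeFor helper (the MAXIMUM_CAPACITY clamping is omitted for brevity), showing why an initialCapacity of 17 yields a threshold of 32 while 16 stays at 16.

```java
public class TableSizeDemo {
    // Rounds cap up to the next power of two, as HashMap does when
    // turning an initialCapacity into a threshold (clamping omitted).
    public static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(17)); // 32: new HashMap<>(17) sets threshold = 32
        System.out.println(tableSizeFor(16)); // 16: new HashMap<>(16) sets threshold = 16
    }
}
```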

Because of the way syncing with our internal source-code management system works, we won't be merging this PR, but we'll make equivalent changes, duly credited. Thanks again!

@eamonnmcmanus eamonnmcmanus self-assigned this Mar 20, 2022
copybara-service bot pushed a commit that referenced this pull request Mar 21, 2022
[This JDK change](openjdk/jdk@3e39304) fixed the computation used when the first write to a `HashMap` is `putAll(otherMap)`. Previously that would occasionally lead to an unnecessarily large internal table. That same computation was copied into `Maps.newHashMapWithExpectedSize`, so it should also be adjusted.

Thanks to @XenoAmess for both the JDK fix and the fix here.

Closes #5965.

RELNOTES=`Maps.newHashMapWithExpectedSize` sometimes allocated maps that were larger than they needed to be.
PiperOrigin-RevId: 436318398
copybara-service bot pushed a commit that referenced this pull request Mar 22, 2022