Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store store barrier at the end of initializer of an object without final fields #1504

Closed
armLeiJin opened this issue Jul 17, 2019 · 13 comments
Closed
Assignees

Comments

@armLeiJin
Copy link

armLeiJin commented Jul 17, 2019

We find when graal news an object without final field, it produces one store store barrier at the end of the object initializer.

According to the Java language spec, thread which only can observe a object field after the object construction finishes, also has chance to observe a default value of it if the field is no final. And the barrier inserted at the end of object initializer is not required. We compare the assembely codes on AArch64 produced by graal and c2 seperately and find c2 doesn't have such a barrier.

Java source code

public class Node {
private double item;

 Node(double item) {
     this.item = Math.sqrt(item);
 }

}

private static Node createSqrt(double data) {
double input = data * (data - 3);
return new Node(input);
}

C2 assembly on AArch64
0x0000ffff851cb674: mov x10, #0x7c78 // #31864
0x0000ffff851cb678: movk x10, #0x9, lsl #16
0x0000ffff851cb67c: movk x10, #0x8, lsl #32
0x0000ffff851cb680: str x11, [x28,#320]
0x0000ffff851cb684: ldr x10, [x10,#184]
0x0000ffff851cb688: str x10, [x0]
0x0000ffff851cb68c: mov x10, #0x90000 // #589824
0x0000ffff851cb690: movk x10, #0x7c78
0x0000ffff851cb694: stp w10, wzr, [x0,#8]
0x0000ffff851cb698: prfm pstl1keep, [x11,#192]
0x0000ffff851cb69c: str xzr, [x0,#16]
0x0000ffff851cb6a0: dmb ishst
0x0000ffff851cb6a4: fsqrt d16, d16
0x0000ffff851cb6a8: str d16, [x0,#16]
0x0000ffff851cb6ac: ldp x29, x30, [sp,#32]
0x0000ffff851cb6b0: add sp, sp, #0x30
0x0000ffff851cb6b4: ldr x8, [x28,#296]
0x0000ffff851cb6b8: ldr wzr, [x8]
0x0000ffff851cb6bc: ret

Graal assembly on AArch64
0x0000ffff741c6c74: str x2, [x28,#320]
0x0000ffff741c6c78: prfm pstl1keep, [x0,#216]
0x0000ffff741c6c7c: ldr x1, [x3,#184]
0x0000ffff741c6c80: str x1, [x0]
0x0000ffff741c6c84: mov x1, #0xd0000 // #851968
0x0000ffff741c6c88: movk x1, #0xbad0
0x0000ffff741c6c8c: str w1, [x0,#8]
0x0000ffff741c6c90: str wzr, [x0,#12]
0x0000ffff741c6c94: str xzr, [x0,#16]
0x0000ffff741c6c98: dmb ishst
0x0000ffff741c6c9c: fmov d1, #3.000000000000000000e+00
0x0000ffff741c6ca0: fsub d1, d0, d1
0x0000ffff741c6ca4: fmul d0, d1, d0
0x0000ffff741c6ca8: fsqrt d0, d0
0x0000ffff741c6cac: str d0, [x0,#16]
0x0000ffff741c6cb0: dmb ishst
0x0000ffff741c6cb4: ldp x29, x30, [sp,#32]
0x0000ffff741c6cb8: add sp, sp, #0x30
0x0000ffff741c6cbc: ldr x8, [x28,#296]
0x0000ffff741c6cc0: ldr wzr, [x8]
0x0000ffff741c6cc4: ret

We also check the implementation in Graal and find the such an additional barrier is inserted when lower the commitAllocationNode.

/**
 * Insert the required {@link MemoryBarriers#STORE_STORE} barrier for an allocation and also
 * include the {@link MemoryBarriers#LOAD_STORE} required for final fields if any final fields
 * are being written, as if {@link FinalFieldBarrierNode} were emitted.
 */
private static void insertAllocationBarrier(CommitAllocationNode commit, StructuredGraph graph) {
    int barrier = MemoryBarriers.STORE_STORE;
    outer: for (VirtualObjectNode vobj : commit.getVirtualObjects()) {
        for (ResolvedJavaField field : vobj.type().getInstanceFields(true)) {
            if (field.isFinal()) {
                barrier = barrier | MemoryBarriers.LOAD_STORE;
                break outer;
            }
        }
    }
    graph.addAfterFixed(commit, graph.add(new MembarNode(barrier, LocationIdentity.init())));
}

According to the codes above, a store store barrier is inserted as a default logic and the barrier will be upgraded into a release one if the object has a final field. We would like to know why a store store barrier is demanded here if no final field? Can we remove it and keep release barrier logic only? Thanks!

@plokhotnyuk
Copy link

Why do we need release fences in constructors of instances of data structures which are not concurrent by definition?

Why all these fences cannot be moved to places which work with concurrency immediately: concurrent GC phases, static field initializers, concurrent data structures and APIs like futures, actors, etc.?

@adinn
Copy link
Collaborator

adinn commented Jul 17, 2019

HI Andriy,

Why do we need release fences in constructors of instances of data structures which are not concurrent by definition?

Why all these fences cannot be moved to places which work with concurrency immediately: concurrent GC phases, static field initializers, concurrent data structures and APIs like futures, actors, etc.?

This comment and the link to the Scala bug seem to have nothing to do with the problem being reported. Can you explain why you think these two problems are related?

@plokhotnyuk
Copy link

plokhotnyuk commented Jul 17, 2019

@adinn my questions are related to the implementation and usage of the insertAllocationBarrier method in GraalVM. And, I have already a short answer on all of them from @shipilev in the mechanical-sympathy google group. Also, he shared a link on the related JDK issue

Scala collections started to use a lot of releaseFence calls from 2.13.0 version too. For some isolated cases when only constructors of linked list nodes are called the combination GraalVM CE/EE 19.1 + Scala 2.13.0 shows great slowdown which is illustrated by results and reports of different benchmarks in the related Scala issue.

@adinn
Copy link
Collaborator

adinn commented Jul 17, 2019

Hi Andriy,

Ok, I am just concerned to distinguish the different issues involved here and, in consequence, decouple the required solutions. The problem reported here affects compilation of Java and Scala code (and perhaps other languages hosted via Truffle). The one you linked to appears to be a Scala-only problem which requires a Scala-only fix. So, although this current problem may exacerbate the performance hit that the Scala implies it doesn't look to me like there is any other important connection between them that will influence any proposed fix. Is that correct?

I think the OpenJDK issue you linked to is also referring to a different problem to this one. This is about the redundancy of the extra StoreStore barriers written for each non-final field when there is already a StoreStore following the header section write. The OpenJDK issue concerns the redundancy of the header section StoreStore when a trailing LoadStore+StoreStore is inserted because of the presence of a final field. So, the two cases also appear to be distinct.

What might be relevant is another OpenJDK issue linked to the one you mentioned which notes how multiple StoreStore barriers accumulate when there are multiple chained constructors. Doug Lea's comments about the legitimacy of omitting barriers might be relevant ot whatever fix is proposed for this issue.

@armLeiJin
Copy link
Author

armLeiJin commented Jul 17, 2019

Hi Andriy,

Ok, I am just concerned to distinguish the different issues involved here and, in consequence, decouple the required solutions. The problem reported here affects compilation of Java and Scala code (and perhaps other languages hosted via Truffle). The one you linked to appears to be a Scala-only problem which requires a Scala-only fix. So, although this current problem may exacerbate the performance hit that the Scala implies it doesn't look to me like there is any other important connection between them that will influence any proposed fix. Is that correct?

I think the OpenJDK issue you linked to is also referring to a different problem to this one. This is about the redundancy of the extra StoreStore barriers written for each non-final field when there is already a StoreStore following the header section write. The OpenJDK issue concerns the redundancy of the header section StoreStore when a trailing LoadStore+StoreStore is inserted because of the presence of a final field. So, the two cases also appear to be distinct.

What might be relevant is another OpenJDK issue linked to the one you mentioned which notes how multiple StoreStore barriers accumulate when there are multiple chained constructors. Doug Lea's comments about the legitimacy of omitting barriers might be relevant ot whatever fix is proposed for this issue.

Thanks for providing more related information for the issue.
Actually, we find current C2 already removes the store store barrier guarding a object (with final field) header writing at the so-called "thread local" case. Am I right? Graal hasn't perform such an optimization yet.
https://bugs.openjdk.java.net/browse/JDK-8032481 still opens. Does it means OpenJDK will continue to optimize the membar logic here?

@adinn
Copy link
Collaborator

adinn commented Jul 17, 2019

Actually, we find current C2 already removes the store store barrier guarding a object (with final field) header writing at the so-called "thread local" case. Am I right? Graal hasn't performance such an optimization yet.
https://bugs.openjdk.java.net/browse/JDK-8032481 still opens. Does it means OpenJDK will continue to optimize the membar logic here?

Yes, C2 does already remove some of the redundant barriers. The graph generation for an initializer method includes a couple of potential optimizations. The two issues mentioned above ares still open so they may be addressed in future but I don't know of any plans to do that soon.

@tkrodriguez
Copy link
Member

This doesn't have anything to do with final field barriers except that it's a bug introduced by the changes to add the proper barriers for final fields. The problem is that we currently emit a STORE_STORE barrier both in NewObjectSnippets, which is used for the lowering of AbstractNewObjectNodes, and in the lowering of CommitAllocationNodes, which represents a group of allocations. CommitAllocationNodes are produced by PEA when it picks up stores on newly allocated objects as part of escape analysis. We just need to keep better track of whether the membar has been emitted or not. For group allocations we only need to emit a single final barrier since the whole group of objects isn't visible until after the barrier. In the graal assembly above this would mean that the dmb at 0x0000ffff741c6cb0 would be the only one emitted.

@tkrodriguez
Copy link
Member

I have a branch with a fix for this but I"m debugging a gate problem. I'll put it up as soon as that's resolved.

@armLeiJin
Copy link
Author

I have a branch with a fix for this but I"m debugging a gate problem. I'll put it up as soon as that's resolved.

Do you mean the issue can be fixed by just removing the store store barrier in function formatObject? Please correct me if I make mistakes.

Actually we also find another 2 cases that graal produces one barrier than C2.

  • One is simliar to https://bugs.openjdk.java.net/browse/JDK-8032481. C2 has removed store-store barrier after object memory zeroing in some cases with object has final fields while graal not.
  • Another is when object whose intializer is not inlined has a final field, release membars from commitAllocationNode and FinalFeildBarrierNode will both be produced in assembly codes. At this time, in the function which new the object operation is called will have two sequently barrier, store-store and release.

I think the solution you proposed above can solve the first issue while improve the second one by removing first store-store barrier (actually we need store-store here and need to remove release one).

If it is correct, we will also make some tests on AArch64 for your solution to verify the behaviours.

@adinn
Copy link
Collaborator

adinn commented Jul 18, 2019

This doesn't have anything to do with final field barriers except that it's a bug introduced by the changes to add the proper barriers for final fields.

Exactly, this is just about unnecessary STORE_STORE barriers.

I have a branch with a fix for this but I"m debugging a gate problem.

Good to know. Of course, there is no rush to fix this as it only implies a small performance hit. Thanks for looking at this, Tom.

@tkrodriguez
Copy link
Member

I've put up the PR and it's been reviewed internally but I'm putting it up here in case you have comments.

@armLeiJin
Copy link
Author

I've put up the PR and it's been reviewed internally but I'm putting it up here in case you have comments.

Thank you. I have reviewed it and now I am conducting the tests on the another 2 issues I mentioned above. I will comment here once I finish the tests.

@armLeiJin
Copy link
Author

I've put up the PR and it's been reviewed internally but I'm putting it up here in case you have comments.

Thank you. I have reviewed it and now I am conducting the tests on the another 2 issues I mentioned above. I will comment here once I finish the tests.

I have tested the patch. It is good to me. I have no more questions on the issue. I close it now. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants