Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

controlflow fixes #1425

Merged
merged 4 commits into from
Oct 8, 2023
Merged

controlflow fixes #1425

merged 4 commits into from
Oct 8, 2023

Conversation

tthsqe12
Copy link
Contributor

@tthsqe12 tthsqe12 commented Sep 15, 2023

test code from #1392

@tthsqe12
Copy link
Contributor Author

@wsmoses Could you say something about the viability of these changes before I start changing all of your doc tests?

@wsmoses
Copy link
Member

wsmoses commented Sep 17, 2023

Yeah, I'll take a look in the next few days

@wsmoses
Copy link
Member

wsmoses commented Sep 17, 2023

I inspected this more deeply, and at least one problem that this partially fixes is the fact that Enzyme (incorrectly) assumed that an exit block only had one loop predecessor. I fixed that assumption here: #1430 . That is effectively your change, however, the store limit part unfortunately had to be a lot more complex (see the PR).

I'm not sure that is the entirety of the issue you saw, but try that out and take a look?

@tthsqe12
Copy link
Contributor Author

Yes, there are two issues that the AB loops illuminate. Your PR does (1). What remains is (2) the antivars may be loaded uninitialized in the reverse pass. You can see these in the llvm in EnzymeLogic after enzyme has done its thing but before the post optimization passes are run. The promote pass from llvm is what obscures the bad loads. There is a function computeIndexOfChunk that loads from uninitialized antivars. This happens in ...'merge blocks that only initialize one of the antivars. This doesn't work because the whole tower of antivars needs initialization.

Try looking at the llvm (before post optimization) for the C code I posted.

@tthsqe12
Copy link
Contributor Author

tthsqe12 commented Sep 18, 2023

@wsmoses I see that you did have a try at (2) in #1430 for v0.0.84. This gets closer, but it is not there yet. Here is the problem very specifically:
Suppose we have loop C inside a loop B inside a loop A, with induction vars ivC, ivB, ivA and limit caches llcC, llcB, llcA.
The primal code now has a correct block that looks like:

return.loopexitC:
llcA = ivA
llcB[ivA] = ivB
llcC[ivA][ivB] = ivC
br return

Now, ivC'ac, ivB'ac, ivA'ac are uninitialized allocas to circumvent the SSA requirement. As for the reverse code:
What we see in v0.0.82:

invertreturn.loopexitC:
t1 = llcC[ivA'ac][ivB'ac]
ivC'ac = t1

This is doubly wrong because ivA'ac and ivB'ac are uninitialized when read, and ivA'ac and ivB'ac are not set. As a consequence of the first, llcC[ivA'ac][ivB'ac] reads from an undefined address.
What we see in v0.0.84:

invertreturn.loopexitC:
t3 = llcA
t2 = llcB[ivA'ac]
t1 = llcC[ivA'ac][ivB'ac]
ivA'ac = t3
ivB'ac = t2
ivC'ac = t1

This is getting closer but is still bad because ivA'ac and ivB'ac are uninitialized when used.
What we need to see:

invertreturn.loopexitC:
t3 = llcA
ivA'ac = t3
t2 = llcB[ivA'ac]
ivB'ac = t2
t1 = llcC[ivA'ac][ivB'ac]
ivC'ac = t1

Since I considered this an invasive change, in this PR I just put the initializations of ivA'ac, ivB'ac, and ivC'ac in the primal part.
Please decide where they best go.

@wsmoses
Copy link
Member

wsmoses commented Sep 19, 2023

Excellent catch, I believe #1443 should remedy what you found

@tthsqe12
Copy link
Contributor Author

Looks good. Would you like the test code here in the integration folder?

@wsmoses
Copy link
Member

wsmoses commented Sep 21, 2023

Yeah, the more the merrier. Rebase and let's merge?

@tthsqe12 tthsqe12 force-pushed the controlflowfixpart1try2 branch from 57f3e5e to 5bcc9e9 Compare October 1, 2023 01:50
@tthsqe12 tthsqe12 force-pushed the controlflowfixpart1try2 branch from 5bcc9e9 to dfe725a Compare October 1, 2023 01:51
@tthsqe12
Copy link
Contributor Author

tthsqe12 commented Oct 1, 2023

This has only the test code now. By the way, the following tests broke sometime between v0.0.70 and v.0.0.80

ReverseMode/fbuff.cpp -O0 -std=c++11 -fno-exceptions -ffast-math
ReverseMode/map.cpp -O0 -std=c++11 -fno-exceptions -ffast-math

@wsmoses wsmoses enabled auto-merge October 6, 2023 22:43
@wsmoses wsmoses added this pull request to the merge queue Oct 8, 2023
Merged via the queue into EnzymeAD:main with commit 80392fd Oct 8, 2023
30 of 53 checks passed
MilesCranmer pushed a commit to MilesCranmer/Enzyme that referenced this pull request Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants