Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for x86_fp80 (long double) #1604

Merged
merged 6 commits into from
Jan 4, 2024

Conversation

maxaehle
Copy link
Contributor

@maxaehle maxaehle commented Jan 4, 2024

This is an attempt to increase Enzyme's coverage of the LLVM type x86_fp80, in response to #1600. I'm not at all familiar with Enzyme from a developer perspective, and very happy about any kind of input.

Existing coverage

Some x86_fp80 coverage is already there, e.g. because the differentiation of elementary operations and floating-point conversions is defined independent of the type (if my understanding is correct).

Hence, the new division and sine testcases suggested in this PR pass from the beginning.

Missing coverage

A few locations in the code where x86_fp80 support is missing can be identified by searching for isDoubleTy and isFloatTy in the code:

  • Handling of bit-tricks (where real arithmetic is performed using e.g. bitwise logical operations), e.g. in AdjointGenerator.h and TypeAnalysis.cpp.
  • A particular case distinction that appears four times in TypeTree.h. At one of them the 80-bit case is already considered, at the other three it is not.
  • There are some more places, related to e.g. conversion from floating-point type to integer type of the same size, obtaining the type as a string, etc.

I'm not sure whether additional work is necessary at other places. Please post if you are aware of some. Eventually, tests should also tell us.

in the forward and reverse modes:
- The testcases for division and sine already pass.
- In the "fp80" testcases adapted from EnzymeAD#1600, the opt call fails.
@maxaehle
Copy link
Contributor Author

maxaehle commented Jan 4, 2024

Commit b01d87f adds x86_fp80 support in TypeTree.h and at the "some more places" mentioned above. With these changes, opt can successfully form differentiated code for the new fp80 testcases (before b01d87f, opt would crash as reported in #1600)). 895ed2e updates the expected output of the opt run.

Next, I'll think about extending the bit-trick handling.

@@ -650,6 +652,8 @@ class TypeTree : public std::enable_shared_from_this<TypeTree> {
chunk = 8;
} else if (flt->isHalfTy()) {
chunk = 2;
} else if (flt->isX86_FP80Ty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well here (within isFloat) simplify this to chunk = dl.getTypeSizeInBits(flt) / 8;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changed in 24c8098

@@ -731,6 +735,8 @@ class TypeTree : public std::enable_shared_from_this<TypeTree> {
chunk = 8;
} else if (flt->isHalfTy()) {
chunk = 2;
} else if (flt->isX86_FP80Ty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about chunk = dl.getTypeSizeInBits(flt) / 8;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't simplify the code here like at the other places because there is no DataLayout instance (as far as I can see).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would it look like to add an argument for a datalayout here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In every place where IsAllFloat is called, there is a DataLayout instance lying around. In 6a70777, I add a DataLayout argument to the signature of IsAllFloat, in order to simplify its implementation in the same way as in the other places. (Could you please take another look and confirm that this change is correct?)

@@ -554,6 +554,8 @@ class TypeTree : public std::enable_shared_from_this<TypeTree> {
chunk = 8;
} else if (flt->isHalfTy()) {
chunk = 2;
} else if (flt->isX86_FP80Ty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also changed in 24c8098

@@ -5181,6 +5181,8 @@ Value *GradientUtils::invertPointerM(Value *const oval, IRBuilder<> &BuilderM,
chunk = 8;
} else if (flt->isHalfTy()) {
chunk = 2;
} else if (flt->isX86_FP80Ty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also changed in 24c8098

@maxaehle
Copy link
Contributor Author

maxaehle commented Jan 4, 2024

Does anyone know what the purpose of this piece of code in AdjointGenerator.h is:

switch (BO.getOpcode()) {
    case Instruction::And: {
      // If & against 0b10000000000 and a float the result is 0
      
      // [...]
      auto FT = TR.query(&BO).IsAllFloat(size);
      auto eFT = FT;
      if (FT)
        for (int i = 0; i < 2; ++i) {
          auto CI = dyn_cast<ConstantInt>(BO.getOperand(i));
          if (CI && dl.getTypeSizeInBits(eFT) ==
                        dl.getTypeSizeInBits(CI->getType())) {
            if (eFT->isDoubleTy() && CI->getValue() == -134217728) {
              setDiffe(&BO, Constant::getNullValue(diffTy), Builder2);
              // Derivative is zero (equivalent to rounding as just chopping off
              // bits of mantissa), no update
              return;
            }
          }
        }

Is some weird bit-trick handled here? How does the bit-trick work? 134217728 is 2**27, but I don't see right now how real arithmetic could be hidden here...

@wsmoses
Copy link
Member

wsmoses commented Jan 4, 2024

From the comment above it looks like it is saying float & signbit -> 0 derivative.

Also note that it was checking against a negative number (which is twos complement has different bits set)

Co-authored-by: William Moses <gh@wsmoses.com>
@wsmoses
Copy link
Member

wsmoses commented Jan 4, 2024

Could we also update the shift indices function that already manually handled fp80 to use the datalayout size check as well?

and use it to simplify the implementation of IsAllFloat.
@maxaehle
Copy link
Contributor Author

maxaehle commented Jan 4, 2024

Could we also update the shift indices function that already manually handled fp80 to use the datalayout size check as well?

Yes, I also changed it in 24c8098.

@maxaehle
Copy link
Contributor Author

maxaehle commented Jan 4, 2024

I can now compile and run the other example from #1600, https://fwd.gymni.ch/HHKrZj, on my local system. The Integration CI LLVM 11 Release ubuntu-20.04 run is marked as "failed" because three testcases unexpectedly pass now. They all use the C++ header <random> in a similar way (see e.g. here) as this example, so probably that's a good message.

@wsmoses
Copy link
Member

wsmoses commented Jan 4, 2024

LGTM, just fix the format per CI.

Yeah those tests are presently expected issues for 11

@wsmoses
Copy link
Member

wsmoses commented Jan 4, 2024

If you're up for it, it may also be useful to test fp128 in a follow up PR.

@wsmoses wsmoses merged commit 4dcea04 into EnzymeAD:main Jan 4, 2024
38 of 54 checks passed
@maxaehle
Copy link
Contributor Author

maxaehle commented Jan 4, 2024

Thanks! I might look at bit-tricks and/or fp128 in a separate PR in the next days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants