
Float8E4M3FNUZ -> Float8E4M3FN for NVIDIA PTX #8

Closed
wants to merge 10 commits

Conversation

acollins3
Fix MLIR type used for e4m3 fp8 type in NVIDIA PTX codegen.

triton-lang#3681

google-cla bot commented Aug 19, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

python/src/ir.cc (review thread; outdated, resolved)
```diff
@@ -15,7 +15,7 @@ class TritonTypeDef<string name, string _mnemonic, list<Trait> traits = []>
 }

 // Floating-point Type
-def TT_Float : AnyTypeOf<[F8E4M3FNUZ, F8E5M2, F8E5M2FNUZ, F16, BF16, F32, F64], "floating-point">;
+def TT_Float : AnyTypeOf<[F8E4M3FN, F8E4M3FNUZ, F8E5M2, F8E5M2FNUZ, F16, BF16, F32, F64], "floating-point">;
```
Member commented:

Does F8E4M3FNUZ really need to be removed?

Also, below, F8E4M3FN is added in some places while F8E4M3FNUZ is replaced in others. It would be good to explain the intent in the PR description and apply it consistently. Or maybe I'm missing something and this is all intentional?

acollins3 (Author) replied:

F8E4M3FNUZ needs to stay listed here, as we likely want support for it on other platforms (although I haven't tested this).

In places that target NVIDIA PTX, we replace F8E4M3FNUZ with F8E4M3FN; in other "generic" places, we add F8E4M3FN.
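(As a hedged illustration of that split, not code from this PR: a backend-dispatch helper over MLIR's builtin fp8 types could express the rule. The helper name `getFp8E4M3Type` and the `isNvidiaPtx` flag are hypothetical.)

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/MLIRContext.h"

// Hypothetical helper, not part of this PR: pick the e4m3 fp8 element type
// for a given target. NVIDIA PTX supports the FN variant (signed zero, NaN
// on the all-ones mantissa encoding), while other backends may expect the
// FNUZ variant (no negative zero; the would-be -0 encoding is NaN).
static mlir::FloatType getFp8E4M3Type(mlir::MLIRContext *ctx,
                                      bool isNvidiaPtx) {
  if (isNvidiaPtx)
    return mlir::Float8E4M3FNType::get(ctx);   // PTX-targeted paths
  return mlir::Float8E4M3FNUZType::get(ctx);   // generic / other backends
}
```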

lib/Dialect/TritonGPU/Transforms/AccelerateMatmul.cpp (review thread; outdated, resolved)
copybara-service bot pushed commits to openxla/xla and tensorflow/tensorflow that referenced this pull request (Aug 21–22, 2024), each with the message:

Imported from openxla/triton#8

PiperOrigin-RevId: 665336874
chsigg pushed a commit that referenced this pull request Aug 22, 2024
When running
[convert_blocked1d_to_slice0](https://github.com/triton-lang/triton/blob/0ba5f0c3cd029d5c3d1f01b9bf29dac32c27345e/test/Conversion/tritongpu_to_llvm.mlir#L924),
Triton ends up computing the rank of a matrix with 0 columns during linear
layout lowering, which trips up f2reduce and causes undefined behavior,
detectable through
[UBSAN](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html).

Fix this by returning the rank (0) early in these cases, without calling
f2reduce.
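
A minimal sketch of the early-return guard, assuming rows packed one 64-bit word each and that f2reduce's stride argument counts words per row; the signature and header name are illustrative, not the exact ones in lib/Tools/LinearLayout.cpp:

```cpp
#include <cstdint>

#include "f2reduce.h"  // third_party/f2reduce (header name assumed)

// Illustrative version of getMatrixRank. With numCols == 0, f2reduce ends up
// shifting by (uint64_t)-1, which UBSAN flags as an invalid shift exponent.
// An empty matrix trivially has rank 0, so bail out before calling f2reduce.
int getMatrixRank(uint64_t *matrix, int numRows, int numCols) {
  if (numRows == 0 || numCols == 0)
    return 0;
  // Row-reduce over GF(2); stride of 1 word per row matches the packing
  // assumed above.
  f2reduce::inplace_rref_strided(matrix, numRows, numCols, /*stride=*/1);
  // After reduction, the rank is the number of nonzero rows.
  int rank = 0;
  for (int r = 0; r < numRows; ++r)
    rank += (matrix[r] != 0);
  return rank;
}
```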

<details><summary>Stack trace</summary>
<p>

```
third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30: runtime error: shift exponent 18446744073709551615 is too large for 64-bit type 'unsigned long long'
    #0 0x556ee2fea3be in inplace_rref_small third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30
    #1 0x556ee2fea3be in f2reduce::inplace_rref_strided(unsigned long*, unsigned long, unsigned long, unsigned long) third_party/triton/third_party/f2reduce/f2reduce.cpp:470:9
    #2 0x556ee2ea70da in getMatrixRank third_party/triton/lib/Tools/LinearLayout.cpp:125:3
    #3 0x556ee2ea70da in mlir::triton::LinearLayout::checkInvariants(bool) third_party/triton/lib/Tools/LinearLayout.cpp:299:7
    #4 0x556ee2ea656d in mlir::triton::LinearLayout::tryCreate(llvm::MapVector<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>, llvm::DenseMap<mlir::StringAttr, unsigned int, llvm::DenseMapInfo<mlir::StringAttr, void>, llvm::detail::DenseMapPair<mlir::StringAttr, unsigned int>>, llvm::SmallVector<std::__u::pair<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>>, 0u>>, llvm::ArrayRef<std::__u::pair<mlir::StringAttr, int>>, bool) third_party/triton/lib/Tools/LinearLayout.cpp:190:41
    #5 0x556ee2eb2150 in mlir::triton::LinearLayout::divideRight(mlir::triton::LinearLayout const&) third_party/triton/lib/Tools/LinearLayout.cpp:654:51
    #6 0x556ee2ee1c39 in mlir::cvtNeedsSharedMemory(mlir::RankedTensorType, mlir::RankedTensorType) third_party/triton/lib/Analysis/Utility.cpp:652:14
    #7 0x556ee2cf38fd in mlir::triton::getRepShapeForCvtLayout(mlir::triton::gpu::ConvertLayoutOp) third_party/triton/lib/Analysis/Allocation.cpp:66:8
    #8 0x556ee2cf3efa in mlir::triton::getScratchConfigForCvtLayout(mlir::triton::gpu::ConvertLayoutOp, unsigned int&, unsigned int&) third_party/triton/lib/Analysis/Allocation.cpp:95:19
    #9 0x556ee2cf6057 in mlir::triton::AllocationAnalysis::getScratchValueSize(mlir::Operation*) third_party/triton/lib/Analysis/Allocation.cpp:272:24
    #10 0x556ee2cf5499 in operator() third_party/triton/lib/Analysis/Allocation.cpp:343:7
    #11 0x556ee2cf5499 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<mlir::triton::AllocationAnalysis::getValuesAndSizes()::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #12 0x556edeeee7a9 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #13 0x556edeeee7a9 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:174:5
    #14 0x556edeeee87c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:182:9
    #15 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), mlir::Operation *, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:313:10
    #16 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:794:12
    #17 0x556ee2cf49e7 in mlir::triton::AllocationAnalysis::getValuesAndSizes() third_party/triton/lib/Analysis/Allocation.cpp:341:16
    #18 0x556ee2cf4852 in run third_party/triton/lib/Analysis/Allocation.cpp:182:5
    #19 0x556ee2cf4852 in AllocationAnalysis third_party/triton/lib/Analysis/Allocation.cpp:169:5
    #20 0x556ee2cf4852 in mlir::Allocation::run(llvm::DenseMap<mlir::FunctionOpInterface, mlir::Allocation, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>, llvm::detail::DenseMapPair<mlir::FunctionOpInterface, mlir::Allocation>>&) third_party/triton/lib/Analysis/Allocation.cpp:627:3
    #21 0x556ee1677402 in operator() third_party/triton/include/triton/Analysis/Allocation.h:227:26
    #22 0x556ee1677402 in void mlir::CallGraph<mlir::Allocation>::doWalk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)>(mlir::FunctionOpInterface, llvm::DenseSet<mlir::FunctionOpInterface, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>>&, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)) third_party/triton/include/triton/Analysis/Utility.h:350:7
    #23 0x556ee16756b3 in walk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, (lambda at third_party/triton/include/triton/Analysis/Allocation.h:222:9), (lambda at third_party/triton/include/triton/Analysis/Allocation.h:224:9)> third_party/triton/include/triton/Analysis/Utility.h:242:7
    #24 0x556ee16756b3 in mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp) third_party/triton/include/triton/Analysis/Allocation.h:220:5
    #25 0x556ee2c2bf18 in (anonymous namespace)::AllocateSharedMemory::runOnOperation() third_party/triton/lib/Conversion/TritonGPUToLLVM/AllocateSharedMemory.cpp:26:22
...
UndefinedBehaviorSanitizer: invalid-shift-exponent third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30 
```
</p>
</details>
copybara-service bot pushed commits to openxla/xla and tensorflow/tensorflow that referenced this pull request (Aug 26, 2024), each with the message:

Imported from openxla/triton#8

PiperOrigin-RevId: 665336874 (later pushes: 667560178)

One tensorflow/tensorflow push additionally carried:

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16439 from shraiysh:fix_pgle_latency_scheduler 44bab12c6fb0b0c4d60ac62113eae7c959c05536
chsigg commented Sep 4, 2024

This PR has landed upstream, closing.
