
intelblas_gemm clean patch #19

Merged: 9 commits into triton-lang:master on Feb 16, 2017

Conversation

listenlink commented

Hi @gongzg @ptillet

This patch cleans up the hack code in the tuning path and adds an external profile for intelblas_gemm (image) with the float32 dtype. Please have a review, thanks.

wujunkai166 and others added 8 commits February 10, 2017 10:33
Change-Id: Ic9edf18a3ae0f41b21c2ac374d50000fc5d4e6f3
Change-Id: I89f632e2598594805e24b6aa2d084dcfa1c4f218
v2: by zhigang, fix some warnings and remove half-related code.
v3: by lixiang, modify the json file.
v4: by junkai, optimize the gemm image kernel and force isaac to run the gemm image kernel.
v5: by junkai, change the json file to force isaac to run the gemm image kernel.
Change-Id: Ieab41924476bfc001f7026fbea3b5ea5e56eb00b
We need to use the image-related API for Intel's image-based kernels.
Also fixed a memory leak in the image-based kernel.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
Fall back to the buffer kernel when input sizes are too large.

v2: fix minor issues when releasing images, and avoid using the OpenCL library directly.

Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
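The image-based path has to allocate OpenCL image objects rather than plain buffers, and every image created per call must be released again or device memory leaks, which is the class of bug the commit above fixes. Below is a minimal, self-contained sketch of that create/release pattern using the standard OpenCL C API; the RAII helper and the surrounding setup are illustrative assumptions, not ISAAC's actual code.

```cpp
// Minimal sketch: create a 2D float image and guarantee it is released.
// Standard OpenCL 1.2 API; everything around it is illustrative only.
#include <CL/cl.h>
#include <stdexcept>

// Tiny RAII wrapper so the image cannot leak on early returns or exceptions.
struct ScopedImage {
  cl_mem mem = nullptr;
  explicit ScopedImage(cl_mem m) : mem(m) {}
  ~ScopedImage() { if (mem) clReleaseMemObject(mem); }
  ScopedImage(const ScopedImage&) = delete;
  ScopedImage& operator=(const ScopedImage&) = delete;
};

cl_mem create_float_image(cl_context ctx, size_t width, size_t height) {
  cl_image_format fmt = {};
  fmt.image_channel_order = CL_RGBA;        // 4 floats per texel
  fmt.image_channel_data_type = CL_FLOAT;

  cl_image_desc desc = {};
  desc.image_type = CL_MEM_OBJECT_IMAGE2D;
  desc.image_width = width;
  desc.image_height = height;

  cl_int err = CL_SUCCESS;
  cl_mem img = clCreateImage(ctx, CL_MEM_READ_WRITE, &fmt, &desc,
                             /*host_ptr=*/nullptr, &err);
  if (err != CL_SUCCESS)
    throw std::runtime_error("clCreateImage failed");
  return img;  // hand ownership to ScopedImage at the call site
}
```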
gongzg (Contributor) commented Feb 15, 2017

@ptillet ping for review. Thanks.

ptillet (Collaborator) commented Feb 15, 2017

I've looked at it and it looks good to me. I'll try it tomorrow to make sure there is no regression on AMD and NVidia HW.

ptillet (Collaborator) commented Feb 15, 2017

This looks great. Very good performance improvements! Thanks :)

I only have one minor question: is it possible to remove the OpenCL warnings? Successful BLAS calls are not expected to write anything to cout or cerr.
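A common way to satisfy this is to keep the happy path silent and only fetch the program build log when clBuildProgram reports a failure. The sketch below uses the standard OpenCL API; the function name and the choice of std::cerr are assumptions for illustration, not how ISAAC structures its JIT code.

```cpp
// Sketch: stay quiet on success, dump the build log only when the build fails.
#include <CL/cl.h>
#include <iostream>
#include <vector>

bool build_quietly(cl_program program, cl_device_id device,
                   const char* options = "") {
  cl_int err = clBuildProgram(program, 1, &device, options, nullptr, nullptr);
  if (err == CL_SUCCESS)
    return true;  // nothing is written to cout/cerr for successful calls

  // Only on failure: query the log size, then the log itself.
  size_t log_size = 0;
  clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                        0, nullptr, &log_size);
  std::vector<char> log(log_size + 1, '\0');
  clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                        log_size, log.data(), nullptr);
  std::cerr << "OpenCL build failed:\n" << log.data() << std::endl;
  return false;
}
```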

listenlink (Author) commented

@ptillet, I just removed the OpenCL build warnings; please check it again, thanks.

ptillet (Collaborator) commented Feb 16, 2017

The warnings are gone, cool.

However, I've tested the PR on another machine and got the following segmentation fault:

```
BENCH	M	N	K	AT	BT	ISAAC
Deep	1760	16	1760	N	N	0.17	
Deep	1760	32	1760	N	N	0.31	
Deep	1760	64	1760	N	N	0.35	
Deep	1760	128	1760	N	N	0.40	
Deep	1760	7000	1760	N	N	0.49	
Deep	2048	16	2048	N	N	0.19	
Deep	2048	32	2048	N	N	0.33	
Deep	2048	64	2048	N	N	0.38	
Deep	2048	128	2048	N	N	0.36	
Deep	2048	7000	2048	N	N	0.48	
Deep	2560	16	2560	N	N	0.21	
Deep	2560	32	2560	N	N	0.37	
Deep	2560	64	2560	N	N	0.36	
Deep	2560	128	2560	N	N	0.38	
Deep	2560	7000	2560	N	N	0.47	
Deep	1760	16	1760	T	N	0.11	
Deep	1760	32	1760	T	N	0.22	

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7551629 in isaac::driver::dispatch::f_impl<&isaac::driver::dispatch::clinit, _cl_mem* (*)(_cl_context*, unsigned long, _cl_image_format const*, _cl_image_desc const*, void*, int*), _cl_context*, unsigned long, _cl_image_format const*, _cl_image_desc const*, void*, int*> (lib_h=@0x7ffff7dd9920: 0x617ad0, 
    cache=@0x7ffff7dd9a38: 0x0, name=0x7ffff75abe37 "clCreateImage") at /tmp/isaac/include/isaac/driver/dispatch.h:74
#2  0x00007ffff754f4cd in isaac::driver::dispatch::clCreateImage (a=0x620010, b=1, c=0x7fffffffc830, d=0x7fffffffc910, e=0x0, f=0x7fffffffc7ac)
    at /tmp/isaac/lib/driver/dispatch.cpp:177
#3  0x00007ffff74fce02 in isaac::templates::intelblas_gemm_image::enqueue (this=0xcad900, queue=..., program=..., suffix="14", control=...)
    at /tmp/isaac/lib/jit/generation/gemm.cpp:829
```

I'm looking into it...
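For reference, the null frame at the top of that backtrace means the runtime-resolved clCreateImage entry point was never found in whichever OpenCL library got loaded, so the cached function pointer was still 0 when the image kernel path called it. A minimal sketch of that lazy dlopen/dlsym dispatch pattern, with a guard that fails loudly instead of jumping through a null pointer, is shown below; the names and error handling are illustrative assumptions, not the actual code in dispatch.h.

```cpp
// Sketch: resolve an OpenCL symbol lazily and refuse to call a null pointer.
#include <CL/cl.h>
#include <dlfcn.h>
#include <stdexcept>
#include <string>

using clCreateImage_t = cl_mem (*)(cl_context, cl_mem_flags,
                                   const cl_image_format*,
                                   const cl_image_desc*, void*, cl_int*);

// Generic lazy resolver: look the symbol up once, cache it, and throw a
// readable error if the loaded library does not export it.
template <typename FnPtr>
FnPtr resolve(void* lib_handle, FnPtr& cache, const char* name) {
  if (!cache) {
    cache = reinterpret_cast<FnPtr>(dlsym(lib_handle, name));
    if (!cache)  // e.g. an ICD/SDK that does not export clCreateImage
      throw std::runtime_error(std::string("symbol not found: ") + name);
  }
  return cache;
}

cl_mem dispatch_clCreateImage(void* libopencl, cl_context ctx,
                              cl_mem_flags flags, const cl_image_format* fmt,
                              const cl_image_desc* desc, void* host_ptr,
                              cl_int* err) {
  static clCreateImage_t cached = nullptr;
  return resolve(libopencl, cached, "clCreateImage")(ctx, flags, fmt, desc,
                                                     host_ptr, err);
}
```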

ptillet (Collaborator) commented Feb 16, 2017

It was a configuration problem with my machine (conflicts with the Altera OpenCL SDK).

@ptillet ptillet merged commit 9b97c56 into triton-lang:master Feb 16, 2017
codego7250 pushed a commit to codego7250/triton that referenced this pull request Nov 21, 2022
…st_reduce

reduce the skips for test_reduce functions
goostavz pushed a commit to goostavz/triton that referenced this pull request Aug 4, 2023
ptillet pushed a commit that referenced this pull request Apr 1, 2024
reduce the skips for test_reduce functions
ptillet added a commit that referenced this pull request Apr 1, 2024
jlebar pushed a commit that referenced this pull request Jun 21, 2024
When running
[convert_blocked1d_to_slice0](https://github.com/triton-lang/triton/blob/0ba5f0c3cd029d5c3d1f01b9bf29dac32c27345e/test/Conversion/tritongpu_to_llvm.mlir#L924),
Triton ends up computing the rank of a matrix with 0 columns during linear
layout lowering, which trips up f2reduce and causes undefined behavior,
detectable through
[UBSAN](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html).

Fix this by returning the rank (0) early in these cases, without calling
f2reduce (a minimal sketch of that guard follows the stack trace below).

<details><summary>Stack trace</summary>
<p>

```
third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30: runtime error: shift exponent 18446744073709551615 is too large for 64-bit type 'unsigned long long'
    #0 0x556ee2fea3be in inplace_rref_small third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30
    #1 0x556ee2fea3be in f2reduce::inplace_rref_strided(unsigned long*, unsigned long, unsigned long, unsigned long) third_party/triton/third_party/f2reduce/f2reduce.cpp:470:9
    #2 0x556ee2ea70da in getMatrixRank third_party/triton/lib/Tools/LinearLayout.cpp:125:3
    #3 0x556ee2ea70da in mlir::triton::LinearLayout::checkInvariants(bool) third_party/triton/lib/Tools/LinearLayout.cpp:299:7
    #4 0x556ee2ea656d in mlir::triton::LinearLayout::tryCreate(llvm::MapVector<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>, llvm::DenseMap<mlir::StringAttr, unsigned int, llvm::DenseMapInfo<mlir::StringAttr, void>, llvm::detail::DenseMapPair<mlir::StringAttr, unsigned int>>, llvm::SmallVector<std::__u::pair<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>>, 0u>>, llvm::ArrayRef<std::__u::pair<mlir::StringAttr, int>>, bool) third_party/triton/lib/Tools/LinearLayout.cpp:190:41
    #5 0x556ee2eb2150 in mlir::triton::LinearLayout::divideRight(mlir::triton::LinearLayout const&) third_party/triton/lib/Tools/LinearLayout.cpp:654:51
    #6 0x556ee2ee1c39 in mlir::cvtNeedsSharedMemory(mlir::RankedTensorType, mlir::RankedTensorType) third_party/triton/lib/Analysis/Utility.cpp:652:14
    #7 0x556ee2cf38fd in mlir::triton::getRepShapeForCvtLayout(mlir::triton::gpu::ConvertLayoutOp) third_party/triton/lib/Analysis/Allocation.cpp:66:8
    #8 0x556ee2cf3efa in mlir::triton::getScratchConfigForCvtLayout(mlir::triton::gpu::ConvertLayoutOp, unsigned int&, unsigned int&) third_party/triton/lib/Analysis/Allocation.cpp:95:19
    #9 0x556ee2cf6057 in mlir::triton::AllocationAnalysis::getScratchValueSize(mlir::Operation*) third_party/triton/lib/Analysis/Allocation.cpp:272:24
    #10 0x556ee2cf5499 in operator() third_party/triton/lib/Analysis/Allocation.cpp:343:7
    #11 0x556ee2cf5499 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<mlir::triton::AllocationAnalysis::getValuesAndSizes()::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #12 0x556edeeee7a9 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #13 0x556edeeee7a9 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:174:5
    #14 0x556edeeee87c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:182:9
    #15 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), mlir::Operation *, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:313:10
    #16 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:794:12
    #17 0x556ee2cf49e7 in mlir::triton::AllocationAnalysis::getValuesAndSizes() third_party/triton/lib/Analysis/Allocation.cpp:341:16
    #18 0x556ee2cf4852 in run third_party/triton/lib/Analysis/Allocation.cpp:182:5
    #19 0x556ee2cf4852 in AllocationAnalysis third_party/triton/lib/Analysis/Allocation.cpp:169:5
    #20 0x556ee2cf4852 in mlir::Allocation::run(llvm::DenseMap<mlir::FunctionOpInterface, mlir::Allocation, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>, llvm::detail::DenseMapPair<mlir::FunctionOpInterface, mlir::Allocation>>&) third_party/triton/lib/Analysis/Allocation.cpp:627:3
    #21 0x556ee1677402 in operator() third_party/triton/include/triton/Analysis/Allocation.h:227:26
    #22 0x556ee1677402 in void mlir::CallGraph<mlir::Allocation>::doWalk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)>(mlir::FunctionOpInterface, llvm::DenseSet<mlir::FunctionOpInterface, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>>&, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)) third_party/triton/include/triton/Analysis/Utility.h:350:7
    #23 0x556ee16756b3 in walk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, (lambda at third_party/triton/include/triton/Analysis/Allocation.h:222:9), (lambda at third_party/triton/include/triton/Analysis/Allocation.h:224:9)> third_party/triton/include/triton/Analysis/Utility.h:242:7
    #24 0x556ee16756b3 in mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp) third_party/triton/include/triton/Analysis/Allocation.h:220:5
    #25 0x556ee2c2bf18 in (anonymous namespace)::AllocateSharedMemory::runOnOperation() third_party/triton/lib/Conversion/TritonGPUToLLVM/AllocateSharedMemory.cpp:26:22
...
UndefinedBehaviorSanitizer: invalid-shift-exponent third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30 
```
</p>
</details>
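A minimal sketch of the guard described above: return a rank of 0 for a degenerate (zero-row or zero-column) matrix before any row reduction runs. This stand-in does its own GF(2) Gaussian elimination on 64-bit row bitmasks instead of calling f2reduce, so the names and representation are assumptions for illustration, not the code in LinearLayout.cpp.

```cpp
// Sketch: rank over GF(2) of a matrix whose rows are 64-bit column bitmasks
// (numCols <= 64), with the early return for degenerate shapes.
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

int getMatrixRank(std::vector<uint64_t> rows, int numCols) {
  // Guard first: a matrix with no rows or no columns has rank 0. Returning
  // early here mirrors the fix, which avoids handing a zero-column matrix to
  // f2reduce (whose reduction shifts by an out-of-range amount in that case).
  if (rows.empty() || numCols == 0)
    return 0;

  int rank = 0;
  for (int col = 0; col < numCols; ++col) {
    uint64_t bit = uint64_t(1) << col;
    // Find a pivot row with this column set.
    size_t pivot = static_cast<size_t>(rank);
    while (pivot < rows.size() && !(rows[pivot] & bit))
      ++pivot;
    if (pivot == rows.size())
      continue;  // no pivot in this column
    std::swap(rows[pivot], rows[static_cast<size_t>(rank)]);
    // Eliminate this column from every other row.
    for (size_t r = 0; r < rows.size(); ++r)
      if (r != static_cast<size_t>(rank) && (rows[r] & bit))
        rows[r] ^= rows[static_cast<size_t>(rank)];
    ++rank;
  }
  return rank;
}
```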
oraluben pushed a commit to oraluben/triton that referenced this pull request Sep 11, 2024
* [CPU] Dump human-readable asm code in TRITON_CACHE_DIR

* Don't touch the main compiler.py