
[RFC] Taichi Matrix & Vector refactor plan #5819

Open
jim19930609 opened this issue Aug 18, 2022 · 0 comments
Labels: doc (Documentation related issues & PRs), RFC

jim19930609 (Contributor) commented Aug 18, 2022

Background and Purpose

Currently, Taichi scalarizes all Matrix values and their corresponding operations at the Python level. This approach has the following major drawbacks:

  1. No actual SIMD support for the Matrix data type during CodeGen.
  2. Matrix information is missing at the CHI IR level, so no optimization pass can be written for Matrix operations.
  3. Huge overhead when migrating Matrix support to another frontend language (Julia, JavaScript, etc.).

To address the drawbacks mentioned above, we set the following refactor goals:

  1. Remove early scalarization at the Python level and make full Matrix information visible at the CHI IR level.
  2. Enable optimization of Matrix operations via CHI IR passes.
  3. Enable SIMD instruction code generation for Matrix operations.
  4. Minimize Matrix handling at the Python level to make it easier to support multiple frontend languages.
  5. No regression in functionality, performance, or compilation efficiency, especially with dynamic indexing enabled.

Design Overview

In general, there are three separate code paths involving Matrix:

  1. Local Matrix
  2. MatrixField: Field + Matrix
  3. MatrixNdarray: Ndarray + Matrix

Each of these code paths involves four typical use cases (illustrated by the sketch after this list):

  1. Indexed Matrix operations:
    • Matrix indexing
    • Indexed Matrix access: Load, Store, Add, Mul, ...
  2. MatrixType operations:
    • Matrix initialization
    • Matrix operations: Load, Store, matmul, transpose, atomic, ...
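
For concreteness, the snippet below exercises all four use cases with the existing Python API (a minimal illustrative sketch; the surface syntax is unchanged by this refactor):

```
import taichi as ti

ti.init()

@ti.kernel
def use_cases():
    a = ti.Matrix([[1.0, 2.0], [3.0, 4.0]])  # MatrixType op: Matrix initialization
    b = a @ a                                 # MatrixType op: matmul
    a[0, 1] = 5.0                             # Matrix indexing + indexed store
    s = a[1, 0] + b[0, 0]                     # indexed loads + add
    print(s)

use_cases()
```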

Design Details

Local Matrix

Implementation plan: #5478

[figure: local Matrix design overview]

1. Representation

ti.Matrix returns an Expr(MatrixExpression) with TensorType.

import taichi as ti
ti.init()

@ti.kernel
def example():
    x = ti.Matrix([[1, 2], [3, 4]], dt=ti.f32)
    print(x)

2. IR changes

[New Frontend IR] MatrixExpression: Initializing a matrix with ti.Matrix will result in the following Frontend IR:

kernel {
  $0 = alloca @tmp0
  @tmp0 = [(cast_value<f32> 1.0), (cast_value<f32> 2.0), (cast_value<f32> 3.0), (cast_value<f32> 4.0)] (dt=[Tensor (2, 2) f32])
  $2 = alloca @tmp1
  @tmp1 = @tmp0
  print @tmp1, "\n"
}

[New CHI IR] MatrixInitStmt: represents Matrix initialization
[Reuse CHI IR] PtrOffsetStmt: represents Matrix indexing (see the sketch below)
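
A sketch of source code that exercises both statements; the MatrixInitStmt/PtrOffsetStmt mapping follows this RFC, and the exact CHI IR may differ:

```
# Assumes `import taichi as ti` and `ti.init()` as in the earlier example.
@ti.kernel
def index_example() -> ti.f32:
    # The initializer lowers to a MatrixInitStmt with dt=[Tensor (2, 2) f32].
    x = ti.Matrix([[1.0, 2.0], [3.0, 4.0]])
    # The subscript lowers to a PtrOffsetStmt into the matrix alloca,
    # followed by a load of the addressed element.
    return x[1, 0]
```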

3. Codegen (LLVM)

MatrixInitStmt -> llvm::VectorType(Init + InsertElement)
BinaryOpStmt -> Same LLVM Binary Instruction with llvm::VectorType operands

4. Lowering details

[figure: lowering details for local Matrix]

MatrixField

AOS Memory Layout

1. AOS Python Interface

Current: ti.Matrix.field(m=2, n=2, shape=3, layout=ti.AOS)
After SNodeTreeType: builder.add_field(dtype=ti.vec3, name='x', shape=(4, 8))
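
For reference, a minimal sketch of the two declarations (the first call uses today's ti.Matrix.field API; the builder-based call reuses the SNodeTreeBuilder/add_field names proposed in this RFC, which may still change):

```
import taichi as ti

ti.init()

# Current interface: a 2x2 f32 matrix field with 3 elements, AOS layout.
x = ti.Matrix.field(n=2, m=2, dtype=ti.f32, shape=3, layout=ti.Layout.AOS)

# Proposed interface (names from this RFC, not yet available):
# builder = ti.SNodeTreeBuilder()
# builder.add_field(dtype=ti.vec3, name='x', shape=(4, 8))
```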

2. Representation

[figure: AOS MatrixField representation]

  • The leaf SNode of a SNodeTree becomes a Place SNode with MatrixType, which gets compiled into llvm::VectorType during CodeGen.
  • Operations on a MatrixField are divided into two stages (see the sketch after this list):
    • Obtain the Place SNode by traversing the SNodeTree.
    • Apply operations to the MatrixType data stored in the Place SNode.
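
A minimal sketch of what the two stages correspond to in user code (illustrative only; the staging happens inside the compiler, not in the Python API):

```
import taichi as ti

ti.init()

vel = ti.Vector.field(3, dtype=ti.f32, shape=(4, 8))  # AOS MatrixField

@ti.kernel
def damp():
    for i, j in vel:
        # Stage 1: traverse the SNodeTree down to the Place SNode holding the
        #          MatrixType value at (i, j).
        # Stage 2: apply the operation to that MatrixType value as a whole.
        vel[i, j] = vel[i, j] * 0.5

damp()
```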

3. Lowering details

[figure: lowering details for AOS MatrixField]

SOA/Customized Memory Layout

1. SOA Python Interface

Current:

x = ti.Matrix.field(m=2, n=2, shape=3, layout=ti.SOA)
    # or
x = ti.Matrix.field(m=2, n=2)
ti.root.....place(x.get_scalar_field(0, 0))
ti.root.....place(x.get_scalar_field(0, 1))
ti.root.....place(x.get_scalar_field(1, 0))
ti.root.....place(x.get_scalar_field(1, 1))

After SNodeTreeType:

builder = ti.SNodeTreeBuilder()
builder.add_field(dtype=ti.f32, name='v0')
builder.add_field(dtype=ti.f32, name='v1')
builder.add_field(dtype=ti.f32, name='v2')
for v in ['v0', 'v1', 'v2']:
  builder.tree().dense(ti.ij, 4).place(v)

builder.add_field_view(dtype=ti.vec3, name='vel', components=['v0', 'v1', 'v2'])

2. Representation

[figure: SOA MatrixField representation]

  • [New Frontend IR] ListOfSNodeExpression(): contains a list of SNodes, representing an SOA MatrixField at the Frontend IR level.
  • Support MatrixType with a stride to represent an SOA Matrix, so that an SOA Matrix can be represented the same way as an AOS Matrix at the CHI IR level (see the offset sketch after this list).
  • An SOA MatrixField will get scalarized during CodeGen.
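
To make the strided representation concrete, here is a plain-Python sketch (not Taichi API; N and n_comp are illustrative values) of how flat element offsets differ between the two layouts. The stride carried by the proposed MatrixType is what lets an SOA Matrix be addressed through the same MatrixType view as an AOS Matrix:

```
N = 8        # number of field elements (illustrative)
n_comp = 3   # vector components, e.g. a vec3 field

def aos_offset(i, k):
    # AOS: the components of one element are contiguous (component stride = 1).
    return i * n_comp + k

def soa_offset(i, k):
    # SOA: each component lives in its own plane (component stride = N).
    return k * N + i
```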

3. Lowering details

[figure: lowering details for SOA MatrixField]

MatrixNdarray

MatrixNdarray shares a similar implementation with MatrixField, so this section mainly describes the lowering details.
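
For reference, declaring and using a MatrixNdarray with the current API looks as follows (a minimal sketch; whole-matrix loads and stores on its elements go through the lowering paths sketched below):

```
import taichi as ti

ti.init()

arr = ti.Matrix.ndarray(n=2, m=2, dtype=ti.f32, shape=4)  # AOS by default

@ti.kernel
def scale(a: ti.types.ndarray()):
    for i in range(4):
        a[i] = a[i] * 2.0  # whole-matrix load, scale, and store per element

scale(arr)
```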

AOS Memory Layout

[figure: AOS MatrixNdarray memory layout]

Lowering details:

[figure: lowering details for AOS MatrixNdarray]

SOA Memory Layout

[figure: SOA MatrixNdarray memory layout]

Lowering details:

[figure: lowering details for SOA MatrixNdarray]

Fallback & Scalarization

  • [LocalMatrix & AOS] Fallback to "Array":
    • AOS fallback at CodeGen level
    • ArrayType is friendly to dynamic indexing
    • Possible reasons to fallback:Lack SIMD support at hardware level; Lack SIMD for specific dtype or size; performance regression with SIMD
  • [SOA] Fallback to standalone scalars:
    • SOA get scalarized during CodeGen
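
A minimal sketch of the kind of access that motivates the array fallback (depending on the Taichi version, this may require enabling dynamic indexing, e.g. ti.init(dynamic_index=True)):

```
import taichi as ti

ti.init()

@ti.kernel
def dyn_index(i: ti.i32) -> ti.f32:
    v = ti.Vector([1.0, 2.0, 3.0, 4.0])
    # `i` is only known at runtime, so the element cannot be resolved statically;
    # such accesses are easier to lower when the matrix is backed by an array
    # rather than an llvm::VectorType.
    return v[i]

print(dyn_index(2))
```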

Roadmap

[figure: roadmap]

Proposed Demos

  1. python/taichi/examples/simulation/mpm88.py
  2. python/taichi/examples/simulation/nbody.py
jim19930609 added the doc (Documentation related issues & PRs) label Aug 18, 2022
jim19930609 changed the title from "Taichi Matrix & Vector refactor plan" to "[RFC] Taichi Matrix & Vector refactor plan" Aug 18, 2022
ailzhang added the RFC label Aug 19, 2022
jim19930609 added a commit that referenced this issue Sep 15, 2022
…Stmt & GlobalStoreStmt with TensorType (#5946)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.

Scalarization implementation plan:
![NdarrayMatrix_Indexing (1) drawio (3)](https://user-images.githubusercontent.com/22334008/188117283-582fe6d5-2a0b-4cc3-99d7-c54a8b82c3eb.png)
jim19930609 added a commit that referenced this issue Sep 16, 2022
…tmt & GlobalLoadStmt with TensorType (#6024)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Sep 21, 2022
…t with TensorType-operand (#6080)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Sep 26, 2022
…mt with TensorType-operands (#6086)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Sep 29, 2022
…#6168)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.

[AllocaStmt scalarization]
```
Before:
  TensorType<4 x i32>* addr = AllocaStmt(TensorType<4 x i32>)

After:
  i32 addr0 = AllocaStmt(i32)
  i32 addr1 = AllocaStmt(i32)
  i32 addr2 = AllocaStmt(i32)
  i32 addr3 = AllocaStmt(i32)

  scalarized_local_tensor_map_[addr] = {addr0, addr1, addr2, addr3}
```

[Load AllocaStmt]
```
Before:
  TensorType<4 x i32> val = LoadStmt(TensorType<4 x i32>* alloca_src)

After:
  i32 val0 = LoadStmt(scalarized_local_tensor_map_[stmt][0])
  i32 val1 = LoadStmt(scalarized_local_tensor_map_[stmt][1])
  i32 val2 = LoadStmt(scalarized_local_tensor_map_[stmt][2])
  i32 val3 = LoadStmt(scalarized_local_tensor_map_[stmt][3])

  tmp = MatrixInitStmt(val0, val1, val2, val3)
  stmt->replace_all_usages_with(tmp)
```

[Store to AllocaStmt]
```
Before:
  StoreStmt(TensorType<4 x i32>* alloca_dest_stmt, TensorType<4 x i32> val)

After:
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][0], 
            i32 val->cast<MatrixInitStmt>()->val[0]) 
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][1], 
            i32 val->cast<MatrixInitStmt>()->val[1]) 
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][2], 
            i32 val->cast<MatrixInitStmt>()->val[2]) 
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][3], 
            i32 val->cast<MatrixInitStmt>()->val[3])
```
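
For context, a hypothetical kernel (not taken from the PR) whose local vector produces the kind of TensorType alloca, load, and store scalarized above:

```
# Assumes `import taichi as ti` and `ti.init()`.
@ti.kernel
def local_vec():
    v = ti.Vector([1, 2, 3, 4])  # AllocaStmt(TensorType<4 x i32>)
    w = v                        # load of the TensorType value + store into `w`
    print(w)
```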

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
jim19930609 added a commit that referenced this issue Sep 30, 2022
… generated from scalarization (#6171)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Oct 8, 2022
…setStmt (#6189)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.

Fused `PtrOffsetStmt` with `base_ptr=ExternalPtrStmt` into a modified
`ExternalPtrStmt` to minimize the number of instructions, for compilation
performance.

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…lattened statements (taichi-dev#6749)

Issue: taichi-dev#5819

Overriding the flattened statement `stmt` of an `Expression` can cause
conflicts, for example:
```
@ti.kernel
def test():
    x = ti.Vector([1, 2, 3, 4])
    tmp = x + x[0] # implicit broadcast
```

In `x + x[0]`, the `x` on the lhs serves as an rvalue, whereas the `x` in
`x[0]` serves as an lvalue, so the results of `flatten_rvalue()` and
`flatten_lvalue()` would override each other.

To avoid such conflicts, this PR refactors the `flatten_values()`
functions:
1. The flattened statement `stmt` of an `Expression` is only modified by
`Expression::flatten()`; any other overriding is forbidden.
2. `flatten_rvalue()` and `flatten_lvalue()` now return the flattened
statement as the result. External users such as `irpass::lower_ast()`
now use the returned statement instead.

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

This PR fixes invalid syntax in the test.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

Quant types are not primitive types, so we should not assert that types
only contain primitive types and tensor types. We only need to trigger
scalarization when all the operands have `TensorType`.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zhanlue Yang <zy2284@columbia.edu>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary
Fixed a set of issues to make BLS tests work.
1. Modified GroupedNDRange generator to directly yield `Expr with
TensorType` instead of `_IntermediateMatrix` when `real_matrix=True`
2. Added support for `rescale_index()` to handle `Expr with TensorType`
3. Added scalarization for `indices` of SNode ops

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…dev#6813)

Issue: taichi-dev#5819

### Brief Summary

The background is that we would like to clearly distinguish vectors from
matrices. After taichi-dev#6528, `transpose()` of a vector makes no sense so we'd
better raise an error and guide users towards the current practice
(`outer_product()`).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…sorTyped operands (taichi-dev#6817)

Issue: taichi-dev#5819

### Brief Summary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…hi-dev#6839)

Issue: taichi-dev#5819

### Brief Summary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…rue (taichi-dev#6873)

Issue: taichi-dev#5819

### Brief Summary

As these two options are enabled by default (taichi-dev#6801), we no longer need
separate tests for them.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

We no longer need the switch after taichi-dev#6801.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

Now the `ti.Matrix` class is only for Python-scope matrices.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

1. `+=` should not be used in (single-thread) matrix lib functions. It
is an atomic op and will be demoted very late in the optimization
passes, which is harmful to the compilation speed.
2. `__getitem__` should take only one parameter.
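
A hypothetical helper illustrating point 1 (the function name is made up; only the `+` vs `+=` contrast matters):

```
# Assumes `import taichi as ti` and `ti.init()`.
@ti.func
def sum_entries(m: ti.template()):
    total = 0.0
    for i in ti.static(range(m.n)):
        for j in ti.static(range(m.m)):
            total = total + m[i, j]  # preferred: a plain add keeps atomics out of the IR
            # total += m[i, j]       # avoided: parsed as an atomic add, demoted late
    return total
```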

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…taichi-dev#6928)

Issue: taichi-dev#5819

### Brief Summary

Before this PR, matrix type inference directly takes the type of the
first element, which is problematic. This PR fixes the inference by
calculating the common type of all elements and removes the redundant
`_MatrixEntriesInitializer`.
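
A hypothetical illustration of the difference:

```
# Assumes `import taichi as ti` and `ti.init()`.
# The elements below mix integers with a float. Taking the type of the first
# element would yield an i32 vector; computing the common type of all elements
# yields an f32 vector instead.
v = ti.Vector([1, 2.5, 3])
```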
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…-dev#6932)

Issue: taichi-dev#5819

### Brief Summary

These two intermediate classes are unnecessary now, so let's remove them
for simplicity.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

There is no need to wrap some methods of `Matrix` into a separate class
now.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…Frontend IR (taichi-dev#6934)

Issue: taichi-dev#5819

### Brief Summary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

This PR removes `StrideExpression` and `expr_init_local_tensor`, which
were for the old dynamic-index implementation. `impl.subscript()` is
also slightly simplified along the way.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…hon to Frontend IR (taichi-dev#6942)

Issue: taichi-dev#5819

### Brief Summary
For indices of TensorType, instead of scalarizing them at the Python level,
it is now up to the Frontend IR's consumer to decide whether TensorType'd
indices are acceptable and whether they should be scalarized.

This PR removes `expand_expr` in Expression subscription and migrates the
scalarization logic to the following constructors:

1. MeshIndexConversionExpression::MeshIndexConversionExpression
2. IndexExpression::IndexExpression
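
A minimal sketch of an access with a TensorType'd index (illustrative only; whether the index stays vectorized or gets scalarized is now decided when the IndexExpression is constructed):

```
import taichi as ti

ti.init()

x = ti.field(ti.f32, shape=(4, 4))

@ti.kernel
def read(i: ti.i32, j: ti.i32) -> ti.f32:
    idx = ti.Vector([i, j])  # the index itself has TensorType
    return x[idx]
```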
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…7143)

Issue: taichi-dev#5819

### Brief Summary
1. Removed the redundant interface `fill_with_scalar()`.
2. Refactored `cast()` into the internal interface `_instantiate()`.
3. MatrixType/VectorType now return `ti.Matrix` in python_scope and
`Expr(TensorType)` in taichi_scope.

After this PR, both MatrixType and VectorType have a single public
interface, `__call__()`.
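
A minimal sketch of the scope-dependent behavior described in point 3 (illustrative only):

```
import taichi as ti

ti.init()

vec3 = ti.types.vector(3, ti.f32)

v_py = vec3(1.0, 2.0, 3.0)  # Python scope: returns a ti.Matrix

@ti.kernel
def build():
    v_ti = vec3(1.0, 2.0, 3.0)  # Taichi scope: an Expr with TensorType
    print(v_ti)

build()
```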