
[RFC] Taichi Matrix & Vector refactor plan #5819

Open
jim19930609 opened this issue Aug 18, 2022 · 0 comments
Labels: doc (Documentation related issues & PRs), RFC

jim19930609 (Contributor) commented Aug 18, 2022

Background and Purpose

Currently, Taichi scalarizes all Matrix values and their corresponding operations at the Python level. This approach has the following major drawbacks:

  1. No actual SIMD support for the Matrix data type during CodeGen.
  2. Matrix information is missing at the CHI IR level, so no optimization pass can be written for Matrix operations.
  3. Huge overhead when migrating Matrix support to another frontend language (Julia, JavaScript, etc.).

To address the drawbacks mentioned above, we set the following refactor goals:

  1. Remove early scalarization at the Python level and make full Matrix information visible at the CHI IR level.
  2. Enable optimization of Matrix operations via CHI IR passes.
  3. Enable SIMD instruction code generation for Matrix operations.
  4. Minimize Matrix handling at the Python level to make it easier to support multiple frontend languages.
  5. No regression in functionality, performance, or compilation efficiency, especially with dynamic indexing enabled.

Design Overview

In general, there are three separate code paths involving Matrix:

  1. Local Matrix
  2. MatrixField: Field + Matrix
  3. MatrixNdarray: Ndarray + Matrix

Each of these code paths involves four typical use cases (illustrated by the sketch after this list):

  1. Indexed Matrix operations:
    • Matrix indexing
    • Indexed Matrix access: Load, Store, Add, Mul, ...
  2. MatrixType operations:
    • Matrix initialization
    • Matrix operations: Load, Store, matmul, transpose, atomic, ...
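
For concreteness, the snippet below exercises all four use cases with the existing Python API (a minimal illustrative sketch; the surface syntax is unchanged by this refactor):

```
import taichi as ti

ti.init()

@ti.kernel
def use_cases():
    a = ti.Matrix([[1.0, 2.0], [3.0, 4.0]])  # MatrixType op: Matrix initialization
    b = a @ a                                 # MatrixType op: matmul
    a[0, 1] = 5.0                             # Matrix indexing + indexed store
    s = a[1, 0] + b[0, 0]                     # indexed loads + add
    print(s)

use_cases()
```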

Design Details

Local Matrix

Implementation plan: #5478

[figure: local Matrix design overview]

1. Representation

ti.Matrix returns an Expr(MatrixExpression) with TensorType.

import taichi as ti
ti.init()

@ti.kernel
def example():
    x = ti.Matrix([[1, 2], [3, 4]], dt=ti.f32)
    print(x)

2. IR changes

[New Frontend IR] MatrixExpression: Initializing a matrix with ti.Matrix will result in the following Frontend IR:

kernel {
  $0 = alloca @tmp0
  @tmp0 = [(cast_value<f32> 1.0), (cast_value<f32> 2.0), (cast_value<f32> 3.0), (cast_value<f32> 4.0)] (dt=[Tensor (2, 2) f32])
  $2 = alloca @tmp1
  @tmp1 = @tmp0
  print @tmp1, "\n"
}

[New CHI IR] MatrixInitStmt: represents Matrix initialization
[Reuse CHI IR] PtrOffsetStmt: represents Matrix indexing (see the sketch below)
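
A sketch of source code that exercises both statements; the MatrixInitStmt/PtrOffsetStmt mapping follows this RFC, and the exact CHI IR may differ:

```
# Assumes `import taichi as ti` and `ti.init()` as in the earlier example.
@ti.kernel
def index_example() -> ti.f32:
    # The initializer lowers to a MatrixInitStmt with dt=[Tensor (2, 2) f32].
    x = ti.Matrix([[1.0, 2.0], [3.0, 4.0]])
    # The subscript lowers to a PtrOffsetStmt into the matrix alloca,
    # followed by a load of the addressed element.
    return x[1, 0]
```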

3. Codegen (LLVM)

MatrixInitStmt -> llvm::VectorType(Init + InsertElement)
BinaryOpStmt -> Same LLVM Binary Instruction with llvm::VectorType operands

4. Lowering details

[figure: lowering details for local Matrix]

MatrixField

AOS Memory Layout

1. AOS Python Interface

Current: ti.Matrix.field(m=2, n=2, shape=3, layout=ti.AOS)
After SNodeTreeType: builder.add_field(dtype=ti.vec3, name='x', shape=(4, 8))
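
For reference, a minimal sketch of the two declarations (the first call uses today's ti.Matrix.field API; the builder-based call reuses the SNodeTreeBuilder/add_field names proposed in this RFC, which may still change):

```
import taichi as ti

ti.init()

# Current interface: a 2x2 f32 matrix field with 3 elements, AOS layout.
x = ti.Matrix.field(n=2, m=2, dtype=ti.f32, shape=3, layout=ti.Layout.AOS)

# Proposed interface (names from this RFC, not yet available):
# builder = ti.SNodeTreeBuilder()
# builder.add_field(dtype=ti.vec3, name='x', shape=(4, 8))
```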

2. Representation

[figure: AOS MatrixField representation]

  • The leaf SNode of a SNodeTree becomes a Place SNode with MatrixType, which gets compiled into llvm::VectorType during CodeGen.
  • Operations on a MatrixField are divided into two stages (see the sketch after this list):
    • Obtain the Place SNode by traversing the SNodeTree.
    • Apply operations to the MatrixType data stored in the Place SNode.
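
A minimal sketch of what the two stages correspond to in user code (illustrative only; the staging happens inside the compiler, not in the Python API):

```
import taichi as ti

ti.init()

vel = ti.Vector.field(3, dtype=ti.f32, shape=(4, 8))  # AOS MatrixField

@ti.kernel
def damp():
    for i, j in vel:
        # Stage 1: traverse the SNodeTree down to the Place SNode holding the
        #          MatrixType value at (i, j).
        # Stage 2: apply the operation to that MatrixType value as a whole.
        vel[i, j] = vel[i, j] * 0.5

damp()
```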

3. Lowering details

[figure: lowering details for AOS MatrixField]

SOA/Customized Memory Layout

1. SOA Python Interface

Current:

x = ti.Matrix.field(m=2, n=2, shape=3, layout=ti.SOA)
    # or
x = ti.Matrix.field(m=2, n=2)
ti.root.....place(x.get_scalar_field(0, 0))
ti.root.....place(x.get_scalar_field(0, 1))
ti.root.....place(x.get_scalar_field(1, 0))
ti.root.....place(x.get_scalar_field(1, 1))

After SNodeTreeType:

builder = ti.SNodeTreeBuilder()
builder.add_field(dtype=ti.f32, name='v0')
builder.add_field(dtype=ti.f32, name='v1')
builder.add_field(dtype=ti.f32, name='v2')
for v in ['v0', 'v1', 'v2']:
  builder.tree().dense(ti.ij, 4).place(v)

builder.add_field_view(dtype=ti.vec3, name='vel', components=['v0', 'v1', 'v2'])

2. Representation

[figure: SOA MatrixField representation]

  • [New Frontend IR] ListOfSNodeExpression(): contains a list of SNodes, representing an SOA MatrixField at the Frontend IR level.
  • Support MatrixType with a stride to represent an SOA Matrix, so that an SOA Matrix can be represented the same way as an AOS Matrix at the CHI IR level (see the offset sketch after this list).
  • An SOA MatrixField will get scalarized during CodeGen.
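
To make the strided representation concrete, here is a plain-Python sketch (not Taichi API; N and n_comp are illustrative values) of how flat element offsets differ between the two layouts. The stride carried by the proposed MatrixType is what lets an SOA Matrix be addressed through the same MatrixType view as an AOS Matrix:

```
N = 8        # number of field elements (illustrative)
n_comp = 3   # vector components, e.g. a vec3 field

def aos_offset(i, k):
    # AOS: the components of one element are contiguous (component stride = 1).
    return i * n_comp + k

def soa_offset(i, k):
    # SOA: each component lives in its own plane (component stride = N).
    return k * N + i
```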

3. Lowering details

[figure: lowering details for SOA MatrixField]

MatrixNdarray

MatrixNdarray shares a similar implementation with MatrixField, so this section mainly describes the lowering details.
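
For reference, declaring and using a MatrixNdarray with the current API looks as follows (a minimal sketch; whole-matrix loads and stores on its elements go through the lowering paths sketched below):

```
import taichi as ti

ti.init()

arr = ti.Matrix.ndarray(n=2, m=2, dtype=ti.f32, shape=4)  # AOS by default

@ti.kernel
def scale(a: ti.types.ndarray()):
    for i in range(4):
        a[i] = a[i] * 2.0  # whole-matrix load, scale, and store per element

scale(arr)
```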

AOS Memory Layout

[figure: AOS MatrixNdarray memory layout]

Lowering details:

[figure: lowering details for AOS MatrixNdarray]

SOA Memory Layout

[figure: SOA MatrixNdarray memory layout]

Lowering details:

[figure: lowering details for SOA MatrixNdarray]

Fallback & Scalarization

  • [LocalMatrix & AOS] Fallback to "Array":
    • AOS fallback at CodeGen level
    • ArrayType is friendly to dynamic indexing
    • Possible reasons to fallback:Lack SIMD support at hardware level; Lack SIMD for specific dtype or size; performance regression with SIMD
  • [SOA] Fallback to standalone scalars:
    • SOA get scalarized during CodeGen
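
A minimal sketch of the kind of access that motivates the array fallback (depending on the Taichi version, this may require enabling dynamic indexing, e.g. ti.init(dynamic_index=True)):

```
import taichi as ti

ti.init()

@ti.kernel
def dyn_index(i: ti.i32) -> ti.f32:
    v = ti.Vector([1.0, 2.0, 3.0, 4.0])
    # `i` is only known at runtime, so the element cannot be resolved statically;
    # such accesses are easier to lower when the matrix is backed by an array
    # rather than an llvm::VectorType.
    return v[i]

print(dyn_index(2))
```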

Roadmap

[figure: roadmap]

Proposed Demos

  1. python/taichi/examples/simulation/mpm88.py
  2. python/taichi/examples/simulation/nbody.py
jim19930609 added the doc (Documentation related issues & PRs) label Aug 18, 2022
jim19930609 changed the title from "Taichi Matrix & Vector refactor plan" to "[RFC] Taichi Matrix & Vector refactor plan" Aug 18, 2022
ailzhang added the RFC label Aug 19, 2022
jim19930609 added a commit that referenced this issue Sep 15, 2022
…Stmt & GlobalStoreStmt with TensorType (#5946)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.

Scalarization implementation plan:
![NdarrayMatrix_Indexing (1) drawio (3)](https://user-images.githubusercontent.com/22334008/188117283-582fe6d5-2a0b-4cc3-99d7-c54a8b82c3eb.png)
jim19930609 added a commit that referenced this issue Sep 16, 2022
…tmt & GlobalLoadStmt with TensorType (#6024)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Sep 21, 2022
…t with TensorType-operand (#6080)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Sep 26, 2022
…mt with TensorType-operands (#6086)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Sep 29, 2022
…#6168)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.

[AllocaStmt scalarization]
```
Before:
  TensorType<4 x i32>* addr = AllocaStmt(TensorType<4 x i32>)

After:
  i32 addr0 = AllocaStmt(i32)
  i32 addr1 = AllocaStmt(i32)
  i32 addr2 = AllocaStmt(i32)
  i32 addr3 = AllocaStmt(i32)

  scalarized_local_tensor_map_[addr] = {addr0, addr1, addr2, addr3}
```

[Load AllocaStmt]
```
Before:
  TensorType<4 x i32> val = LoadStmt(TensorType<4 x i32>* alloca_src)

After:
  i32 val0 = LoadStmt(scalarized_local_tensor_map_[stmt][0])
  i32 val1 = LoadStmt(scalarized_local_tensor_map_[stmt][1])
  i32 val2 = LoadStmt(scalarized_local_tensor_map_[stmt][2])
  i32 val3 = LoadStmt(scalarized_local_tensor_map_[stmt][3])

  tmp = MatrixInitStmt(val0, val1, val2, val3)
  stmt->replace_all_usages_with(tmp)
```

[Store to AllocaStmt]
```
Before:
  StoreStmt(TensorType<4 x i32>* alloca_dest_stmt, TensorType<4 x i32> val)

After:
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][0], 
            i32 val->cast<MatrixInitStmt>()->val[0]) 
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][1], 
            i32 val->cast<MatrixInitStmt>()->val[1]) 
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][2], 
            i32 val->cast<MatrixInitStmt>()->val[2]) 
  StoreStmt(i32* scalarized_local_tensor_map_[stmt][3], 
            i32 val->cast<MatrixInitStmt>()->val[3])
```
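
For context, a hypothetical kernel (not taken from the PR) whose local vector produces the kind of TensorType alloca, load, and store scalarized above:

```
# Assumes `import taichi as ti` and `ti.init()`.
@ti.kernel
def local_vec():
    v = ti.Vector([1, 2, 3, 4])  # AllocaStmt(TensorType<4 x i32>)
    w = v                        # load of the TensorType value + store into `w`
    print(w)
```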

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
jim19930609 added a commit that referenced this issue Sep 30, 2022
… generated from scalarization (#6171)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.
jim19930609 added a commit that referenced this issue Oct 8, 2022
…setStmt (#6189)

Related issue = #5873,
#5819

This PR works on "Part ④" in #5873.

Fused `PtrOffsetStmt` with `base_ptr=ExternalPtrStmt` into a modified
`ExternalPtrStmt` to minimize the number of instructions, for compilation
performance.

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…lattened statements (taichi-dev#6749)

Issue: taichi-dev#5819

Overriding the flattened statement `stmt` of an `Expression` can cause
conflicts, for example:
```
@ti.kernel
def test():
    x = ti.Vector([1, 2, 3, 4])
    tmp = x + x[0] # implicit broadcast
```

In `x + x[0]`, the `x` on the lhs serves as an rvalue, whereas the `x` in
`x[0]` serves as an lvalue, so the results of `flatten_rvalue()` and
`flatten_lvalue()` would override each other.

To avoid such conflicts, this PR refactors the `flatten_values()`
functions:
1. The flattened statement `stmt` of an `Expression` is only modified by
`Expression::flatten()`; any other overriding is forbidden.
2. `flatten_rvalue()` and `flatten_lvalue()` now return the flattened
statement as the result. External users such as `irpass::lower_ast()`
now use the returned statement instead.

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

This PR fixes invalid syntax in the test.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

Quant types are not primitive types, so we should not assert that types
only contain primitive types and tensor types. We only need to trigger
scalarization when all the operands have `TensorType`.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zhanlue Yang <zy2284@columbia.edu>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary
Fixed a set of issues to make BLS tests work.
1. Modified GroupedNDRange generator to directly yield `Expr with
TensorType` instead of `_IntermediateMatrix` when `real_matrix=True`
2. Added support for `rescale_index()` to handle `Expr with TensorType`
3. Added scalarization for `indices` of SNode ops

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…dev#6813)

Issue: taichi-dev#5819

### Brief Summary

The background is that we would like to clearly distinguish vectors from
matrices. After taichi-dev#6528, `transpose()` of a vector makes no sense so we'd
better raise an error and guide users towards the current practice
(`outer_product()`).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…sorTyped operands (taichi-dev#6817)

Issue: taichi-dev#5819

### Brief Summary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…hi-dev#6839)

Issue: taichi-dev#5819

### Brief Summary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…rue (taichi-dev#6873)

Issue: taichi-dev#5819

### Brief Summary

As these two options are enabled by default (taichi-dev#6801), we no longer need
separate tests for them.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

We no longer need the switch after taichi-dev#6801.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

Now the `ti.Matrix` class is only for Python-scope matrices.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

1. `+=` should not be used in (single-thread) matrix lib functions. It
is an atomic op and will be demoted very late in the optimization
passes, which is harmful to the compilation speed.
2. `__getitem__` should take only one parameter.
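
A hypothetical helper illustrating point 1 (the function name is made up; only the `+` vs `+=` contrast matters):

```
# Assumes `import taichi as ti` and `ti.init()`.
@ti.func
def sum_entries(m: ti.template()):
    total = 0.0
    for i in ti.static(range(m.n)):
        for j in ti.static(range(m.m)):
            total = total + m[i, j]  # preferred: a plain add keeps atomics out of the IR
            # total += m[i, j]       # avoided: parsed as an atomic add, demoted late
    return total
```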

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…taichi-dev#6928)

Issue: taichi-dev#5819

### Brief Summary

Before this PR, matrix type inference directly takes the type of the
first element, which is problematic. This PR fixes the inference by
calculating the common type of all elements and removes the redundant
`_MatrixEntriesInitializer`.
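
A hypothetical illustration of the difference:

```
# Assumes `import taichi as ti` and `ti.init()`.
# The elements below mix integers with a float. Taking the type of the first
# element would yield an i32 vector; computing the common type of all elements
# yields an f32 vector instead.
v = ti.Vector([1, 2.5, 3])
```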
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…-dev#6932)

Issue: taichi-dev#5819

### Brief Summary

These two intermediate classes are unnecessary now, so let's remove them
for simplicity.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

There is no need to wrap some methods of `Matrix` into a separate class
now.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…Frontend IR (taichi-dev#6934)

Issue: taichi-dev#5819

### Brief Summary

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
Issue: taichi-dev#5819

### Brief Summary

This PR removes `StrideExpression` and `expr_init_local_tensor`, which
were for the old dynamic-index implementation. `impl.subscript()` is
also slightly simplified along the way.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…hon to Frontend IR (taichi-dev#6942)

Issue: taichi-dev#5819

### Brief Summary
For indices of TensorType, instead of scalarizing them at the Python level,
it is now up to the Frontend IR's consumer to decide whether TensorType'd
indices are acceptable and whether they should be scalarized.

This PR removes `expand_expr` in Expression subscription and migrates the
scalarization logic to the following constructors:

1. MeshIndexConversionExpression::MeshIndexConversionExpression
2. IndexExpression::IndexExpression
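
A minimal sketch of an access with a TensorType'd index (illustrative only; whether the index stays vectorized or gets scalarized is now decided when the IndexExpression is constructed):

```
import taichi as ti

ti.init()

x = ti.field(ti.f32, shape=(4, 4))

@ti.kernel
def read(i: ti.i32, j: ti.i32) -> ti.f32:
    idx = ti.Vector([i, j])  # the index itself has TensorType
    return x[idx]
```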
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…7143)

Issue: taichi-dev#5819

### Brief Summary
1. Removed the redundant interface `fill_with_scalar()`.
2. Refactored `cast()` into the internal interface `_instantiate()`.
3. MatrixType/VectorType now return `ti.Matrix` in python_scope and
`Expr(TensorType)` in taichi_scope.

After this PR, both MatrixType and VectorType have a single public
interface, `__call__()`.
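
A minimal sketch of the scope-dependent behavior described in point 3 (illustrative only):

```
import taichi as ti

ti.init()

vec3 = ti.types.vector(3, ti.f32)

v_py = vec3(1.0, 2.0, 3.0)  # Python scope: returns a ti.Matrix

@ti.kernel
def build():
    v_ti = vec3(1.0, 2.0, 3.0)  # Taichi scope: an Expr with TensorType
    print(v_ti)

build()
```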