Different value on a matrix when print is present and absent #6663

Closed
lin-hitonami opened this issue Nov 18, 2022 · 3 comments · Fixed by #6820
Labels: potential bug (something that looks like a bug but not yet confirmed)

lin-hitonami commented Nov 18, 2022

Describe the bug
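Reading `p[0,0]` inside the `while` loop yields a different value depending on whether the calling kernel prints the returned matrix: 1.0 when the `print` is absent (expected), 0.0 when it is present.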
Originally posted in https://forum.taichi-lang.cn/t/topic/3547/4

To Reproduce

import taichi as ti
ti.init(ti.cpu,dynamic_index=True)


@ti.func
def jacob_eigen_test(a:ti.template()):
    p = ti.math.eye(a.n)
    tol = 1.0e-7
    sig = ti.Vector.zero(ti.f32,a.n)
    aMax = 1.0
    print('p1',p[0,0])
    while aMax > tol:
        print('p2',p[0,0])
        aMax = 0

        for i in range(a.n):      # Update transformation matrix
            p[i,0] = -1

    for ii in range(a.n):
        sig[ii] = a[ii,ii]
    return sig, p

@ti.kernel
def test():
    test_S = ti.math.mat4(0)
    Sig, P = jacob_eigen_test(test_S)  # results unused: 'p2' prints 1.0 (expected)
    # print(Sig, P)

@ti.kernel
def test2():
    test_S = ti.math.mat4(0)
    Sig, P = jacob_eigen_test(test_S)
    print(Sig, P)  # results printed: 'p2' now prints 0.0


test()
print('test2')
test2()

Log/Screenshots

p1 1.000000
p2 1.000000
test2
p1 1.000000
p2 0.000000
[0.000000, 0.000000, 0.000000, 0.000000] [[-1.000000, 0.000000, 0.000000, 0.000000], [-1.000000, 1.000000, 0.000000, 0.000000], [-1.000000, 0.000000, 1.000000, 0.000000], [-1.000000, 0.000000, 0.000000, 1.000000]]

Additional comments
The IR of test:

kernel {
$0 = offloaded  
body {
  <f32> $1 = const -1.0
  <i32> $2 = const 4
  <i32> $3 = const 0
  <i32> $4 = const 1
  <i32> $5 = const 2
  <f32> $6 = const 0.0
  <f32> $7 = const 1.0
  <[Tensor (4, 4) f32]> $8 = alloca
  <*f32> $9 = shift ptr [$8 + $3]
  <f32> $10 : local store [$9 <- $7]
  <f32> $11 = const 1e-07
  <f32> $12 = alloca
  <f32> $13 : local store [$12 <- $7]
  print "p1 ", $7, "\n"
  $15 : while true {
    <f32> $16 = local load [$12]
    <i32> $17 = cmp_gt $16 $11
    <i32> $18 = bit_and $17 $4
    $19 : if $18 {
    } else {
      $20 : while control nullptr, $3
    }
    <f32> $21 = local load [$9]
    print "p2 ", $21, "\n"
    <f32> $23 : local store [$12 <- $6]
    $24 : for in range($3, $2) block_dim=adaptive {
      <i32> $25 = loop $24 index 0
      <i32> $26 = bit_shl $25 $5
      <*f32> $27 = shift ptr [$8 + $26]
      <f32> $28 : local store [$27 <- $1]
    }
  }
}
}

The IR of test2:

kernel {
$0 = offloaded  
body {
  <f32> $1 = const -1.0
  <i32> $2 = const 4
  <i32> $3 = const 0
  <i32> $4 = const 1
  <i32> $5 = const 2
  <i32> $6 = const 3
  <i32> $7 = const 5
  <i32> $8 = const 6
  <i32> $9 = const 7
  <i32> $10 = const 8
  <i32> $11 = const 9
  <i32> $12 = const 10
  <i32> $13 = const 11
  <i32> $14 = const 12
  <i32> $15 = const 13
  <i32> $16 = const 14
  <i32> $17 = const 15
  <f32> $18 = const 0.0
  <[Tensor (4, 4) f32]> $19 = global tmp var (offset = 0 B)
  <*f32> $20 = shift ptr [$19 + $3]
  $21 : global store [$20 <- $18]
  <*f32> $22 = shift ptr [$19 + $2]
  $23 : global store [$22 <- $18]
  <*f32> $24 = shift ptr [$19 + $10]
  $25 : global store [$24 <- $18]
  <*f32> $26 = shift ptr [$19 + $14]
  $27 : global store [$26 <- $18]
  <i32> $28 = const 16
  <*f32> $29 = shift ptr [$19 + $28]
  $30 : global store [$29 <- $18]
  <i32> $31 = const 20
  <*f32> $32 = shift ptr [$19 + $31]
  $33 : global store [$32 <- $18]
  <i32> $34 = const 24
  <*f32> $35 = shift ptr [$19 + $34]
  $36 : global store [$35 <- $18]
  <i32> $37 = const 28
  <*f32> $38 = shift ptr [$19 + $37]
  $39 : global store [$38 <- $18]
  <i32> $40 = const 32
  <*f32> $41 = shift ptr [$19 + $40]
  $42 : global store [$41 <- $18]
  <i32> $43 = const 36
  <*f32> $44 = shift ptr [$19 + $43]
  $45 : global store [$44 <- $18]
  <i32> $46 = const 40
  <*f32> $47 = shift ptr [$19 + $46]
  $48 : global store [$47 <- $18]
  <i32> $49 = const 44
  <*f32> $50 = shift ptr [$19 + $49]
  $51 : global store [$50 <- $18]
  <i32> $52 = const 48
  <*f32> $53 = shift ptr [$19 + $52]
  $54 : global store [$53 <- $18]
  <i32> $55 = const 52
  <*f32> $56 = shift ptr [$19 + $55]
  $57 : global store [$56 <- $18]
  <i32> $58 = const 56
  <*f32> $59 = shift ptr [$19 + $58]
  $60 : global store [$59 <- $18]
  <i32> $61 = const 60
  <*f32> $62 = shift ptr [$19 + $61]
  $63 : global store [$62 <- $18]
  <*f32> $64 = shift ptr [$19 + $4]
  $65 : global store [$64 <- $18]
  <*f32> $66 = shift ptr [$19 + $5]
  $67 : global store [$66 <- $18]
  <*f32> $68 = shift ptr [$19 + $6]
  $69 : global store [$68 <- $18]
  <*f32> $70 = shift ptr [$19 + $7]
  $71 : global store [$70 <- $18]
  <*f32> $72 = shift ptr [$19 + $8]
  $73 : global store [$72 <- $18]
  <*f32> $74 = shift ptr [$19 + $9]
  $75 : global store [$74 <- $18]
  <*f32> $76 = shift ptr [$19 + $11]
  $77 : global store [$76 <- $18]
  <*f32> $78 = shift ptr [$19 + $12]
  $79 : global store [$78 <- $18]
  <*f32> $80 = shift ptr [$19 + $13]
  $81 : global store [$80 <- $18]
  <*f32> $82 = shift ptr [$19 + $15]
  $83 : global store [$82 <- $18]
  <*f32> $84 = shift ptr [$19 + $16]
  $85 : global store [$84 <- $18]
  <*f32> $86 = shift ptr [$19 + $17]
  $87 : global store [$86 <- $18]
  <f32> $88 = const 1.0
  <[Tensor (4, 4) f32]> $89 = global tmp var (offset = 80 B)
  <*f32> $90 = shift ptr [$89 + $3]
  <*f32> $91 = shift ptr [$89 + $2]
  $92 : global store [$91 <- $18]
  <*f32> $93 = shift ptr [$89 + $10]
  $94 : global store [$93 <- $18]
  <*f32> $95 = shift ptr [$89 + $14]
  $96 : global store [$95 <- $18]
  <*f32> $97 = shift ptr [$89 + $28]
  $98 : global store [$97 <- $18]
  <*f32> $99 = shift ptr [$89 + $31]
  $100 : global store [$99 <- $18]
  <*f32> $101 = shift ptr [$89 + $34]
  $102 : global store [$101 <- $18]
  <*f32> $103 = shift ptr [$89 + $37]
  $104 : global store [$103 <- $18]
  <*f32> $105 = shift ptr [$89 + $40]
  $106 : global store [$105 <- $18]
  <*f32> $107 = shift ptr [$89 + $43]
  $108 : global store [$107 <- $18]
  <*f32> $109 = shift ptr [$89 + $46]
  $110 : global store [$109 <- $18]
  <*f32> $111 = shift ptr [$89 + $49]
  $112 : global store [$111 <- $18]
  <*f32> $113 = shift ptr [$89 + $52]
  $114 : global store [$113 <- $18]
  <*f32> $115 = shift ptr [$89 + $55]
  $116 : global store [$115 <- $18]
  <*f32> $117 = shift ptr [$89 + $58]
  $118 : global store [$117 <- $18]
  <*f32> $119 = shift ptr [$89 + $61]
  $120 : global store [$119 <- $18]
  $121 : global store [$90 <- $88]
  <*f32> $122 = shift ptr [$89 + $4]
  $123 : global store [$122 <- $18]
  <*f32> $124 = shift ptr [$89 + $5]
  $125 : global store [$124 <- $18]
  <*f32> $126 = shift ptr [$89 + $6]
  $127 : global store [$126 <- $18]
  <*f32> $128 = shift ptr [$89 + $7]
  $129 : global store [$128 <- $88]
  <*f32> $130 = shift ptr [$89 + $8]
  $131 : global store [$130 <- $18]
  <*f32> $132 = shift ptr [$89 + $9]
  $133 : global store [$132 <- $18]
  <*f32> $134 = shift ptr [$89 + $11]
  $135 : global store [$134 <- $18]
  <*f32> $136 = shift ptr [$89 + $12]
  $137 : global store [$136 <- $88]
  <*f32> $138 = shift ptr [$89 + $13]
  $139 : global store [$138 <- $18]
  <*f32> $140 = shift ptr [$89 + $15]
  $141 : global store [$140 <- $18]
  <*f32> $142 = shift ptr [$89 + $16]
  $143 : global store [$142 <- $18]
  <*f32> $144 = shift ptr [$89 + $17]
  $145 : global store [$144 <- $88]
  <f32> $146 = const 1e-07
  <[Tensor (4) f32]> $147 = global tmp var (offset = 64 B)
  <*f32> $148 = shift ptr [$147 + $3]
  $149 : global store [$148 <- $18]
  <*f32> $150 = shift ptr [$147 + $2]
  $151 : global store [$150 <- $18]
  <*f32> $152 = shift ptr [$147 + $10]
  $153 : global store [$152 <- $18]
  <*f32> $154 = shift ptr [$147 + $14]
  $155 : global store [$154 <- $18]
  <*f32> $156 = shift ptr [$147 + $4]
  $157 : global store [$156 <- $18]
  <*f32> $158 = shift ptr [$147 + $5]
  $159 : global store [$158 <- $18]
  <*f32> $160 = shift ptr [$147 + $6]
  $161 : global store [$160 <- $18]
  <f32> $162 = alloca
  <f32> $163 : local store [$162 <- $88]
  print "p1 ", $88, "\n"
  $165 : while true {
    <f32> $166 = local load [$162]
    <i32> $167 = cmp_gt $166 $146
    <i32> $168 = bit_and $167 $4
    $169 : if $168 {
    } else {
      $170 : while control nullptr, $3
    }
    <f32> $171 = global load $90
    print "p2 ", $171, "\n"
    <f32> $173 : local store [$162 <- $18]
    $174 : for in range($3, $2) block_dim=adaptive {
      <i32> $175 = loop $174 index 0
      <i32> $176 = bit_shl $175 $5
      <*f32> $177 = shift ptr [$89 + $176]
      $178 : global store [$177 <- $1]
    }
  }
}
}

lin-hitonami added the `potential bug` label on Nov 18, 2022

lin-hitonami (Contributor, Author) commented:

Is this related to the `dynamic_index` option? @strongoier

lin-hitonami (Contributor, Author) commented:

This problem also exists in Taichi 0.9.2 (the oldest version available on PyPI).

strongoier (Contributor) commented:

A simpler code snippet:

import taichi as ti
ti.init(ti.cpu, dynamic_index=True)

@ti.func
def jacob_eigen_test():
    p = ti.Matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    loop = 1
    sig = ti.Vector([0, 0, 0, 0])
    print('p1', p[0, 0])
    while loop == 1:
        print('p2', p[0, 0])
        loop = 0
        p[0, 0] = -1
    for i in range(1):
        sig[i] = 2
    return sig, p

@ti.kernel
def test():
    Sig, P = jacob_eigen_test()

@ti.kernel
def test2():
    Sig, P = jacob_eigen_test()
    print(Sig,P)


test()
print('test2')
test2()

strongoier added a commit that referenced this issue Dec 7, 2022
Issue: fix #6663

### Brief Summary

In `MatrixPtrStmt`, when `origin` is `GlobalTemporaryStmt`, the
semantics of `offset` has changed from the number of bytes to the number
of elements. This PR fixes the outdated usage which may overwrite the
global tmp buffer.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
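
To illustrate why mixing the two interpretations of `offset` can corrupt `p`, here is a toy model of the global temporary buffer in `test2`. It is a sketch under stated assumptions, not Taichi's actual codegen: the base offsets (0 B, 64 B, 80 B) and the offsets used to zero-initialize `sig` are copied from the `test2` IR in this issue, while the `replay` function and the dictionary-based buffer are hypothetical names introduced only for illustration.

```python
F32_BYTES = 4

# Byte offsets of the global temporaries, as shown in the test2 IR.
SIG_BASE = 64   # <[Tensor (4) f32]>, 16 bytes
P_BASE = 80     # <[Tensor (4, 4) f32]>, 64 bytes; p[0, 0] lives at byte 80

# Offsets the IR applies to sig's base pointer when zero-initializing it.
# They mix byte-style values (0, 4, 8, 12) with element-style values (1, 2, 3).
SIG_INIT_OFFSETS = [0, 4, 8, 12, 1, 2, 3]


def replay(offset_unit_bytes):
    """Replay the stores, scaling each offset by the given unit size.

    Hypothetical helper: returns the value found at p[0, 0] afterwards.
    """
    buf = {}                      # byte address -> stored f32 value
    buf[P_BASE] = 1.0             # p[0, 0] = 1.0 (identity matrix)
    for off in SIG_INIT_OFFSETS:  # zero-initialize sig
        buf[SIG_BASE + off * offset_unit_bytes] = 0.0
    return buf[P_BASE]


# Offsets treated as element counts: 64 + 4 * 4 = 80 overwrites p[0, 0].
print(replay(F32_BYTES))  # 0.0
# Offsets treated as bytes: every store stays inside sig's 16 bytes.
print(replay(1))          # 1.0
```

Under the element interpretation, `sig`'s base (64 B) plus offset 4 lands on byte 80, which is where `p[0, 0]` is stored. That is consistent with the `p2 0.000000` line in the log for `test2`.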
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…v#6820)
