Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slice appends sometimes have some of their content zeroed out in between tx calls #1167

Closed
thehowl opened this issue Sep 23, 2023 · 5 comments
Assignees
Labels
🐞 bug Something isn't working 📦 🤖 gnovm Issues or PRs gnovm related

Comments

@thehowl
Copy link
Member

thehowl commented Sep 23, 2023

Reproducible test: 0e44fca

Without the workaround, test output:

$ cd gno.land
$ go test -v -p 1 -run '/issue_1167' -timeout=30m ./cmd/gnoland
=== RUN   TestTestdata
=== RUN   TestTestdata/issue-1167
=== PAUSE TestTestdata/issue-1167
=== CONT  TestTestdata/issue-1167
    testscript.go:534: WORK=$WORK
        PATH=[...]
        GOTRACEBACK=system
        HOME=/no-home
        TMPDIR=$WORK/.tmp
        devnull=/dev/null
        /=/
        :=:
        $=$
        GOCOVERDIR=
        exe=
        USER_SEED_test1=source bonus chronic canvas draft south burst lottery vacant surface solve popular case indicate oppose farm nothing bullet exhibit title speed wink action roast
        USER_ADDR_test1=g1jg8mtutu9khhfwc4nxmuhcpftf0pajdhfvsqf5
        GNOROOT=/home/howl/oc/gno
        GNOHOME=/home/howl/throw/tmp/TestTestdata2981033143/001/gno

        # Reproducible Test for https://github.com/gnolang/gno/issues/1167 (3.436s)
        > gnoland start
        starting logger: "/home/howl/throw/tmp/tmp.9A1tFRif12/gnoland-c719f008.log"
        [stdout]
        node started successfully

        # add contract (0.425s)
        > gnokey maketx addpkg -pkgdir $WORK -pkgpath gno.land/r/demo/xx -gas-fee 1000000ugnot -gas-wanted 2000000 -broadcast -chainid=tendermint_test test1
        [stdout]

        OK!
        GAS WANTED: 2000000
        GAS USED:   416224

        [stderr]
        Enter password.

        > stdout OK!
        # execute New (0.435s)
        > gnokey maketx call -pkgpath gno.land/r/demo/xx -func New -args X -gas-fee 1000000ugnot -gas-wanted 2000000 -broadcast -chainid=tendermint_test test1
        [stdout]
        ("" string)
        OK!
        GAS WANTED: 2000000
        GAS USED:   260345

        [stderr]
        Enter password.

        > stdout OK!
        # execute Delta for the first time (0.511s)
        > gnokey maketx call -pkgpath gno.land/r/demo/xx -func Delta -args X -gas-fee 1000000ugnot -gas-wanted 2000000 -broadcast -chainid=tendermint_test test1
        [stdout]
        ("1,1,1;" string)
        OK!
        GAS WANTED: 2000000
        GAS USED:   307608

        [stderr]
        Enter password.

        > stdout OK!
        > stdout '"1,1,1;" string'
        # execute Delta for the second time (0.432s)
        > gnokey maketx call -pkgpath gno.land/r/demo/xx -func Delta -args X -gas-fee 1000000ugnot -gas-wanted 2000000 -broadcast -chainid=tendermint_test test1
        [stdout]
        ("0,0,0;2,2,2;" string)
        OK!
        GAS WANTED: 2000000
        GAS USED:   309514

        [stderr]
        Enter password.

        > stdout OK!
        > stdout '1;2,2,2;" string'
        FAIL: testdata/issue-1167.txtar:21: no match for `1;2,2,2;" string` found in stdout

--- FAIL: TestTestdata (0.00s)
    --- FAIL: TestTestdata/issue-1167 (5.61s)

This was originally found in GnoChess: one of the head-banging-against-the-wall issues we had with GnoChess was that the realm was returning a1a1 as the first move being made (a1a1 is a representation of a Move struct, where if all of its three values are 0 then the String() method will return a1a1). This could not be replicated using Gno tests, where the behaviour didn't happen, but only with the live chain.

Introducing this change fixed the problem. The underlying idea is that of re-creating and reallocating the moves slice, instead of copying it from the previous instance of Position.

@thehowl thehowl added 🐞 bug Something isn't working investigating This behavior is still being tested out 📦 🤖 gnovm Issues or PRs gnovm related labels Sep 23, 2023
@thehowl thehowl added this to the 🚀 main.gno.land (required) milestone Sep 23, 2023
@clockworkgr
Copy link

Possibly related: #1170

something-something-slice-allocation

@moul moul moved this to 🚀 Needed for Launch in 🚀 The Launch [DEPRECATED] Oct 4, 2023
@deelawn deelawn self-assigned this Oct 24, 2023
tbruyelle added a commit to tbruyelle/gno that referenced this issue Nov 1, 2023
So I think we can state that the bugs we have in gnolang#960, gnolang#1167 and gnolang#1170,
are all related to slice storage when its capacity is different than its
length.

@deelawn found a great way to overcome that bug, but the bug is still
there, somewhere in the VM code I think. I spent the last couple of days
trying to find it, unfortunately without success.

That said, I found a workaround, that could be also applied: when a
slice is stored, ignore any capacity different than the slice's length.

I think this is a good workaround because its one-line and because we
don't really care about storing slice with capacity higher than their
length (unless I'm missing something).
thehowl added a commit that referenced this issue Nov 2, 2023
@thehowl thehowl changed the title [needs investigation] an appended slice had its underlying contents zeroed out in between tx calls slice appends sometimes have some of their content zeroed out in between tx calls Nov 2, 2023
@thehowl
Copy link
Member Author

thehowl commented Nov 2, 2023

cc/ @deelawn

@thehowl thehowl removed the investigating This behavior is still being tested out label Nov 2, 2023
@deelawn
Copy link
Contributor

deelawn commented Nov 7, 2023

I've done an analysis on this and was able to identify the root cause. I'll start from the call to clone() and work my way up the stack.

Inside Position.clone():

mv := p.Moves -- The rhs already exists as a realm object. The lhs is local and is NOT marked for realm persistence. The value is used to compose a new instance of Position that is returned.

Inside Position.update():

p := oldp.clone() -- The rhs has just been initialized so the assignment to the new variable p results in another object NOT marked for realm persistence, even though the embedded and persisted Moves object exists within it.

p.Moves = append(p.Moves, Move{counter, counter, counter}) -- A new Move instance is appended to the existing and persistent underlying array value. It is within the defined capacity so no resizing occurs. The lhs variable is not marked for persistence because the parent/base, p, is not yet a persistent object.

Inside Delta(string):

n := g.Position.update() -- a wash in regards to marking anything for persistence; n is not persistent

g.Position = n -- the lhs is persistent and being assigned to a newly created instance of Position, so it is marked as newReal in order to be persisted

And then...

When it's time to finalize the transaction and handle CRUD operations for realm persistence objects, the realm identifies there is a newly created instance of Position. It creates and and proceeds to recurse over its children. BUT the only child to process is the existing []Move instance. Now examine this block of code:

child.IncRefCount()
rc := child.GetRefCount()
if rc == 1 {
if child.GetIsReal() {
// a deleted real became undeleted.
child.SetOwner(oo)
rlm.MarkDirty(child)
} else {
// a (possibly pre-existing) new object
// became real (again).
// NOTE: may already be marked for first gen
// newCreated or updated.
child.SetOwner(oo)
rlm.incRefCreatedDescendants(store, child)
child.SetIsNewReal(true)
}
} else if rc > 1 {
if child.GetIsEscaped() {
// already escaped, do nothing.
} else {
// NOTE: do not unset owner here,
// may become unescaped later
// in processNewEscapedMarks().
// NOTE: may already be escaped.
rlm.MarkNewEscaped(child)
}
} else {

Note that child is the existing []Move instance here. It already has an existing reference so after incrementing it again, rc becomes 2 and the necessary block to persist the new Move object is skipped. Even if the reference count were correct, this instance of []Move is currently a real object, so it would be marked as dirty and the newly value would not be persisted.

So there are two things that need to be addressed here:

  1. How to handle decrementing a reference to an object when it is reassigned and its parent object is overwritten
  2. How to determine if a real object may be composed of new real objects that could be two or more layers deep due to transitive assignments

I'll dig into this more and come up with one or more proposals for how to fix these issues.

@jaekwon
Copy link
Contributor

jaekwon commented Nov 11, 2023

looking at this now.

jaekwon added a commit that referenced this issue Jan 4, 2024
Addresses #1167,  #960, and #1170 

Consider the following situation:
- A slice of structs exists with a length of zero and a capacity of one
- A new struct literal is appended to the slice
- The code panics because the newly allocated struct literal was never
marked as "new"

``` go
package append

import (
	"gno.land/p/demo/ufmt"
)

type T struct{ i int }

var a []T

func init() {
        a = make([]T, 0, 1)
}

func Append(i int) {
	a = append(a, T{i: i})
}

```

Invoking the `Append` function will cause a panic.

The solution is to traverse each of the array elements after slice
append assignment to make sure any new or updated elements are marked as
such.

This PR also includes a change to ensure that marking an object as dirty
and managing references to the object are mutually exclusive. I think
this is correct but am not sure.

The changes include txtar test cases that incorporate the issue
described by @tbruyelle in #1170

---------

Co-authored-by: jaekwon <jae@tendermint.com>
@deelawn
Copy link
Contributor

deelawn commented Jan 10, 2024

Fixed by #1305

@deelawn deelawn closed this as completed Jan 10, 2024
gfanton pushed a commit to moul/gno that referenced this issue Jan 18, 2024
…lang#1305)

Addresses gnolang#1167,  gnolang#960, and gnolang#1170 

Consider the following situation:
- A slice of structs exists with a length of zero and a capacity of one
- A new struct literal is appended to the slice
- The code panics because the newly allocated struct literal was never
marked as "new"

``` go
package append

import (
	"gno.land/p/demo/ufmt"
)

type T struct{ i int }

var a []T

func init() {
        a = make([]T, 0, 1)
}

func Append(i int) {
	a = append(a, T{i: i})
}

```

Invoking the `Append` function will cause a panic.

The solution is to traverse each of the array elements after slice
append assignment to make sure any new or updated elements are marked as
such.

This PR also includes a change to ensure that marking an object as dirty
and managing references to the object are mutually exclusive. I think
this is correct but am not sure.

The changes include txtar test cases that incorporate the issue
described by @tbruyelle in gnolang#1170

---------

Co-authored-by: jaekwon <jae@tendermint.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug Something isn't working 📦 🤖 gnovm Issues or PRs gnovm related
Projects
Status: 🚀 Needed for Launch
Development

No branches or pull requests

4 participants