fatal error: fault (segmentation violation) during Commit #202

Closed
leakybits opened this issue Feb 14, 2020 · 2 comments

leakybits commented Feb 14, 2020

I know there are already multiple issues floating around with reports of this (#183, #188, #194), but instead of replying individually to each, I figured I would open a new one because I have code that reproduces the issue.

The issue is that the meta0 and meta1 pointers seem to point to memory that bolt no longer owns. I suspect that when the mmapped region grows, the memory is remapped and the old pointers are invalidated; if another goroutine calls meta() at the same time, it crashes because some necessary locks are missing. But that's just a suspicion after an initial look at the code; I have no proof of that.
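
To make the suspected failure mode concrete, here is a minimal sketch of the locking discipline that would rule it out. It is not bbolt's code: the `region` type and the `remap` and `metaByte` names are hypothetical, and a garbage-collected Go slice cannot dangle the way a raw mmap pointer can, so the sketch only illustrates the lock ordering, not the crash itself.

```go
package main

import (
	"fmt"
	"sync"
)

// region is a hypothetical stand-in for a memory-mapped file. Its backing
// storage is replaced wholesale when the mapping grows, so any view taken
// before the swap must not be used after it.
type region struct {
	mu   sync.RWMutex
	data []byte
}

// remap simulates growing the mapping. Holding the write lock guarantees that
// no reader is inside metaByte while the backing storage is swapped.
func (r *region) remap(newSize int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	grown := make([]byte, newSize)
	copy(grown, r.data)
	r.data = grown
}

// metaByte reads the first header byte under the read lock, so it always sees
// either the old mapping or the new one, never a mapping being replaced.
func (r *region) metaByte() byte {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.data[0]
}

func main() {
	r := &region{data: []byte{42}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer: repeatedly grows the mapping
		defer wg.Done()
		for size := 2; size <= 1<<12; size *= 2 {
			r.remap(size)
		}
	}()
	go func() { // reader: repeatedly reads the header byte
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = r.metaByte()
		}
	}()
	wg.Wait()

	fmt.Println(r.metaByte(), len(r.data)) // prints "42 4096"
}
```

If the suspicion is right, the crash would correspond to a reader like `metaByte` running without the read lock while `remap` is in progress.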

Crash (partial) stacktrace:

unexpected fault address 0x3c001044
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x3c001044 pc=0x8f3cd]
goroutine 79 [running]:
runtime.throw(0xcaab3, 0x5)
    /usr/local/Cellar/go@1.12/1.12.13/libexec/src/runtime/panic.go:617 +0x64 fp=0x17f49ea4 sp=0x17f49e90 pc=0x260b4
runtime.sigpanic()
    /usr/local/Cellar/go@1.12/1.12.13/libexec/src/runtime/signal_unix.go:397 +0x31c fp=0x17f49ebc sp=0x17f49ea4 pc=0x3698c
go.etcd.io/bbolt.(*DB).meta(0x11ca4140, 0x19)
    /Users/user/go/pkg/mod/go.etcd.io/bbolt@v1.3.3/db.go:901 +0x1d fp=0x17f49ed4 sp=0x17f49ebc pc=0x8f3cd
go.etcd.io/bbolt.(*DB).hasSyncedFreelist(...)
    /Users/user/go/pkg/mod/go.etcd.io/bbolt@v1.3.3/db.go:323
go.etcd.io/bbolt.(*Tx).rollback(0x1a4d0400)
    /Users/user/go/pkg/mod/go.etcd.io/bbolt@v1.3.3/tx.go:279 +0x64 fp=0x17f49eec sp=0x17f49ed4 pc=0x97c04
go.etcd.io/bbolt.(*Tx).Commit(0x1a4d0400, 0x0, 0x0)
    /Users/user/go/pkg/mod/go.etcd.io/bbolt@v1.3.3/tx.go:161 +0x45f fp=0x17f49f90 sp=0x17f49eec pc=0x976bf
go.etcd.io/bbolt.(*DB).Update(0x11ca4140, 0x17f49fc4, 0x0, 0x0)
    /Users/user/go/pkg/mod/go.etcd.io/bbolt@v1.3.3/db.go:701 +0xbf fp=0x17f49fb4 sp=0x17f49f90 pc=0x8f36f
main.boltTest.func2(0x11ca4140, 0x40, 0x1000000, 0x4, 0xa)
    /Users/user/Code/CrashBolt/main.go:56 +0x65 fp=0x17f49fd8 sp=0x17f49fb4 pc=0x9b5a5
runtime.goexit()

I can only reproduce this crash when the test app is built with GOARCH=386; the OS seems irrelevant (reproduced as a 32-bit binary on Windows, macOS, and Linux). So far the crash reproduces reliably on 4 different machines across 3 different operating systems, as long as the binary is built as 32-bit.

Code (simple app) that reproduces the issue:

package main

import (
	"errors"
	"fmt"
	"math/rand"
	"strconv"
	"sync"

	bolt "go.etcd.io/bbolt"
)

const (
	MaxKeySize     = 1 << 6
	MaxValBytes    = 1 << 24
	NumWorkers     = 1 << 4
	RollbackChance = 1 << 2
	NumDBs         = 1 << 2
)

func main() {
	wg := sync.WaitGroup{}
	for db := 0; db < NumDBs; db++ {
		wg.Add(1)
		go boltTest(db, MaxKeySize, MaxValBytes, NumWorkers, RollbackChance)
	}

	wg.Wait()
}

func boltTest(testID, maxKeySize, maxValBytes, numWorkers, rollbackChance int) {
	dbName := fmt.Sprintf("test-%v.db", testID)

	fmt.Printf("Starting test %v\n\tnum keys: \t%v\n\tvalue size: \t%v\n\tworkers: \t%v\n\trollback: \t1/%v\n",
		testID, maxKeySize, maxValBytes, numWorkers, rollbackChance)

	db, err := bolt.Open(dbName, 0666, nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucket([]byte("bucket"))
		return err
	})

	wg := sync.WaitGroup{}
	for workerID := 0; workerID < numWorkers; workerID++ {
		wg.Add(1)
		go func() {
			for {
				db.Update(func(tx *bolt.Tx) error {
					var err error

					b := tx.Bucket([]byte("bucket"))
					key := []byte(strconv.Itoa(rand.Intn(maxKeySize)))

					v := b.Get(key)
					if v != nil {
						err = b.Delete(key)
					} else {
						err = b.Put(key, make([]byte, rand.Intn(maxValBytes)))
					}

					if err != nil {
						panic(err)
					}

					if rand.Intn(rollbackChance) == 0 {
						err = errors.New("force rollback")
					}

					return err
				})
			}
		}()
	}

	wg.Wait()
}

jrick commented Feb 15, 2020

This does look like a bug due to the 32-bit address space. The same thing was observed when testing my #201 PR in an i386 VM.
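
As a rough illustration of why address space could matter here, the reproducer's own constants put an upper bound on the live data. The sketch below only does that arithmetic; the helper name `worstCaseBytes` is hypothetical, the figure is a bound rather than a measurement, and it does not show that address-space exhaustion is the actual cause.

```go
package main

import "fmt"

// Constants mirror the reproducer above (names lowercased for this sketch).
const (
	maxKeys     = 1 << 6  // MaxKeySize: number of distinct keys per bucket
	maxValBytes = 1 << 24 // MaxValBytes: exclusive upper bound on one value's size
	numDBs      = 1 << 2  // NumDBs: databases opened in the same process
)

// worstCaseBytes returns an upper bound on live value bytes across all
// databases if every key held a maximum-size value at the same time.
func worstCaseBytes(keys, valBytes, dbs uint64) uint64 {
	return keys * valBytes * dbs
}

func main() {
	total := worstCaseBytes(maxKeys, maxValBytes, numDBs)
	const addrSpace32 = uint64(1) << 32 // total addressable bytes with 32-bit pointers

	// prints "worst case: 4 GiB, 32-bit address space: 4 GiB"
	fmt.Printf("worst case: %d GiB, 32-bit address space: %d GiB\n",
		total>>30, addrSpace32>>30)
}
```

Actual usage will be lower, since value sizes are random and keys are deleted about as often as they are inserted, but the bound shows the test can plausibly press against what a 32-bit process is able to map.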


ahrtr commented Jan 13, 2023

Fixed in #362.

Please let me know if anyone still sees the panic.

ahrtr closed this as completed Jan 13, 2023