-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect unsafe usage #220
Conversation
After checkptr fixes by 2fc6815, it was discovered that new issues were hit in production systems, in particular when a single process opened and updated multiple separate databases. This indicates that some bug relating to bad unsafe usage was introduced during this commit. This commit combines several attempts at fixing this new issue. For example, slices are once again created by slicing an array of "max allocation" elements, but this time with the cap set to the intended length. This operation is espressly permitted according to the Go wiki, so it should be preferred to type converting a reflect.SliceHeader.
} | ||
|
||
func unsafeByteSlice(base unsafe.Pointer, offset uintptr, i, j int) []byte { | ||
return (*[maxAllocSize]byte)(unsafeAdd(base, offset))[i:j:j] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to other reviewers:
The specific wiki page that permits this is: https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices
These aren't actually C arrays, but are arrays produced by mmap outside the heap so they can be treated as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had meant to comment this exact url here and forgot. Will update if desired.
One difference between here and that wiki is that this conversion allows the subslice to begin in the middle of the allocation instead of always at the start, or to subslice from the start of some address after adding an offset. I don't know if either of these is a concern, but I suspect not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! I think including the link (and a small description) as a comment would be extremely helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build is failing on make fmt
because of the blank comment line from this btw.
LGTM minus the comment formatting build fail. The change to remove I also successfully ran tests for a project I work on (Sia) with Go 1.14 and this using this branch for our bolt dependency. I've also been running that patched version of Sia for a few hours now with no issues. I'll report back if I run into anything of course. |
Thanks for this! bbolt still works for me on 486 and darwin hardware, but this bug now causes bbolt to segfault on raspberrypi arm (it used to work before I updated to the latest version of go) When I try your new branch on a memory constrained device (raspberrypi arm), it won't build because low memory devices do not have enough memory to create slices of size maxAllocSize: page.go:67:11: type [268435455]leafPageElement too large Is there a way of implementing the same fix without creating slices that are too large to fit in memory? |
Please also note that this branch builds correctly on the rpi (arm) if you revert the following two functions back (i.e. re-include the reflect.SliceHeader calls) in page.go: func (p *page) leafPageElements() []leafPageElement |
@imclaren It builds for me now when I perform the cross compile. Dividing the max alloc size by the size of the data type in the slice is enough (vs using a literal 0xFFFFFFF which older versions of bbolt used to presumably work around this) |
@jrick yes thanks this builds for me now and seems to be running without issues for my project with a bolt dependency. |
@jrick Thanks again for your work on this - I have been stress testing this branch and unfortunately it still gives me out of memory errors when we divide the max alloc size by the size of the data type. Again, bolt runs without problems if we revert the following two functions back (i.e. re-include the reflect.SliceHeader calls) in page.go: func (p *page) leafPageElements() []leafPageElement I think that any solution that refers to maxAllocSize rather than being limited to the actual size required will be problematic for low memory systems. |
Before you were getting a compile error? Now a runtime error? Can you post the full error and backtrace? |
@tmm and @gyuho thanks your assistance (and thanks again @jrick): If we revert back to @jrick's #e04f391 then page.go includes the following functions:
On raspberrypi arm, if I try to build an app that tries to create or open a bolt database (i.e. includes the following line):
Then I get the following build error: ../bbolt/page.go:70:11: type [268435455]leafPageElement too large I believe that this build error occurs because low memory devices do not have enough memory to create arrays of size maxAllocSize. In response to my comment in relation to this build problem, in #81f2578 @jrick changed the 2 functions in page.go as follows:
If we use #81f2578 on raspberrypi arm, then apps that create or open a bolt database (i.e. include the following line) build:
However, bolt is still allocating and using memory based on a proportion of maxAllocSize (i.e. the maximum memory available on the rpi) rather than based on the size of the byte slices to be cached. In other words, allocating memory based on maxAllocSize rather than the size of the byte slices means that bolt uses a lot more memory than the memory that is currently utilised in the bbolt master branch. The rpi runs out of memory (and my apps crash with out of memory errors) when I try to cache multiple small byte slices in parallel for workloads that ran fine using the main branch. Please note that I have also tested my app by replacing bbolt with badger and using badger's low memory options. I recognise that badger uses a completely different caching algorithm (an LSM tree) but badger also uses a lot less memory than this branch (i.e. #81f2578) when I try to cache multiple small byte slices in parallel. |
There is something amiss if that array slice is causing an issue. It should not be generating an actual array that large at runtime if the cgo documentation is to be believed. To be clear, this problem is only seen because of these two slices in page.go, and not all of the other places that are calling unsafeByteSlice? Perhaps the compiler only handles this case correctly when working on byte slices, not slices of other types, and we need to do an unsafe conversion between a byte slice to a properly-typed slice. |
@imclaren try this patch and let me know if anything improves. Hopefully this helps the compiler out a bit to generate the correct code we expect.
|
I'm looking at the object disassembly of those functions now and I'm not seeing that patch to make any noticeable difference to the compiler output. Using the current pull request branch:
Using the following patch:
And lastly further simplifying to replace unsafeIndex with unsafeAdd (because since it only needs index 0 after applying the page header offset, the other function is not needed):
Perhaps someone more well versed in arm assembly can spot something there. |
Yes, the build error only occurs in relation to the two functions in page.go. If you revert the two functions back (i.e. re-include the reflect.SliceHeader calls) in page.go then there are no build errors. My working assumption is that the following creates an array of size maxArraySize and then returns a slice of this array with capacity p.count, and the original array of size maxArraySize is retained (at least for a while before it is possibly garbage collected?):
I could be wrong though. I don't know how unsafe calls used to create pointers to arrays work. |
In other words, what may be happening is 'A possible "gotcha"' as described in https://blog.golang.org/slices-intro |
How are you measuring this? It is normal for VSS size to be large. Only RSS really matters, and with memory mapped files it can be hard to calculate actual memory usage. When you say running out of memory, what exactly is happening? Your golang program should be receiving an error like ENOMEM and should spit out a stack trace of where the error is received. How big is your bolt database file on disk? What model rpi are you using, and what does |
I am not able to reproduce this. Can you please provide more details?
|
Never mind. I had to revert the last commit first:
|
cc @aclements @ianlancetaylor maybe you know who can shed some light here? |
I'm pretty sure this is incorrect. A large array is not being created, rather only a pointer to such an array. |
@tmm1 are we saying the same thing in a different way? I assume that returning and retaining a pointer to an array (including by re-slicing a slice) means that the underlying original array cannot be garbage collected. See ‘A possible "gotcha"' as described in https://blog.golang.org/slices-intro |
There is no underlying original array in this case. Only a slice view into a memory mapped file. I don't think that gotcha applies. See https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices |
If you have an extremely large boltdb file on disk, then I'm guessing what you're running into is the 32-bit address space limitation caused by the You can for example run |
To be fair my use case is unusual. I am trying to cache the maximum number of byte slices as quickly as possible within the memory constraints of the rpi. I benchmarked by varying the size of the byte slices and tracking memory consumption using the main branch of bolt (using an old version of golang which didn’t trigger constant segfaults), with this branch, and with badger using low memory options. This branch used a lot more memory than the main branch of bolt and badger even when I varied the size of the byte slice. |
Oh and I tested starting with an empty bolt database |
reinterpreting an existing slice as a slice header to unsafely modify seems sane to me, but i would modify the cap before the length. it seems that a lot of bad things may happen between the statement that would otherwise set the length before the cap. edit: and the Data pointer before either |
Note that main uses reflect.SliceHeaders for these functions in page.go:
|
This helper feels like a fairly ergonomic way to use the reflect.SliceHeader :
Then in page.go for example it can be used with:
This keeps the correct usage of reflect.SliceHeader a guarantee as it must point to an existing correctly-typed slice, and is able to operate on slices of any element type. |
Is there anything left to do here? Has anyone stress-tested the latest changes to confirm the original issue is resolved? |
Hi @jrick and @tmm I have been testing these latest changes and they seem to work on the rpi without segfaults or obvious memory leaks. I note that if you run the new TestManyDBs test created by @jrick but put 25mb of random bytes rather than use 16 byte random keys, bolt seems to use about 1.5gb to 2gb of memory. To compare, I created a test that just puts each 25mb of random bytes to different filesystem files using the TestManyDBs test, and that uses about 250mb to 300mb of memory. I expect that this is a bolt design limitation not a bug. Let me know if you want me to post the code that uses the the TestManyDBs test to just save bytes to flat files. |
Can you share this, in case other people want to run similar workloads? |
@gyuho I have packaged this up as a golang low memory disk cache, which is available here: https://github.com/imclaren/calmcache You can just go get it and run the tests. |
- need fix from etcd-io/bbolt#214 and etcd-io/bbolt#220 for full go 1.14 compat
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134, but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased allocation count" (etcd-io/bbolt#214). The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220). Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134, but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased allocation count" (etcd-io/bbolt#214). The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220). Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134, but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased allocation count" (etcd-io/bbolt#214). The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220). Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
We had once updated bbolt from v1.3.3 to v1.3.4 in containerd#4134, but reverted to v1.3.3 in containerd#4156 due to "fatal error: sweep increased allocation count" (etcd-io/bbolt#214). The issue was fixed in bbolt v1.3.5 (etcd-io/bbolt#220). Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> (cherry picked from commit bebfbab) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
This unbreaks bbolt (as part of containerd) on 1.14+ (see etcd-io/bbolt#201 and etcd-io/bbolt#220), pulls in my patch to ignore image-defined volumes (containerd/cri#1504) and gets us some robustness fixes in containerd CNI/CRI integration (containerd/cri#1405). This also updates K8s at the same time since they share a lot of dependencies and only updating one is very annoying. On the K8s side we mostly get the standard stream of fixes plus some patches that are no longer necessary. One annoying on the K8s side (but with no impact to the functionality) are these messages in the logs of various components: ``` W0714 11:51:26.323590 1 warnings.go:67] policy/v1beta1 PodSecurityPolicy is deprecated in v1.22+, unavailable in v1.25+ ``` They are caused by KEP-1635, but there's not explanation why this gets logged so aggressively considering the operators cannot do anything about it. There's no newer version of PodSecurityPolicy and you are pretty much required to use it if you use RBAC. Test Plan: Covered by existing tests Bug: T753 X-Origin-Diff: phab/D597 GitOrigin-RevId: f6c447da1de037c27646f9ec9f45ebd5d6660ab0
- need fix from etcd-io/bbolt#214 and etcd-io/bbolt#220 for full go 1.14 compat
- need fix from etcd-io/bbolt#214 and etcd-io/bbolt#220 for full go 1.14 compat
- need fix from etcd-io/bbolt#214 and etcd-io/bbolt#220 for full go 1.14 compat
- need fix from etcd-io/bbolt#214 and etcd-io/bbolt#220 for full go 1.14 compat
- need fix from etcd-io/bbolt#214 and etcd-io/bbolt#220 for full go 1.14 compat
After checkptr fixes by 2fc6815, it was discovered that new issues
were hit in production systems, in particular when a single process
opened and updated multiple separate databases. This indicates that
some bug relating to bad unsafe usage was introduced during this
commit.
This commit combines several attempts at fixing this new issue. For
example, slices are once again created by slicing an array of "max
allocation" elements, but this time with the cap set to the intended
length. This operation is espressly permitted according to the Go
wiki, so it should be preferred to type converting a
reflect.SliceHeader.
Fixes #214.