diff --git a/cgo_libc_static/README.md b/cgo_libc_static/README.md
index 558c589..7e0c8d1 100644
--- a/cgo_libc_static/README.md
+++ b/cgo_libc_static/README.md
@@ -1,8 +1,17 @@
-# A Bug's Life
+# Static Cgo Builds, What Could Go Wrong?
-*The first computer bugs were found by [cleaning out mechanical parts](https://upload.wikimedia.org/wikipedia/commons/8/8a/H96566k.jpg). The bug described below unfortunately couldn't be tracked down in such a straightforward fashion. But the discovery story is more interesting than "I looked into hundreds of relays" and goes way down the rabbit hole as we tag along.*
+*The first computer bugs were found by [cleaning out mechanical
+parts](https://upload.wikimedia.org/wikipedia/commons/8/8a/H96566k.jpg). The
+bug described below unfortunately couldn't be tracked down in such a
+straightforward fashion. But the discovery story is more interesting than "I
+looked into hundreds of relays" and goes way down the rabbit hole as we revisit
+a dozen hours' worth of debugging at [Cockroach Labs](http://cockroachlabs.com).
+We'll re-emerge with a lesson about static linking and cgo.*
-A couple of days ago, my colleague [@tamird](https://github.com/tamird) opened issue [#13470](https://github.com/golang/go/issues/13470) against [golang/go](https://github.com/golang/go). In it, he gives the following snippet:
+A couple of days ago, my colleague [@tamird](https://github.com/tamird) opened
+issue [#13470](https://github.com/golang/go/issues/13470) against
+[golang/go](https://github.com/golang/go). In it, he gives the following
+snippet (if you want to follow along, I've prepared a [Docker image](#fn_1)):
```go
package main
@@ -11,7 +20,7 @@ import (
"net"
"os/user"
- "C" // required since we want a static binary
+ "C" // enable cgo for static build
)
func main() {
@@ -22,87 +31,77 @@ func main() {
}
```
-Looks about as innocuous as nonsensical, right? If we run it naively, nothing happens:
+Looks about as innocuous as it is nonsensical, right? If we run it naively,
+nothing happens:
```bash
$ go run main.go
```
-But of course the `C` import above hints at trying a static build instead . Let's try it1:
+But of course the `C` import above hints at trying a static build instead.
+Let's do that:
```
-# This is just how you build and run statically in Go
$ go run -ldflags '-extldflags "-static"' main.go
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0xe5 pc=0x7fec267f8a5c]
-runtime stack:
-runtime.throw(0x660380, 0x2a)
- /usr/local/go/src/runtime/panic.go:527 +0x90
-runtime.sigpanic()
- /usr/local/go/src/runtime/sigpanic_unix.go:12 +0x5a
-
goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x402620, 0xc82004bd30, 0xc800000000)
/usr/local/go/src/runtime/cgocall.go:120 +0x11b fp=0xc82004bce0 sp=0xc82004bcb0
-os/user._Cfunc_mygetpwuid_r(0x0, 0xc8200172c0, 0x7fec180008c0, 0x400, 0xc82002a0b0, 0x0)
- ??:0 +0x39 fp=0xc82004bd30 sp=0xc82004bce0
os/user.lookupUnix(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/os/user/lookup_unix.go:99 +0x723 fp=0xc82004bea0 sp=0xc82004bd30
-os/user.current(0x0, 0x0, 0x0)
- /usr/local/go/src/os/user/lookup_unix.go:39 +0x42 fp=0xc82004bee0 sp=0xc82004bea0
os/user.Current(0x62eba8, 0x0, 0x0)
/usr/local/go/src/os/user/lookup.go:9 +0x24 fp=0xc82004bf00 sp=0xc82004bee0
-main.main()
- /go/src/github.com/cockroachdb/cgo_static_boom/main.go:13 +0x55 fp=0xc82004bf50 sp=0xc82004bf00
-runtime.main()
- /usr/local/go/src/runtime/proc.go:111 +0x2b0 fp=0xc82004bfa0 sp=0xc82004bf50
-runtime.goexit()
- /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc82004bfa8 sp=0xc82004bfa0
-
-goroutine 17 [syscall, locked to thread]:
-runtime.goexit()
- /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1
-exit status 2
+[...]
```
Jeez, what just happened here?
-First of all, this is obviously a panic. But it's not a panic from Go-land, it's a segfault (`signal 0xb` is signal `11`, i.e. a segfault) from from a [cgo library call](https://github.com/golang/go/blob/cb867d2fd64adc851f82be3c6eb6e38ec008930b/src/os/user/lookup_unix.go#L77) to `getpwuid_r`, which belongs to `glibc`.
+This is obviously a panic. But it's not a panic from Go-land, it's a segfault
+(`signal 0xb` is signal `11=SIGSEGV`) from within a [cgo call](https://github.com/golang/go/blob/cb867d2fd64adc851f82be3c6eb6e38ec008930b/src/os/user/lookup_unix.go#L77)
+to `getpwuid_r`, which belongs to `glibc`.
-```C
-static int mygetpwuid_r(int uid, struct passwd *pwd,
- char *buf, size_t buflen, struct passwd **result) {
- return getpwuid_r(uid, pwd, buf, buflen, result);
-}
-```
-
-Versed users of cgo and static builds will know that if you call out to `glibc` in your code (be it directly or through dependencies), your "static" binary will still need the exact version of `glibc` available at runtime to work correctly. In fact, if you add `-v` to the `-ldflags` parameter, we get warnings:
+Versed users of cgo and static builds will know that if you call out to `glibc`
+in your code (be it directly or through dependencies), your "static" binary
+will still need the exact version of `glibc` available at runtime to work
+correctly. In fact, if you add `-v` to the `-ldflags` parameter, we get
+warnings:
```
[...]
-/tmp/go-link-359142278/000002.o: In function `mygetpwnam_r':
-/tmp/workdir/go/src/os/user/lookup_unix.go:33: warning: Using 'getpwnam_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
-/tmp/go-link-359142278/000002.o: In function `mygetpwuid_r':
-/tmp/workdir/go/src/os/user/lookup_unix.go:28: warning: Using 'getpwuid_r' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
-/tmp/go-link-359142278/000003.o: In function `_cgo_709c8d94a9f9_C2func_getaddrinfo':
-/tmp/workdir/go/src/net/cgo_unix.go:55: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
+/tmp/.../000002.o: In function `mygetpwuid_r':
+/tmp/.../os/user/lookup_unix.go:28: warning: Using 'getpwuid_r' in statically
+ linked applications requires at runtime the shared libraries from the glibc
+ version used for linking
[...]
```
-But, in our example this should be the case (after all, we're using `go run` directly - we're not building a binary on one system and then putting it in a new place). So - this should work!
+But we used `go run` directly and didn't move the binary around or change our
+glibc. So this **should** work!
-Secondly, it's the call to `user.Current()` which crashes the program. What's the role of `net.Dial()`? Well, the big surprise is that you need that call or the program turns boring again. Same for the loop. Remove it and voila, no error. So this isn't a simple case of a call failing, it's a weird concoction of random things that reproduce this error.
+In the test case, it's the call to `user.Current()` which crashes the program.
+But what's the role of the call `net.Dial()` before that? Well, the big
+surprise is that without that call, the program does not crash. Same for the
+loop. Remove it and voila, no error. So this isn't a simple case of a call
+failing, it's a weird concoction of ingredients producing this error.
-How would one even come up with this? Surely this isn't straight from our codebase?
-Good news! I'm going to walk you all the way through, from the first high-level
-failure to an ending which is only happy when considering the bug's perspective.
+Still interested? It's going to get technical. You can go [straight to the
+conclusion](#conclusion), but if you stick along I'll walk you all the way through,
+from the first high level failure over hours of debugging to, fortunately, an
+ending.
-[1] on OSX, static builds basically don't work. But you can follow along using `docker -ti cockroachdb/builder`.
+[1]: [Dockerfile here](https://github.com/tschottdorf/goplay/blob/master/issue_13470/Dockerfile); invoke via `build -t gdb . && docker run -ti gdb`.
-## Birth
+## Discovery
-Proud mother of the little critter is [cockroachdb/cockroach#3310](https://github.com/cockroachdb/cockroach/pull/3310). Basically, [@tamird](https://github.com/tamird) was building a static test binary with the goal of running it during nightly builds. The test uses [lib/pq](https://github.com/lib/pq) to connect to a [Cockroach DB cluster](http://www.cockroachlabs.com) (which essentially speaks Postgres' SQL dialect). You already know what happened when he tried to run it:
+This bug hit us out of the blue in
+[cockroachdb/cockroach#3310](https://github.com/cockroachdb/cockroach/pull/3310).
+Basically, [@tamird](https://github.com/tamird) was building a static test
+binary with the goal of running it during nightly builds. The test uses
+[lib/pq](https://github.com/lib/pq) to connect to a [Cockroach DB
+cluster](http://github.com/cockroachdb/cockroach) (which essentially speaks
+Postgres' wire protocol). You already know what happened when he tried to run it:
```
fatal error: unexpected signal during runtime execution
@@ -113,24 +112,27 @@ runtime.cgocall(0x44c7f0, 0xc82036a8d8, 0xc800000000)
/usr/local/go/src/runtime/cgocall.go:120 +0x11b fp=0xc82036a888 sp=0xc82036a858
os/user._Cfunc_mygetpwuid_r(0x0, 0xc8203a8390, 0x7f3c5c000a10, 0x400, 0xc8200e4058, 0x7f3c00000000)
??:0 +0x39 fp=0xc82036a8d8 sp=0xc82036a888
-os/user.lookupUnix(0x0, 0x0, 0x0, 0xc82017fb00, 0x0, 0x0, 0x0)
- /usr/local/go/src/os/user/lookup_unix.go:99 +0x723 fp=0xc82036aa48 sp=0xc82036a8d8
-os/user.current(0xc82036aab8, 0x0, 0x0)
- /usr/local/go/src/os/user/lookup_unix.go:39 +0x42 fp=0xc82036aa88 sp=0xc82036aa48
+[...]
os/user.Current(0x13ce800, 0x0, 0x0)
/usr/local/go/src/os/user/lookup.go:9 +0x24 fp=0xc82036aaa8 sp=0xc82036aa88
github.com/lib/pq.(*conn).setupSSLClientCertificates(0xc8201c9180, 0xc8202b2f00, 0xc82036b3d8)
/go/src/github.com/lib/pq/conn.go:983 +0x478 fp=0xc82036ad40 sp=0xc82036aaa8
-...
+[...]
```
-I had [dabbled/fought with cgo and static builds before](http://tschottdorf.github.io/linking-golang-go-statically-cgo-testing/), and I had never seen it crash like that (even deliberately using glibc and putting the static binary in a busybox without it gave me "sane" errors back), so I was intrigued and we went down the rabbit hole together.
+I had [dabbled/fought with static cgo builds
+before](http://tschottdorf.github.io/linking-golang-go-statically-cgo-testing/)
+and had never seen it crash like that (even when trying), so I was intrigued
+and we went down the rabbit hole together.
## First Steps
-[`lib/pq/conn.go:984`](https://github.com/lib/pq/blob/11fc39a580a008f1f39bb3d11d984fb34ed778d9/conn.go#L983) is where the fatal call to `user.Current()` takes place. Leaving out a lot of code, this is roughly what the callpath to it looks like:
+[`lib/pq/conn.go:984`](https://github.com/lib/pq/blob/11fc39a580a008f1f39bb3d11d984fb34ed778d9/conn.go#L983)
+is where the fatal call to `user.Current()` takes place. Leaving out a lot of
+code, this is roughly what the callpath to it looks like:
+
```go
func DialOpen(d Dialer, name string) (_ driver.Conn, err error) {
// ...
@@ -152,21 +154,37 @@ func (cn *conn) ssl(o values) {
}
```
-It's relatively easy to guess this in this heavily truncated version, but there's actually a successful call to `user.Current()` from `userCurrent()` (marked with `!!!`). We only saw this after adding an `fmt.Println()` in `user.Current()` and wondered why that printed more than we expected. So, that's weird - the crash is either random2 or it depends on something else happening before it.
+It's relatively easy to guess this in this heavily truncated version, but
+there's actually a successful call to `user.Current()` from `userCurrent()`
+(marked with `!!!`). We only saw this after adding an `fmt.Println()` in
+`user.Current()` and wondered why that printed more than we expected. So,
+that's weird - the crash is either random or it depends on
+something else happening before it.
-[2]: our test can actually fail on the "first" invocation as well and there is some randomness involved, but I haven't double-checked whether that's due to a retry, so I'm simplifying here.
+## Reduction
-## Failed Metamorphosis
+The first step in such a scenario is always reduction: someone else will likely
+have to help you, and they shouldn't have to wade through boatloads of
+unrelated code.
-If I were to make a lame attempt to draw an entomological comparison (well, looks like we're already there), at this point we'd hope for our ugly critter (*and, for the record, a critter in this post is always the bug but not [Cockroach](https://www.cockroachdb.org), both a [badass insect](http://www.pestworld.org/news-and-views/pest-articles/articles/fascinating-cockroach-facts/) and a [NewSQL DB](https://en.wikipedia.org/wiki/NewSQL)*) to undergo a transformation into something prettier - a minimal exploding example without all of the dependencies of this integration test. If we wanted someone to debug this mess, they'd take a while to even know where the interesting bits happen.
+Unfortunately, straightforward attempts to reproduce the crash proved
+difficult. A bunch of calls to `user.Current()` in a static binary? Works.
+Rewriting it as a test? Works. Maybe the calls to `user.Current()` need to be
+in a proxy (or double-proxy) package? Works.
-Unfortunately, we couldn't reproduce it in a minimal setting. A bunch of calls to `user.Current()` in a static binary? Works. Rewriting it as a test? Works. Maybe the calls to `user.Current()` need to be in a proxy (or double-proxy) package? Works. We couldn't figure it out (and `cgo` has some nooks and crannies of its own - if you don't have an `import "C"`, you may end up with a dynamically linked executable regardless, and there are some funny interactions with referenced packages which use `cgo` themselves).
+We couldn't figure it out but at least managed to strip a lot of code by
+experimentation. What we ended up with was a test that did nothing but open a
+`lib/pq` connection, triggering the same panic. Better than nothing.
-But, we managed to at least remove a lot of the irrelevant code and end up with a test that did nothing but open a `lib/pq` connection, triggering the same panic. Better than nothing.
+Now we were in the position to quickly iterate and try to close the
+gap between the two invocations of `user.Current()`. Remember, the bug is
-## Scratching The Itch
+1. call `user.Current()`
+1. something else happens
+1. explode at `user.Current()`.
-The last step put us in the position to quickly iterate and try to close the gap between the two invocations of `user.Current()`. Again, fairly easy to see in the distilled version above - there's exactly one relevant call between the two3:
+It is fairly easy to see in the [distilled version above](#dialopen) that there
+is exactly one relevant call between the two invocations[2](#fn_2):
```go
user.Current()
@@ -175,33 +193,21 @@ cn.c, err = dial(d, o)
user.Current()
```
-Now it's time for a binary search - hop down into `dial`, insert calls to `user.Current()` in a bunch of locations, run the binary, find the location which crashed and iterate. The hypothesis at this point is that somehow, a previous syscall corrupts *something* for the syscall in `user.Current()`, and that we want to figure out the specific syscall that does it.
+Now it's time for a binary search - hop down into `dial`, insert calls to
+`user.Current()` in a bunch of locations, run the binary, find the location
+which crashed and iterate. The hypothesis at this point is that somehow, a
+previous syscall corrupts *something* for the syscall in `user.Current()`, and
+that we want to figure out the specific syscall that does it.
-Sounds tedious? Well, it was. The callpath we eventually figured out is (using `user.Current()` hits import path conflict bedrock at some point):
+Sounds tedious? Well, it was. The callpath we eventually figured out is (using
+`user.Current()` hits import path conflict bedrock at some point):
```
/usr/local/go/src/net/fd_unix.go:118 (0xbdedd9)
(*netFD).connect: debug.PrintStack() // inserted for testing
/usr/local/go/src/net/sock_posix.go:137
(*netFD).dial: if err := fd.connect(lsa, rsa, deadline); err != nil {
-/usr/local/go/src/net/sock_posix.go:89
- socket: if err := fd.dial(laddr, raddr, deadline); err != nil {
-/usr/local/go/src/net/ipsock_posix.go:160
- internetSocket: return socket(net, family, sotype, proto, ipv6only, laddr, raddr, deadline)
-/usr/local/go/src/net/tcpsock_posix.go:171
- dialTCP: fd, err := internetSocket(net, laddr, raddr, deadline, syscall.SOCK_STREAM, 0, "dial")
-/usr/local/go/src/net/dial.go:364
- dialSingle: c, err = testHookDialTCP(ctx.network, la, ra, deadline)
-/usr/local/go/src/net/dial.go:336
- dialSerial.func1: return dialSingle(ctx, ra, d)
-/usr/local/go/src/net/fd_unix.go:41
- dial: return dialer(deadline)
-/usr/local/go/src/net/dial.go:338
- dialSerial: c, err := dial(ctx.network, ra, dialer, partialDeadline)
-/usr/local/go/src/net/dial.go:232
- (*Dialer).Dial: c, err = dialSerial(ctx, primaries, nil)
-/usr/local/go/src/crypto/tls/tls.go:115
- DialWithDialer: rawConn, err := dialer.Dial(network, addr)
+# 9 stack frames omitted...
/go/src/github.com/lib/pq/conn.go:88
defaultDialer.Dial: return net.Dial(ntw, addr)
/go/src/github.com/lib/pq/conn.go:279
@@ -210,7 +216,8 @@ Sounds tedious? Well, it was. The callpath we eventually figured out is (using `
DialOpen: cn.c, err = dial(d, o)
```
-and we now have the following example, which requires a patch to the standard library but is good enough for someone else to investigate:
+and we now have the following example, which requires a patch to the standard
+library but is good enough for someone else to investigate:
```go
// boom_test.go
@@ -228,7 +235,8 @@ func TestBoom(t *testing.T) {
t.Fatalf("conn: %s, err: %s", conn, err)
}
-// cgo.go - without this, don't get a static binary no matter what
+// cgo.go - without this, we don't get a static binary.
+// Presumably we could run with CGO_ENABLED=1 instead.
package cgo_static_boom
import "C"
@@ -244,17 +252,28 @@ and the following patch to `$(go env GOROOT)/src/net/fd_unix.go`:
+ user.Current()
```
-[3] of course, all the irrelevant calls are omitted here - we're already hours into the game at this point.
+I was happy with this and stepped out for dinner, but
+[@tamird](https://github.com/tamird) kept drilling to get rid of the stdlib
+patch. He threw together `net.Dial()` and `user.Current()` in the loop (to
+account for randomness), figured out that the test setup wasn't needed and
+must've been delighted to arrive at the example at the beginning of this post.
-## Metamorphosis
+[2]: of course, all the irrelevant calls are omitted here -
+we're already hours into the game at this point.
-I was happy with this and stepped out for early dinner, but [@tamird](https://github.com/tamird) kept drilling to get rid of the stdlib patch. He threw together `net.Dial()` and `user.Current()` in the loop (to account for randomness), figured out that the test setup wasn't needed and must've been delighted to arrive at the example at the beginning of this post.
+## (Dis)Assembling the troops
-## Pest Control
-
-Fast-forward four days, two dozen comments and one closed issue [#13470](https://github.com/golang/go/issues/13470) later, we're a little wiser. After some back and forth on [#13470](https://github.com/golang/go/issues/13470) about glibc versions and `LD_PRELOAD`, [@mwhudson](https://github.com/mwhudson) posted some interesting findings. To trace what he did, we're going to leave Go-land completely - we're seeing a segfault from a library call, so that's where our debugging has to take place. Time to dust off `gdb`4!
+Fast-forward four days, two dozen comments and one closed issue
+[golang/go#13470](https://github.com/golang/go/issues/13470) later, we're a
+little wiser. After some back and forth on
+[#13470](https://github.com/golang/go/issues/13470) about glibc versions and
+`LD_PRELOAD`, [@mwhudson](https://github.com/mwhudson) posted some interesting
+findings. To trace what he did, we're going to leave Go-land completely - we're
+seeing a segfault from a library call, so that's where our debugging has to
+take place. Time to dust off `gdb`[3](#fn_3)!
```
+$ gdb ./boom
(gdb) run
Starting program: /go/src/github.com/tschottdorf/goplay/issue_13470/boom
[Thread debugging using libthread_db enabled]
@@ -273,8 +292,10 @@ warning: Source file is more recent than executable.
961 while (isspace (*p))
```
-This gives us a location in the code (`nss_compat/compat-pwd.c:961`) but it's easy to
-see that it doesn't really matter. `*p` is not the culprit (if it were, we'd see `0x0` and not `0x5e` as the illegal memory access) and in fact looking at the assembly code we see
+This gives us a location in the code (`nss_compat/compat-pwd.c:961`) but it's
+easy to see that it doesn't really matter. `*p` is not the culprit (if it were,
+we'd see `0x0` and not `0x5e` as the illegal memory access) and in fact looking
+at the assembly code we see
```
(gdb) disas
@@ -306,7 +327,8 @@ and `0x1(%rcx,%rdx,2) = 0x1 + %rcx + 2*%rdx = 0x1 + 2*0x72 = 0x5e`. Clearly
we're looking at the right code here, and it's odd that `%rcx` would be zero
since `__ctype_b_loc` [should](https://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-PDA/LSB-PDA/baselib---ctype-b-loc.html)
-> [...] return a pointer into an array of characters in the current locale that contains characteristics for each character in the current character set.
+> [...] return a pointer into an array of characters in the current locale that
+> contains characteristics for each character in the current character set.
That's clearly not what it did here. Let's look at its code:
@@ -321,7 +343,9 @@ $ objdump -D ./boom | grep -A 10 __ctype_b_loc
```
Whatever happens here, the `%fs` register is involved, and it [appears that this
-register plays a role in thread-local storage](http://stackoverflow.com/questions/6611346/how-are-the-fs-gs-registers-used-in-linux-amd64). Knowing that, we set a breakpoint just before the crash and investigate the registers, while also keeping an eye on thread context switches:
+register plays a role in thread-local storage](http://stackoverflow.com/questions/6611346/how-are-the-fs-gs-registers-used-in-linux-amd64).
+Knowing that, we set a breakpoint just before the crash and investigate the
+registers, while also keeping an eye on thread context switches:
```
(gdb) br nss_compat/compat-pwd.c:961
@@ -337,7 +361,7 @@ Breakpoint 1, internal_getpwuid_r (ent=, errnop=,
0x00007ffff5bbca58 <+328>: movsbq %al,%rdx
0x00007ffff5bbca5c <+332>: testb $0x20,0x1(%rcx,%rdx,2)
[...]
-(gdb) si 2
+(gdb) si 2 # step to <+332>
0x00007ffff5bbca5c 961 while (isspace (*p))
(gdb) info register fs rcx rdx
fs 0x63 99
@@ -366,30 +390,54 @@ Aha! When `%fs = 99`, apparently all is well, but in an iteration which has
`%fs = 0`, all hell breaks loose. Note also that there's a context switch
right before the crash (`[Switching to Thread 0x7ffff7609700 (LWP 136)]`).
-## Hibernation
+[3]: If you're still following along, you'll *really* want to
+use the [Docker image](#fn_1) to avoid a lengthy setup.
+
+## Resolution
This seems to have less and less to do with Go. And indeed, it's only a short
time after that [ianlancetaylor](https://github.com/ianlancetaylor) comes up
with a `C` example which exhibits the same problem. This seems like good news,
-but filing the [upstream issue](https://sourceware.org/bugzilla/show_bug.cgi?id=19341),
+but filing the [upstream issue against glibc](https://sourceware.org/bugzilla/show_bug.cgi?id=19341),
it becomes apparent that `glibc` supports "some static linking" but not all -
in particular, threading is fairly broken and this has been known for a while
and would be quite nontrivial to fix. Roughly what happens is the following:
-* Thread 1 calls out to `libnss_compat` (via `user.Current()`). `libnss` wants
- to use thread-local storage (since the main binary has no dynamic symbol table),
- causing initialization of `ctype` information in the thread-local storage of
- the thread active at load time.
-* Thread 2 runs into `libnss_compat` as well, but the initialization happened
+* Thread 1 calls out to the external shared library `libnss_compat` (via
+ `user.Current()`). `libnss` wants to use thread-local storage (TLS), but it
+ can't use the calling thread's TLS because we're statically linked (so there
+ is no dynamic symbol table).
+ Instead, it uses its own set of TLS variables. But these are initialized at
+ the time at which `libnss` is **loaded** (which is right now), and only on
+ that thread.
+* Thread 2 calls into `libnss_compat` as well, but the initialization happened
only on the first thread. `__ctype_b_loc` relies on this initialization, so
it returns garbage. Boom.
-Summing up a comment by [Carlos O'Donell](https://sourceware.org/bugzilla/show_bug.cgi?id=19341#c1), the bug is likely to live forever and hard to fix; in turn, we're
-[thinking about](https://github.com/cockroachdb/cockroach/pull/3343) linking
-against [musl-libc](http://www.musl-libc.org) instead or - gasp - just doing
-away with static binaries altogether.
-
-Well done, little bug. Well done.
-
-
-[4] [Dockerfile here](https://github.com/tschottdorf/goplay/blob/master/issue_13470/Dockerfile); invoke via `build -t gdb . && docker run -ti gdb`.
+Summing up a comment by [Carlos O'Donell](https://sourceware.org/bugzilla/show_bug.cgi?id=19341#c1),
+the bug is likely to live forever and hard to fix; while you *can* link
+statically against glibc, it's really nothing you should ever find yourself
+doing. At least not if you're using threads.
+
+# Conclusion
+
+Linking statically against `glibc` has proven to be an insane idea, but it's
+surprising that this was apparently news for everyone up to (but not including)
+the glibc bug tracker.
+
+We figured out that we can [get a less obviously ludicrous static build](https://github.com/cockroachdb/cockroach/pull/3343)
+by substituting `glibc` for [musl-libc](http://www.musl-libc.org), but that
+needs careful benchmarking and testing (in particular, we instantly had issues
+with the [DNS resolver](https://github.com/cockroachdb/cockroach/pull/3413)).
+
+At the end of the day, we decided that there were only diminishing returns to
+be had by linking a completely static binary. What really matters to us is not
+having non-standard dependencies - having `glibc` available is a bit of a drag
+when deploying on minimal systems (think containers) but is otherwise
+standard. So, at least for the time being, we'll distributed an image that
+[only links against glibc dynamically](https://github.com/cockroachdb/cockroach/pull/3412).
+
+In a recent post about the [cost and complexity of cgo](http://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/)
+we warned that cgo comes with a more intricate build process and the occasional
+need to take debugging beyond the realms Go. This bug sure goes out of its way
+to prove these points.