Hang in Fetch after Clone while converting to mirror #356
Comments
Ah. If I let it run long enough, I eventually get

I see a slew of runtime.LockOSThread calls throughout libgit2, but each only lasts as long as an individual git2go function/method call. That means that the Go runtime is free to switch from thread to thread between calls, which is indeed very likely to happen for cgo calls. Looking at the dtruss output (https://gist.github.com/josharian/c8b990eec873f245e30b6f29bfcc9245), it appears that different threads are in fact in use.

I wonder whether a different thread-locking strategy is required. An example alternative is for a Repository to spin up a goroutine that does nothing but read from a

Or maybe I have mis-diagnosed. In any case, this is a blocker for me to be able to use git2go. Please let me know what I can do to help.
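For illustration, here is a minimal sketch of the alternative strategy described above: one goroutine stays locked to a single OS thread for its whole lifetime and runs every call submitted to it. The names threadWorker, newThreadWorker, Do and Close are hypothetical and not part of git2go, and the channel of closures is an assumption about what such a goroutine would read from.

```go
package main

import "runtime"

// threadWorker runs every submitted function on one dedicated OS thread.
// Hypothetical helper for illustration only; git2go does not provide it.
type threadWorker struct {
	calls chan func()
	done  chan struct{}
}

// newThreadWorker starts the goroutine that owns the locked thread.
func newThreadWorker() *threadWorker {
	w := &threadWorker{
		calls: make(chan func()),
		done:  make(chan struct{}),
	}
	go func() {
		// Pin this goroutine to its OS thread until the worker is closed,
		// so consecutive calls cannot migrate between threads.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		defer close(w.done)
		for fn := range w.calls {
			fn()
		}
	}()
	return w
}

// Do runs fn on the worker's thread and blocks until fn has returned.
func (w *threadWorker) Do(fn func()) {
	finished := make(chan struct{})
	w.calls <- func() {
		defer close(finished)
		fn()
	}
	<-finished
}

// Close stops the worker goroutine. Do must not be called afterwards.
func (w *threadWorker) Close() {
	close(w.calls)
	<-w.done
}
```

A wrapper type could own one such worker and route each cgo call through Do, at the cost of serializing all calls made through that wrapper.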
More data. I've instrumented libgit2 (now using next / tip) and spied using Wireshark. The clone succeeds. The fetch begins and transfers a bunch of data. Then, for no reason I can see, the TCP traffic ceases and QUIC traffic begins. At this point, libgit2 is blocked in a select call in wait_for, called from curls_read (all in curl_stream.c). Then, about 4 minutes later, the server sends a TCP FIN ACK. The select completes and curls_read returns an error. None of this explains why the C code succeeds, but this no longer really looks to me like a threading bug.
I've been unable to reproduce this locally, but FWIW you don't need the second fetch: we have a callback to override the remote creation precisely for this use case. If you provide this override, the library calls your code instead of its own, and you can do whatever you need during the initial remote creation, like git's own

I've reworked the code from the issue description into a test that does it in a single step to test things out. While it doesn't fix the issue per se, it would avoid triggering it, and it avoids the no-transfer network connection.

    import (
        "fmt"
        "io/ioutil"
        "testing"
    )

    // createMirrorRemote replaces the default remote creation during Clone.
    // It creates the remote with a mirror fetchspec and marks it as a mirror
    // in the repository config, so no second fetch is needed.
    func createMirrorRemote(repo *Repository, defaultName, url string) (*Remote, ErrorCode) {
        remote, err := repo.Remotes.CreateWithFetchspec(defaultName, url, "+refs/*:refs/*")
        if err != nil {
            return nil, ErrGeneric
        }
        cfg, err := repo.Config()
        if err != nil {
            return nil, ErrGeneric
        }
        err = cfg.SetBool(fmt.Sprintf("remote.%s.mirror", defaultName), true)
        if err != nil {
            return nil, ErrGeneric
        }
        return remote, ErrOk
    }

    func TestGoogleClone(t *testing.T) {
        t.Parallel()
        url := "https://go.googlesource.com/review"
        // The parent directory is specific to the author's machine.
        dest, err := ioutil.TempDir("/Users/carlos/tmp", "review")
        if err != nil {
            t.Fatal(err)
        }
        t.Log("starting clone of", url, "into", dest)
        // Clone as a bare mirror in one step via the remote-creation override.
        repo, err := Clone(url, dest, &CloneOptions{
            Bare:                 true,
            RemoteCreateCallback: createMirrorRemote,
        })
        if err != nil {
            t.Fatal(err)
        }
        t.Log("finished mirror-clone of", url, "into", dest, repo.Path())
    }
As it happens, running a test against Go tip (which will become 1.8) errors out in fetch (https://travis-ci.org/libgit2/git2go/jobs/186415531), so it does seem likely we're messing up somehow, though I still don't know exactly where.
Coming back to this after a while... I can sometimes reproduce this hang with just the tests we have in the codebase. Sometimes it outright crashes. Via printf-debugging I see in the crashes that the finalizer is running for the

It looks like the much more aggressive garbage collector in 1.8 does not consider the receiver to be keeping it alive. It does look exactly like the docs for
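As a hedged illustration of this class of bug: the standard library's runtime.KeepAlive (available since Go 1.7) marks a value as reachable up to a given point, which stops its finalizer from running while a method on it is still in progress. Whether this is the function the comment above refers to is an assumption, and the handle type below is a hypothetical stand-in for a Go wrapper around a C resource, not git2go code.

```go
package main

import "runtime"

// handle stands in for a Go wrapper around a C resource (hypothetical type).
type handle struct {
	freed *bool // set by the finalizer, which stands in for a C free call
}

// newHandle attaches a finalizer that "frees" the underlying resource.
func newHandle() *handle {
	h := &handle{freed: new(bool)}
	runtime.SetFinalizer(h, func(h *handle) { *h.freed = true })
	return h
}

// use simulates a long-running call that only touches the underlying
// resource. It reports whether the resource was still valid during the call.
func (h *handle) use() bool {
	freed := h.freed
	// From here on only the copied pointer is used. Without the KeepAlive
	// call below, the collector would be allowed to treat h as unreachable
	// and run its finalizer while this method is still executing.
	runtime.GC()
	ok := !*freed
	// Keep h reachable until this point, so the finalizer cannot have run
	// before the line above.
	runtime.KeepAlive(h)
	return ok
}
```

In a cgo wrapper, the equivalent would be a runtime.KeepAlive call on the receiver placed after the C call returns.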
I'm on vacation for the next few weeks, but that seems plausible.
I'm going to consider this closed as of #393; do reopen if it's still broken.
The following program hangs most times I run it (but not always):
It hangs during the call to remote.Fetch. Sample run, killed manually by SIGQUIT when it hung:
Reproduced with both Go 1.7.4 and Go 1.8 beta 1.
Doing some horrible manual symbol resolution, I managed to tease out this backtrace:
Frames 14 to 22 are all in Go world and are uninteresting. I believe that the pcs in the 0x00007fff... range correspond to assertion failures.
libgit2 itself does not appear to be the problem. The following equivalent C program always works successfully.
This smells like memory corruption due to a race somewhere, but I don't see where/how. Running under the Go race detector yields no complaints. I'm running darwin, not linux, so I can't test with -msan.