Intermittent failures in pkg test #16555

Closed
tkelman opened this issue May 24, 2016 · 15 comments
Labels: heisenbug, help wanted, packages, test

@tkelman
Contributor

tkelman commented May 24, 2016

This happens often on AppVeyor and I've seen it locally occasionally too, but I also think I've seen it on Travis so it may not be Windows-specific. We get either status code 500, or "Failed to receive response: The server returned an invalid or unrecognized response," etc.

We have made recent changes in libgit2, adding a few new tests and making it throw a few more errors to avoid catching unintended things like typos. Whatever is happening here seems like it may be due to intermittent connectivity problems, or maybe an underlying bug/race condition/undefined behavior in the C library interfaces? Either way we should try to pinpoint the cause and add a mitigation (retries on some operations? specific handling of this expected failure mode?) or fix it if it's a real bug.

What's the best way to debug this? Does anyone else see this locally if you run make test-pkg in a loop? cc @wildart

edit: this looks a lot like #13436 but apparently that had been more common on 32-bit?
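
As a rough illustration of the "run it in a loop" idea, the following Julia sketch runs an operation repeatedly and counts each distinct error message, which can show whether one failure mode dominates. The helper name `tally_failures` and the stand-in operation are hypothetical and not part of Pkg or LibGit2; in practice the operation might shell out to `make test-pkg` or call the package operation under test.

```julia
# Hypothetical helper: run `op` `runs` times and count each distinct error message.
function tally_failures(op, runs::Int)
    counts = Dict{String,Int}()
    for _ in 1:runs
        try
            op()
        catch err
            # Use the printed error text as the key so identical failures group together.
            msg = sprint(showerror, err)
            counts[msg] = get(counts, msg, 0) + 1
        end
    end
    return counts
end

# Stand-in operation (an assumption for illustration): fails on every odd call.
n = Ref(0)
failures = tally_failures(6) do
    n[] += 1
    isodd(n[]) && error("status code 500")
    nothing
end
```

The resulting dictionary maps each error message to the number of runs that produced it, so a heisenbug that shows up under several messages can be told apart from a single recurring failure.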

@tkelman added the test, packages, and heisenbug labels May 24, 2016
@wildart
Member

wildart commented May 25, 2016

It's a connectivity issue. You cannot expect 100% uptime from services. I do not see any solution to that. Maybe we could run the tests in a controlled environment (some of our machines).

@tkelman
Contributor Author

tkelman commented May 25, 2016

We need to make this more robust; it's actively disruptive when CI is unreliable. Can we make these operations retry several times?
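
A minimal sketch of what retrying could look like, assuming a simple exponential backoff between attempts. The names `with_retries`, `attempts`, `delay`, and `is_transient` are hypothetical and do not exist in Pkg or LibGit2; the stand-in operation below only simulates a flaky network call.

```julia
# Hypothetical helper: call `op` up to `attempts` times, sleeping between failures.
function with_retries(op; attempts::Int = 3, delay::Real = 1.0, is_transient = err -> true)
    for attempt in 1:attempts
        try
            return op()
        catch err
            # Give up on the last attempt, or when the error is not considered transient.
            (attempt == attempts || !is_transient(err)) && rethrow()
            sleep(delay * 2^(attempt - 1))  # simple exponential backoff
        end
    end
end

# Stand-in operation (an assumption for illustration): fails twice, then succeeds.
calls = Ref(0)
result = with_retries(attempts = 3, delay = 0.1) do
    calls[] += 1
    calls[] < 3 && error("Failed to receive response")
    "cloned"
end
```

The `is_transient` predicate is where handling specific to this failure mode would go, for example retrying only on network-class errors so that genuine bugs still fail on the first attempt.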

@wildart
Member

wildart commented May 25, 2016

I believe this is an AppVeyor problem. It could be a configuration issue of IIS on Win x86_64 machines. I found this one: http://stackoverflow.com/questions/20682621/using-iis-and-arr-to-reverse-proxy-returns-the-server-returned-an-invalid-or-un.

@tkelman
Contributor Author

tkelman commented May 25, 2016

cc @FeodorFitsner any ideas/suggestions?

@FeodorFitsner

This is a connectivity issue from where to what? Could you please elaborate?

@tkelman
Contributor Author

tkelman commented May 25, 2016

Sorry - we're testing our package manager, which is doing git clones, fetches, checkouts, etc. via libgit2, from the AppVeyor VM to GitHub repos. Usually JuliaLang/METADATA.jl (which is pretty big) and JuliaLang/Example.jl (which is small). I believe these are WinHTTP errors from the https transport that libgit2 tries to use.

@FeodorFitsner

So, those are connectivity issues with github.com?

@tkelman
Contributor Author

tkelman commented May 25, 2016

Yes.

@FeodorFitsner

OK, can you give a sample of error request/response/status?

@tkelman
Contributor Author

tkelman commented May 25, 2016

Usually "Failed to receive response: The server returned an invalid or unrecognized response"

example logs: https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.2056/job/73245nbqo4lq4rf7
https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.2085/job/157d0p16pu7u03tt

There are others further back. Most of the other failures have been timeouts due to #16556, which is probably a libuv bug, not an AppVeyor/GitHub problem.

@FeodorFitsner

Well, this might be a connectivity issue, of course, but we haven't received any reports from others about any connectivity issues with github.com.

The good news is that in the coming weeks we are moving our build infrastructure to the same hosting provider used by github.com 😉 Hope that will make things better.

@tkelman
Contributor Author

tkelman commented May 25, 2016

Cool. Well, I hope that migration goes smoothly and we only notice via fewer failed builds. I'm not sure there's any simple way of debugging this. Maybe we could try creating a C reproduction of the same library calls we're using, remoting in, and trying to catch it in gdb, but that would take a bit of work.

@FeodorFitsner

You can do a PowerShell script.

@tkelman
Contributor Author

tkelman commented May 13, 2017

I was able to trigger some Pkg/LibGit2 errors locally that looked a lot like this, repeatably, by calling Pkg.clone("IterativeSolvers"). Since this is so annoyingly frequent on AppVeyor, it would be good if anyone familiar with the LibGit2 API could recommend things to look at.

  | | |_| | | | (_| |  |  Version 0.7.0-DEV.142 (2017-05-13 16:24 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit b1f668dfa9* (0 days old master)
|__/                   |  x86_64-w64-mingw32

julia> Pkg.clone("IterativeSolvers")
INFO: Cloning IterativeSolvers from git://github.com/JuliaMath/IterativeSolvers.jl.git
ERROR: GitError(Code:ERROR, Class:OS, Failed to connect to github.com: )
Stacktrace:
 [1] macro expansion at .\libgit2\error.jl:99 [inlined]
 [2] clone(::SubString{String}, ::String, ::Base.LibGit2.CloneOptions) at .\libgit2\repository.jl:276
 [3] #clone#100(::String, ::Bool, ::Ptr{Void}, ::Nullable{Base.LibGit2.AbstractCredentials}, ::Function, ::SubString{String}, ::String) at .\libgit2\libgit2.jl:569
 [4] clone(::SubString{String}, ::String) at .\pkg\entry.jl:195
 [5] clone(::String) at .\pkg\entry.jl:221
 [6] (::Base.Pkg.Dir.##4#7{Array{Any,1},Base.Pkg.Entry.#clone,Tuple{String}})() at .\pkg\dir.jl:36
 [7] cd(::Base.Pkg.Dir.##4#7{Array{Any,1},Base.Pkg.Entry.#clone,Tuple{String}}, ::String) at .\file.jl:59
 [8] #cd#1(::Array{Any,1}, ::Function, ::Function, ::String, ::Vararg{String,N} where N) at .\pkg\dir.jl:36
 [9] clone(::String) at .\pkg\pkg.jl:169

julia> Pkg.clone("IterativeSolvers")
INFO: Cloning IterativeSolvers from git://github.com/JuliaMath/IterativeSolvers.jl.git
ERROR: GitError(Code:ERROR, Class:Net, Error sending data: Either the application has not called WSAStartup, or WSAStartup failed.
)
Stacktrace:
 [1] macro expansion at .\libgit2\error.jl:99 [inlined]
 [2] clone(::SubString{String}, ::String, ::Base.LibGit2.CloneOptions) at .\libgit2\repository.jl:276
 [3] #clone#100(::String, ::Bool, ::Ptr{Void}, ::Nullable{Base.LibGit2.AbstractCredentials}, ::Function, ::SubString{String}, ::String) at .\libgit2\libgit2.jl:569
 [4] clone(::SubString{String}, ::String) at .\pkg\entry.jl:195
 [5] clone(::String) at .\pkg\entry.jl:221
 [6] (::Base.Pkg.Dir.##4#7{Array{Any,1},Base.Pkg.Entry.#clone,Tuple{String}})() at .\pkg\dir.jl:36
 [7] cd(::Base.Pkg.Dir.##4#7{Array{Any,1},Base.Pkg.Entry.#clone,Tuple{String}}, ::String) at .\file.jl:59
 [8] #cd#1(::Array{Any,1}, ::Function, ::Function, ::String, ::Vararg{String,N} where N) at .\pkg\dir.jl:36
 [9] clone(::String) at .\pkg\pkg.jl:169

@KristofferC
Sponsor Member

Open/comment if this still happens.
