Segfault in nightly and 1.7.0-rc3 #43123

Closed
goerz opened this issue Nov 18, 2021 · 8 comments · Fixed by #43163
Labels
bug (Indicates an unexpected problem or unintended behavior), compiler:codegen (Generation of LLVM IR and native code), priority (This should be addressed urgently)

@goerz
Contributor

goerz commented Nov 18, 2021

Around Oct 10, 2021 (±2 days), I started seeing segfaults in the "Nightly" CI runs for some of my packages.

The last run that worked without error was this one on Oct 7: https://github.com/JuliaQuantumControl/QuantumControlBase.jl/actions/runs/1318143883

The first one that segfaulted was https://github.com/JuliaQuantumControl/QuantumControlBase.jl/runs/3854752966?check_suite_focus=true on Oct 10. I just realized that this is actually a failure in an "upstream" package (Krotov), although I suspect the underlying cause is connected to what I'm reporting below.

On Oct 12, I added some code to the QuantumControlBase repo that's been consistently segfaulting on Nightly, but running fine on the normal CI, most recently at https://github.com/JuliaQuantumControl/QuantumControlBase.jl/runs/4129399244?check_suite_focus=true

I didn't realize that the problematic code in QuantumControlBase was only added on Oct 12 when I tried to reproduce this locally just now; sorry if that makes this report a bit more confusing than it needs to be. I'll have a look at what's going on in Krotov as well, if I can.

In any case, I've been able to reproduce the segfault in QuantumControlBase locally on macOS using today's Nightly version of Julia, and I managed to pare it down to the following minimal example (no dependencies):

eps1 = t->1.0
eps2 = t->2.0
generator = (nothing, (nothing, eps1), (nothing, eps2))
# generator = (nothing, (nothing, eps1))  # works
# generator = (nothing, (nothing, eps1), (nothing, eps1)) # works


function broken(generator::Tuple, control)
    control_generator :: Any = nothing
    for part in generator
        if isa(part, Tuple)
            if part[2] === control
                control_generator = part[1]
            end
        end
    end
    return control_generator
end

broken(generator, eps1)

This is derived from QuantumControlBase.getcontrolderiv.

If I put the above code in a file minimal.jl and include it in a fresh Nightly REPL, I see the segfault. I've not been able to go any deeper than that; the variations commented out above run without crashing.
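For anyone checking this across several Julia builds, it may help to run the file in a child process so that a crash does not take down the session doing the checking. The following is only a sketch: `run_isolated` is a hypothetical helper name, and it assumes the crash also shows up when minimal.jl is run as a script rather than included from the REPL, which has not been verified here.

```julia
# Hypothetical helper (not part of Julia or of any package in this thread):
# run a script in a fresh Julia process and report how that process ended.
function run_isolated(script::AbstractString)
    # Base.julia_cmd() is the command for the currently running Julia binary.
    cmd = `$(Base.julia_cmd()) --startup-file=no $script`
    # ignorestatus keeps `run` from throwing when the child exits abnormally.
    proc = run(ignorestatus(cmd))
    return (exitcode = proc.exitcode, termsignal = proc.termsignal)
end

# Usage (assumes minimal.jl is in the current directory):
# result = run_isolated("minimal.jl")
# A clean run has result.exitcode == 0 and result.termsignal == 0.
```

To test a different build, start this from that build's REPL, since `Base.julia_cmd()` points at whichever binary is currently running.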

@goerz
Contributor Author

goerz commented Nov 18, 2021

I'm also getting the same segfault with v1.7.0-rc3, "macOS x86 (Intel or Rosetta)"

@goerz
Contributor Author

goerz commented Nov 18, 2021

And also with v1.7.0-rc3, "macOS ARM (M-series Processor)"

goerz changed the title from "Segfault in nightly" to "Segfault in nightly and 1.7.0-rc3" on Nov 18, 2021
Keno added the bug and compiler:codegen labels on Nov 18, 2021
Keno added this to the 1.7 milestone on Nov 18, 2021
@Keno
Member

Keno commented Nov 18, 2021

Crash is in codegen:

signal (11): Segmentation fault
in expression starting at REPL[5]:1
maybe_bitcast at /home/keno/julia/src/cgutils.cpp:436 [inlined]
operator() at /home/keno/julia/src/codegen.cpp:2418
emit_guarded_test<emit_box_compare(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t&, llvm::Value*, llvm::Value*)::<lambda()>&> at /home/keno/julia/src/cgutils.cpp:1086 [inlined]
emit_guarded_test<emit_box_compare(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t&, llvm::Value*, llvm::Value*)::<lambda()>&> at /home/keno/julia/src/cgutils.cpp:1108 [inlined]
emit_nullcheck_guard<emit_box_compare(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t&, llvm::Value*, llvm::Value*)::<lambda()>&> at /home/keno/julia/src/cgutils.cpp:1116
emit_nullcheck_guard2<emit_box_compare(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t&, llvm::Value*, llvm::Value*)::<lambda()> > at /home/keno/julia/src/cgutils.cpp:1124 [inlined]
emit_box_compare at /home/keno/julia/src/codegen.cpp:2417
emit_f_is at /home/keno/julia/src/codegen.cpp:2686
emit_builtin_call at /home/keno/julia/src/codegen.cpp:2800

@inkydragon
Sponsor Member

inkydragon commented Nov 18, 2021

Reproduced on Win 10 + Cygwin x86_64 + master branch 455236e + debug mode.
REPL + gdb backtrace gist:

https://gist.github.com/inkydragon/70e8afacbe1c4387115a93ce6b477223


Simplified test code:

  • 1.6.3: fine
  • 1.7.0-rc3: throw
  • master: throw
fn1 = t->0.0;
fn2 = t->0.0;
generator = (
    fn1,
    fn2
);

function broken2(gen, control)
    for f in gen
        if f === control
            nothing
        end
    end
end

[ f===fn1 for f in generator]   # works
filter(x -> x===fn1, generator) # works

broken2(generator, fn1)  # throw
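Once a fixed build is available, the behavior expected from this kind of loop can be written down as a small test. This is a sketch only, not the test added by the actual fix: `count_identical` is a hypothetical variant of `broken2` that returns a value so there is something to assert on, and it is an assumption that it exercises the same `===` codegen path. On an affected build the process would be expected to crash rather than report a test failure.

```julia
using Test

# Two distinct anonymous functions, so the tuple has two different element types.
fn1 = t -> 0.0
fn2 = t -> 0.0
gen = (fn1, fn2)

# Hypothetical variant of `broken2`: count the elements identical to `control`.
function count_identical(gen, control)
    n = 0
    for f in gen
        if f === control
            n += 1
        end
    end
    return n
end

@testset "=== over a tuple of distinct closures" begin
    @test count_identical(gen, fn1) == 1
    @test count_identical(gen, fn2) == 1
    # A freshly created closure is identical to neither element.
    @test count_identical(gen, t -> 0.0) == 0
end
```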


KristofferC added the priority label on Nov 18, 2021
@staticfloat
Sponsor Member

Bisected to 8739df2. Tagging @vtjnash to take a look whenever he can.

@goerz
Contributor Author

goerz commented Nov 22, 2021

Can you tag me when the patch lands in Nightly, so I can try running my tests again?

@vtjnash
Sponsor Member

vtjnash commented Nov 22, 2021

It already did (https://build.julialang.org/#/builders/69/builds/6416/steps/8/logs/stdio)

@goerz
Contributor Author

goerz commented Nov 27, 2021

I can confirm that I'm no longer seeing the segfault with the latest Nightly


8 participants