Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.10 backports: Revert #51403 #51612

Closed

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented Oct 6, 2023

Revert "inference: follow up the Vararg fix in abstract_call_unionall (#51403).

This reverts commit ae8f9ad.

Fixes #51603.

CC: @aviatesk.
We probably also want to revert this on master, since we're experiencing the same issue on master.

@NHDaly
Copy link
Member Author

NHDaly commented Oct 6, 2023

I'm not sure if it makes sense to revert #51403 without also reverting #51393. I think probably @aviatesk will have to weigh in there? Or maybe @vtjnash who reviewed these.

(Noting that reverting both also fixes #51603 for me.)

@aviatesk
Copy link
Member

aviatesk commented Oct 6, 2023

I'll look on the hang promptly. Instead of reverting those PRs, I'd prefer to find a proper fix since #51393 was intended to address another 1.10 milestone issue (#51310). I hope that the issue might be stemming from a minor oversight in the follow-up PR.

@aviatesk
Copy link
Member

aviatesk commented Oct 6, 2023

Does this patch work for you?

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index bfddc2d692..7b171cff15 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -1875,23 +1875,22 @@ function abstract_call_unionall(interp::AbstractInterpreter, argtypes::Vector{An
         end
         a2 = argtypes[2]
         a3 = unwrapva(argtypes[3])
-        nothrow = false
+        canconst = nothrow = false
     elseif na == 3
         a2 = argtypes[2]
         a3 = argtypes[3]
         ⊑ᵢ = ⊑(typeinf_lattice(interp))
-        nothrow = a2 ⊑ᵢ TypeVar && (a3 ⊑ᵢ Type || a3 ⊑ᵢ TypeVar)
+        canconst = nothrow = a2 ⊑ᵢ TypeVar && (a3 ⊑ᵢ Type || a3 ⊑ᵢ TypeVar)
     else
         return CallMeta(Bottom, EFFECTS_THROWS, NoCallInfo())
     end
-    canconst = true
     if isa(a3, Const)
         body = a3.val
     elseif isType(a3)
         body = a3.parameters[1]
         canconst = false
     else
-        return CallMeta(Any, Effects(EFFECTS_TOTAL; nothrow), call.info)
+        return CallMeta(Any, EFFECTS_THROWS, call.info)
     end
     if !(isa(body, Type) || isa(body, TypeVar))
         return CallMeta(Any, EFFECTS_THROWS, call.info)

I'm willing to test it myself if there are any reproducer available.

nickrobinson251 added a commit to RelationalAI/julia that referenced this pull request Oct 6, 2023
nickrobinson251 added a commit to RelationalAI/julia that referenced this pull request Oct 6, 2023
@nickrobinson251
Copy link
Contributor

thanks! in v1.10 looks like we need NoCallInfo() not call.info. testing this out now

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index 11d81dc3d0..573fbd6e43 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -1878,12 +1878,12 @@ function abstract_call_unionall(interp::AbstractInterpreter, argtypes::Vector{An
         end
         a2 = argtypes[2]
         a3 = unwrapva(argtypes[3])
-        nothrow = false
+        canconst = nothrow = false
     elseif na == 3
         a2 = argtypes[2]
         a3 = argtypes[3]
         ⊑ᵢ = ⊑(typeinf_lattice(interp))
-        nothrow = a2 ⊑ᵢ TypeVar && (a3 ⊑ᵢ Type || a3 ⊑ᵢ TypeVar)
+        canconst = nothrow = a2 ⊑ᵢ TypeVar && (a3 ⊑ᵢ Type || a3 ⊑ᵢ TypeVar)
     else
         return CallMeta(Bottom, EFFECTS_THROWS, NoCallInfo())
     end
@@ -1894,7 +1894,7 @@ function abstract_call_unionall(interp::AbstractInterpreter, argtypes::Vector{An
         body = a3.parameters[1]
         canconst = false
     else
-        return CallMeta(Any, Effects(EFFECTS_TOTAL; nothrow), NoCallInfo())
+        return CallMeta(Any, EFFECTS_THROWS, NoCallInfo())
     end
     if !(isa(body, Type) || isa(body, TypeVar))
         return CallMeta(Any, EFFECTS_THROWS, NoCallInfo())

@brenhinkeller brenhinkeller added the bugfix This change fixes an existing bug label Oct 6, 2023
@NHDaly
Copy link
Member Author

NHDaly commented Oct 6, 2023

I'll look on the hang promptly. Instead of reverting those PRs, I'd prefer to find a proper fix since #51393 was intended to address another 1.10 milestone issue (#51310). I hope that the issue might be stemming from a minor oversight in the follow-up PR.

Thanks for the fix, @aviatesk! 👍 👍 And thanks for testing it out, @nickrobinson251. We'll keep you posted.

I don't think we have a small reproducer yet, but I'll see if I can distill something today. It would be great to merge the fix together with a test case.

@NHDaly
Copy link
Member Author

NHDaly commented Oct 6, 2023

😢 @aviatesk: It looks like Nick's build with your patch didn't help. It's still stuck here:

- PackageCompiler: compiling incremental system image

I'll look on the hang promptly. Instead of reverting those PRs, I'd prefer to find a proper fix since #51393 was intended to address another 1.10 milestone issue (#51310). I hope that the issue might be stemming from a minor oversight in the follow-up PR.

Thanks for letting us know. :/ that's too bad.
In retrospect, i wonder whether for #51310 we should have just reverted #48063 to clear the issues from the 1.10 branch, and then we would have more breathing room to get this fixed on master. It's probably too late for that now though. Would it be an option to revert all 3 PRs? Or you think too much is relying on that in this release by now?

Let us know what else we can do to try to fix this forward. Right now, our 1.10 build is broken.

@NHDaly
Copy link
Member Author

NHDaly commented Oct 6, 2023

In case this helps, I got this stacktrace from a stuck task, here:

obvious_subtype at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
ijl_subtype_env at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
uv_alloc_buf at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
jfptr_uv_alloc_buf_75149 at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
ijl_apply_generic at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
jlcapi_uv_alloc_buf_75686 at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
uv__read at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
uv__stream_io at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
uv__io_poll at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
uv_run at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
ijl_task_get_next at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
poptask at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
wait at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
YY.waitYY.645 at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
_trywait at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
YY.726 at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
jfptr_YY.726_75384 at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/sys.dylib (unknown line)
ijl_apply_generic at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
start_task at /nix/store/p2ycjgmsjwjbl3k352z63dl8i60kq58s-julia-1.10/lib/julia/libjulia-internal.1.10.0.dylib (unknown line)
unknown function (ip: 0x0)

@NHDaly
Copy link
Member Author

NHDaly commented Oct 6, 2023

@aviatesk: I still think that we should merge this revert PR.

In my experience the best approach to take for a release branch is that when there is a test-blocking bug introduced on the release branch, the first thing to do is to revert it, so that the rest of your tests can still run. Then when you have a fix, you can replay the reverted PR together with the fix, rather than leaving the tests broken during that whole period.

In this case, unfortunately julia's public test suite doesn't surface the failure, but our private RAI test suite does, and as part of our recently started oncall rotation to watch that test suite on the release branches, we're essentially donating that test suite to julia, so I think it's beneficial to keep it green as well. Can we start by merging this revert, and then we can always fix-forward once we find the solution?

If that's not what you prefer, I think we can just revert it for ourselves, but it will mean that what we're testing for future backport commits isn't reflective of what is on the branch, so we might be testing the wrong things.

@aviatesk
Copy link
Member

aviatesk commented Oct 7, 2023

Would it be an option to revert all 3 PRs? Or do you think too much is relying on them in this release by now?

That's a tough call. We'd avoid rolling back all 3 PRs, considering some package tests might now be relying on the enhancements brought about by the initial PR. Reverting it could inadvertently introduce another regression (so requiring more rounds of pkgeval).
At the same time I completely understand your situation, and I'm committed to resolving this issue as quickly as possible.

Here's a snapshot of the current situation:

So #51403 wasn't merely an enhancement but a correctness fix. Just reverting it feels like trading one bug for another. E.g. without #51403, we'd likely face a segfault when executing the test case added in the PR

abstract_call_unionall_vararg(some::Some{Any}) = UnionAll(some.value...)
@test only(Base.return_types(abstract_call_unionall_vararg)) !== Union{}
let TV = TypeVar(:T)
t = Vector{TV}
some = Some{Any}((TV, t))
@test abstract_call_unionall_vararg(some) isa UnionAll
end
.

Instead of a full revert, I'd like to propose the following patch.

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index bfddc2d692..e416e42e6d 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -1868,14 +1868,16 @@ end
 function abstract_call_unionall(interp::AbstractInterpreter, argtypes::Vector{Any}, call::CallMeta)
     na = length(argtypes)
     if isvarargtype(argtypes[end])
-        if na ≤ 2
-            return CallMeta(Any, EFFECTS_THROWS, call.info)
-        elseif na > 4
-            return CallMeta(Bottom, EFFECTS_THROWS, NoCallInfo())
-        end
-        a2 = argtypes[2]
-        a3 = unwrapva(argtypes[3])
-        nothrow = false
+        return CallMeta(Any, EFFECTS_THROWS, NoCallInfo())
+        # TODO revive this piece of code once we get a reproducer for #51603
+        # if na ≤ 2
+        #     return CallMeta(Any, EFFECTS_THROWS, call.info)
+        # elseif na > 4
+        #     return CallMeta(Bottom, EFFECTS_THROWS, NoCallInfo())
+        # end
+        # a2 = argtypes[2]
+        # a3 = unwrapva(argtypes[3])
+        # nothrow = false
     elseif na == 3
         a2 = argtypes[2]
         a3 = argtypes[3]

While it worsens the inference quality, it should still keep the compiler correctness. I'm hopeful this patch might fix your issue, although I can't guarantee it 'cause we don't have MRE yet. Given that optimal inference can't be achieved when we see Vararg-type there from the beginning, I anticipate this change won't result in a significant regression. So we can have it around if it fixes the hang you're seeing.

@nickrobinson251
Copy link
Contributor

that patch also didn't work 😞 still stuck at

- PackageCompiler: compiling incremental system image

i'll try to create a MRE

@aviatesk
Copy link
Member

that patch also didn't work

Hmm... That was unexpected, sorry. AN MRE would be really helpful.

@aviatesk
Copy link
Member

Could you possibly provide us rr debbuger session for the hang? That might help us diagnose the problem.

@nickrobinson251
Copy link
Contributor

yeah, we'll work on getting an rr trace (#51603 (comment))

@NHDaly
Copy link
Member Author

NHDaly commented Oct 13, 2023

I'm closing this for now, since we are no longer sure if this is related to the hang or not. It might not have been, since apparently the hang is intermittent. :(

@NHDaly NHDaly closed this Oct 13, 2023
@NHDaly NHDaly deleted the nhd-backports-1.10-revert-51403 branch October 13, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants