Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

temporarily disable LLVM assertions on travis #19803

Closed
wants to merge 1 commit into from

Conversation

tkelman
Copy link
Contributor

@tkelman tkelman commented Jan 1, 2017

Reverts #19678 to temporarily work around #19797 and #19792 until they can be properly solved, either in Julia codegen or with an LLVM patch. repurposed the branch

@tkelman
Copy link
Contributor Author

tkelman commented Jan 1, 2017

According to travis this isn't it then, the reflection test issue with "Cannot define a symbol twice!" caused by #19787 is a different problem and also happens on llvm 3.7.

@tkelman tkelman changed the title Revert "RFC: Change LLVM version to 3.9.1" Revert #19723 and #19787 to avoid code_native LLVM assertion failure Jan 1, 2017
@tkelman
Copy link
Contributor Author

tkelman commented Jan 1, 2017

The failure happens with just code_native(+, Tuple{Array{Float32},Array{Float32}}). A real fix would be much preferred!

@tkelman tkelman added system:32-bit Affects only 32-bit systems system:linux Affects only Linux labels Jan 1, 2017
@pabloferz
Copy link
Contributor

pabloferz commented Jan 1, 2017

Maybe the failure has to do with tuple types, the broadcast mechanism uses them to ask inference the return type.

@tkelman
Copy link
Contributor Author

tkelman commented Jan 1, 2017

in combination with #19746, maybe there's something now invalid about what broadcast was made to do here?

@pabloferz
Copy link
Contributor

Maybe #19667 is also related? What happens if you revert that one and #19723 instead? (I don't have a 32-bit linux machine to test it locally)

@tkelman
Copy link
Contributor Author

tkelman commented Jan 1, 2017

You can build with multilib compilers from 64 bit Linux (that's what is done on Travis), or try the new docker image mentioned here: #18927 (comment)

Reverting #19746 seems to help locally. It needs some tweaks due to nearby deprecations though. I'll try 19667 instead and see if that makes a difference. edit: by itself, reverting 19667 didn't fix this

@pabloferz
Copy link
Contributor

pabloferz commented Jan 1, 2017

Looking more closely at the issues where there issue has been reported, all points out to #19746 as the cause of the problem, not clear to me why though.

@tkelman
Copy link
Contributor Author

tkelman commented Jan 1, 2017

#19746 made many more operations go through broadcast instead of loops. So probably if the linalg or reflection tests were written to manually go through broadcast the problem would be visible earlier? Suggests either a codegen issue or that the broadcast code is doing something it's not supposed to.

edit: code_native(broadcast, Tuple{typeof(+),Array{Float32},Array{Float32}}) starts failing at f147aaa

@pabloferz
Copy link
Contributor

code_native(broadcast, Tuple{typeof(+),Array{Float32},Array{Float32}}) starts failing at f147aaa

Oddly enough that PR didn't change any of the underlying broadcast mechanism, just removed some Nullable methods that got nothing to do with, e.g., broadcast(+, [1.0f0], [1.0f0])

@andreasnoack
Copy link
Member

#19746 made many more operations go through broadcast instead of loops.

I think this is key. The problem must be (caused by the compilation of) broadcast and #19746 just makes the problem visible. More specifically, with #19746 AbstractArray≈AbstractArray calls broadcast which it didn't use to.

@JeffBezanson
Copy link
Sponsor Member

If we revert this, we also have to revert #16961. A compiler bug does not mean #16961 gets to be the default. Hopefully the underlying cause can be fixed instead.

@tkelman
Copy link
Contributor Author

tkelman commented Jan 1, 2017

I may have a less-annoying temporary bandaid: 27166ae

Seems to work okay locally without assertions enabled, but presumably that assertion is there for a reason and something is wrong here.

@tkelman tkelman force-pushed the revert-19678-tk/llvm39 branch 3 times, most recently from 088813c to 965e9ba Compare January 2, 2017 02:33
@tkelman tkelman changed the title Revert #19723 and #19787 to avoid code_native LLVM assertion failure temporarily disable LLVM assertions on travis Jan 2, 2017
@tkelman
Copy link
Contributor Author

tkelman commented Jan 2, 2017

@Keno (or anyone else) you should be able to reproduce this with docker run -it tkelman/julia32-part1:manual then make test-reflection inside the container.

@Keno
Copy link
Member

Keno commented Jan 2, 2017

@tkelman that docker image has unusually high uids (30000) in a few places, which doesn't play nice with my setup. If it's not too much trouble, could you chown -R root:root /src and re-upload?

@tkelman
Copy link
Contributor Author

tkelman commented Jan 2, 2017

okay, pushing as tkelman/julia32-part1:manual2

@Keno
Copy link
Member

Keno commented Jan 2, 2017

Ah, that didn't quite work, since it's a problem when extracting the layers, and that just added another layer. No worries, I'll adjust my setup. Just something to keep in mind for the future.

@tkelman
Copy link
Contributor Author

tkelman commented Jan 3, 2017

any update?

@Keno
Copy link
Member

Keno commented Jan 3, 2017

Working on it. With the docker container, I did see this, so I have it locally for debugging.

@tkelman tkelman deleted the revert-19678-tk/llvm39 branch January 6, 2017 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:linux Affects only Linux system:32-bit Affects only 32-bit systems
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants