-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple testsuite failures on big-endian targets #1144
Comments
That's interesting. Nothing immediately comes to mind as to the root cause. I don't know when I'll have time to look into this. (I do not have immediate access to a big endian machine I can test on, and I don't know when I'll be able to go through the process of requesting a machine.) |
It looks like all of the failures are related to the line number emitted for each match. That's quite baffling as there shouldn't be any endian specific code in that path! |
@BurntSushi do you know who could help with that? |
I don't know. Anyone who has access to a big endian machine and can debug Rust could potentially help. Please note that I have virtually zero insight into when, where or why ripgrep is blocked by something in Debian, so it's very difficult for me to prioritize something like this when perhaps I should have. One thing that might help is if someone wrote out the steps for using one of the big-endian machines to build the same version of ripgrep that this bug was found on using the compiler farm. I've just requested an account. Another option might be to add a big endian target to the CI config and see what happens. e.g., https://github.com/BurntSushi/byteorder/blob/803136cc2719cc4a5baf562877e01938267a6bd9/.travis.yml#L8-L11 |
Oof. I have a hypothesis. I bet the issue is in my rewrite of the |
There is a hypothesis that the memchr rewrite from a while back is getting things wrong on big endian, as shown here: BurntSushi/ripgrep#1144 As a result, we start testing in CI on a big endian architecture.
There is a hypothesis that the memchr rewrite from a while back is getting things wrong on big endian, as shown here: BurntSushi/ripgrep#1144 As a result, we start testing in CI on a big endian architecture.
OK, I can't seem to reproduce this issue. After logging into I'm not sure where to go from here. |
@glaubitz rings a bell? :) |
@sylvestre I'll test on zelenka (s390x). I would be surprised if the bug just went away. |
Ok, |
@glaubitz Interesting. If you can get a list of dependency versions in which the test failures occurred, I can try testing that config. One of the things that happened somewhat recently was a memchr upgrade (from an old version to a rewrite). I wonder if that's the culprit. Unfortunately, this bug is making it difficult for me to make progress. To clarify, every time Cargo tries to update its index, I'm getting this error, e.g.,
|
@BurntSushi Let me apply a workaround. The bug you linked actually goes away when building |
@glaubitz Thanks! |
@BurntSushi Can you try again? |
Hey it works! Thanks! So I just tried pinning Here's a question: do the builds that are failing build with PCRE2 support? I can't seem to compile |
There is a hypothesis that the memchr rewrite from a while back is getting things wrong on big endian, as shown here: BurntSushi/ripgrep#1144 As a result, we start testing in CI on a big endian architecture.
@glaubitz Any updates on how I can reproduce this? |
You could try building the Debian source on a big-endian box. Retrieve the source with:
|
Seems to work fine on gcc202. Test suite runs and passes. The only other thing I can think of is whether y'all are building with PCRE2 support, because PCRE2 doesn't build on the sparc machine, so I can't test it. |
Regular
|
If I run Looking more closely at the sparc64 build log from your original comment, I see this is how tests are run:
which means that PCRE2 isn't being compiled, so I don't think that's the issue. So then I run
And everything passes. Here's the test output: https://gist.github.com/ec8acb1394789a32bb44364e230c0966 Is there something I'm missing about the build environment? To be clear, here's the machine's info:
|
We don't have permission to write to
|
Note that the file does actually exist and is created by the test:
|
The /tmp issue isn't endian specific though. Why does it matter on s390x but not other targets? |
The |
Correction, it doesn't affect the build servers but it does affect the developer porterbox machines, that's why I'm hitting it first. Can you try to reproduce? On Debian we also set |
Looks like it's trying to run "/home/infinity0/rust-ripgrep-0.10.0/target/s390x-unknown-linux-gnu/debug/rg" is actually what exists. |
Looks like changing |
Actually I spoke too soon; running the test executable directly seems to work, but running it via cargo doesn't. Maybe it's screwing with |
OK it looks like |
I can also reproduce it using this git repository's
|
I generally keep issues that aren't actionable closed. I'm happy to reopen, but to be clear, I'm blocked on debugging this myself because I don't have access to a machine that reproduces the problem. |
The problem seems to stem from
etc etc |
git bisect says 968491f is the first bad commit
This updates from bytecount 0.3.2 to 0.5. In Debian we are using bytecount 0.4 and that is also bad. So it looks like the bug is in bytecount 0.4, 0.5 - will keep investigating. |
Yeah, that makes sense. Staring at that a bit more, I have a new hunch: the crate I'm using to count lines might be the culprit here. I'm not familiar with that code, but it's similarish to the kind of code that's in memchr, and that could in theory be endian dependent for $reasons. Hmmm... Running With that said, I still can't explain why the test suite passes on the gcc202 sparc machine, which is big-endian. |
@infinity0 Ah you beat me to it! OK, that looks fairly compelling. I'll figure out how to send a bug report to |
In fact reverting that commit on master (and deleting Cargo.lock to resolve the merge conflict) makes these errors go away. |
OK, and now I have an explanation for why my sparc tests were passing. The I never updated ripgrep to use So it looks like one lesson to be learned here is to find a way to communicate more clearly that Debian might be using different versions of dependencies than what ripgrep's |
Thanks! I can take care of the bug report to bytecount, I can bisect that here too. |
Yes we should be a bit better about communicating that in future. Running tests would have helped to catch this earlier too, however it's a bit hard to do since "the easy way" (in Debian) would introduce cyclic build-dependencies. |
@infinity0 OK, master is updated with the patch, and I've published |
@infinity0 I've also filed #1193 to see about catching these a bit more aggressively. It wouldn't have caught it initially, but my commit to bump bytecount would have failed at least. Thanks so much for debugging this! |
Version 0.5.1 will have the fix. I will ping you once it's published. |
The testsuite of
ripgrep
0.10.0 is failing on all of Debian's big-endian targets:Full build logs here: https://buildd.debian.org/status/package.php?p=rust-ripgrep&suite=sid
Click on the "Build-Attempted" fields in the "Status" column for more information.
For access to a big-endian Linux machine to be able to debug the issue, you can request an account from the gcc compile farm (https://gcc.gnu.org/wiki/CompileFarm) and test on Linux/sparc64, for example.
The text was updated successfully, but these errors were encountered: