Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linux-x86: Most git operations fail with "failed to stat <some file>" #3844

Open
senekor opened this issue Jun 7, 2024 · 37 comments
Open

linux-x86: Most git operations fail with "failed to stat <some file>" #3844

senekor opened this issue Jun 7, 2024 · 37 comments

Comments

@senekor
Copy link
Contributor

senekor commented Jun 7, 2024

Update/TLDR

It looks like the issue is that cargo binstall jj-cli installs a broken binary from QuickInstall for jj 0.18 on linux x86 systems.

Fix:

The problem is solved on recent versions of cargo-binstall. Make sure you're running the latest one before reinstalling jj to solve the problem:

cargo binstall cargo-binstall
cargo binstall --force jj-cli

We'd appreciate a confirmation that the fix works for people.

Alternatively, or if you want to be doubly sure, you can run

cargo binstall --strategies crate-meta-data --force jj-cli

Description

I just updated to 0.18 and most jj commands fail (some are fine, e.g. log) with similar error messages:

$ jj new x
Error: Git operation failed
Caused by: failed to stat '/home/senk/.gitconfig'; class=Config (7)

(I do have a git config, it just lives in ~/.config/git/config.)
If I touch that file, the process repeats with ./.git/info/grafts and ./.git/shallow.
Finally, git operations of jj seemingly work again (at least jj new).
However, using the git cli itself I get a warning about .git/info/grafts being deprecated.
So that doesn't seem like a good solution to just make these dummy files.

Steps to Reproduce the Problem

This is the second computer I updated and the first didn't show this problem.
So I don't know how to reproduce it yet.

Expected Behavior

Git operations should succeed as normal.

Actual Behavior

$ jj new x
Error: Git operation failed
Caused by: failed to stat '/home/senk/.gitconfig'; class=Config (7)

Specifications

  • Platform: Linux
  • Version: 6.8.11-300.fc40.x86_64
@senekor
Copy link
Contributor Author

senekor commented Jun 7, 2024

This happens in all repositories, even when I do a fresh jj git init --colocate in a previously only git repo.

@senekor
Copy link
Contributor Author

senekor commented Jun 7, 2024

It does NOT happen in a repo that isn't colocated. Only noticing it now because I default to colocated repos.

@mgattozzi
Copy link
Contributor

mgattozzi commented Jun 7, 2024

Confirming I'm seeing this in my collocated repos as well with the upgrade to 0.18.0

EDIT: I can't even fresh clone a repo actually, it's not even a matter of colocation.
EDIT2: Even if I add an empty grafts and shallow file other operations like git fetch fail with more "failed to stat"

❯ jj git fetch
Error: failed to stat '/home/michael/dotfiles/.jj/repo/store/git/objects/pack/pack_git2_4978ca90985didx': Resource temporarily unavailable; class=Os (2)

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 7, 2024

I tired to reproduce this briefly on a Mac, but haven't been able to yet. I can also try Linux later. So, I'm totally guessing in the dark.

It might be worth double-checking that it's not some sort of a temporary problem with the /home filesystem. Do the files it's complaining about exist? Would rebooting the computer make a difference? Do plain git commands work in that repo?

It might also be some sort of incompatibility between a libgit2 and/or gix version and/or some git version. Which version of Git do you use?

@martinvonz
Copy link
Owner

It would also be very helpful if one if you can build jj from source and bisect the v0.17.0..v0.18.0 range.

@yuja
Copy link
Collaborator

yuja commented Jun 8, 2024

❯ jj git fetch
Error: failed to stat '/home/michael/dotfiles/.jj/repo/store/git/objects/pack/pack_git2_4978ca90985didx': Resource temporarily unavailable; class=Os (2)

Seems like underlying libgit2 code paths are totally broken. It's weird that fstat returned EAGAIN. Running with strace might show something wrong.

Btw, which Linux distribution and version are you using?

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

I'm running Fedora 40. The thing is that my personal computer doesn't have this issue, only my work computer. (Same setup for all intents and purposes. Obv. not the exact same hardware.)

I'll find some time later today to go to the office and do the bisect.

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

Well, installing from source works perfectly fine 😕 I'll try to reproduce the CI build more closely, starting with using musl instead of glibc.

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

This is quite strange. I built the v0.18.0 tag in an Ubuntu container with the musl target... and it works fine. If I cargo-binstall it, the problem is back. At this point, I don't know what to try anymore to reproduce it. It seems almost like something went wrong with the build, rather than the source.

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 8, 2024

Have you been using cargo install --locked or cargo build to build it, as opposed to a plain cargo install?

If so, another remote possibility (reaching at straws here) is that perhaps the binary uses some CPU instruction that your processor doesn't support (got optimized for Intel-only instructions or AMD-only or ??).

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 8, 2024

Also, the release binaries are built with Rust 1.76. I'd be surprised if this made a difference...

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

I used cargo build. I don't really know about the nuances of what those different commands do. Let me know if I should try something else.

My home pc has an amd cpu, the office one is intel (11900k I think?).

I'm using the latest rust version (1.78).

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 8, 2024

cargo build should use Cargo.lock and therefore the same dependencies as the release build.

If you installed Rust with rustup, you could relatively easily install 1.76 toolchain (rustup toolchain add 1.76, I believe, and then cargo +1.76 build).

I guess it's possible that the binary doesn't work correctly on Intel CPUs. We can ask on Discord, that's probably the easiest way to find out.

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

Shoot, looks like I made a mistake. Apparantly I had a self-compiled version of jj installed on my personal machine all along. Now that I installed the precompiled binary here as well, I can at least reproduce at home. So Intel / AMD cpu doesn't seem to be the issue.

I'll try to compile myself with Rust 1.76 in a minute.

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

Compiling with 1.76 also doesn't cause any issues, jj works fine.

I had another idea that it might be the feature flags. I'm pretty sure I had them set just like in the CI workflow when testing early at the office. But I'll try again. oppenssl always gives me trouble with Rust builds 😢

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

Nope, the features don't make a difference either. Still at square one: Local builds are fine, prebuilt ones are borked.

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

What the... if I download the prebuilt binary manually, it seems to be fine as well. Only cargo binstall causes issues! Now I have to figure out how and why I'm getting a different thing from binstall.

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

With the following command, the correct binary seems to be installed:

cargo binstall jj-cli --target x86_64-unknown-linux-musl
# ...
WARN The package jj-cli v0.18.0 (x86_64-unknown-linux-musl) has been downloaded from github.com
# ...

Notice the snippet of the output (github.com). This is different without the explicit target:


cargo binstall jj-cli
# ...
WARN The package jj-cli v0.18.0 (x86_64-unknown-linux-gnu) has been downloaded from third-party source QuickInstall
# ...

So, QuickInstall. It looks like they also provide prebuilt releases for some rust programs. Presumably, the borked binaries are downloaded from this release:

https://github.com/cargo-bins/cargo-quickinstall/releases/tag/jj-cli-0.18.0

@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

I have no idea how the QuickInstall system works. But it seems they only provide binaries for x86_64-unknown-linux-gnu (the one we don't provide).

Would there be any downside to providing gnu and musl binaries for linux? It would probably be even easier than adding aarch64 binaries. It's a little redundant since musl works fine everywhere, but it's an easy solution.

I'll investigate the QuickInstall system a bit to maybe figure out what went wrong there.

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 8, 2024

I tried "pinning" this issue and putting the workaround in the first comment.

@ilyagr ilyagr changed the title Most git operations fail with "failed to stat <some file>" linux-x86: Most git operations fail with "failed to stat <some file>" Jun 8, 2024
@senekor
Copy link
Contributor Author

senekor commented Jun 8, 2024

Great! The flag --strategies crate-meta-data works for me.

@NobodyXu
Copy link

NobodyXu commented Jun 9, 2024

The quick-install version is built using glibc 2.17 to maintain backwards compatibility with centos 7, perhaps that's the reason why it failed?

@NobodyXu
Copy link

NobodyXu commented Jun 9, 2024

I could:

  • set glibc version higher for jj on quick-install
  • enable gix on quick-install

to workaround this issue

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 9, 2024

@NobodyXu good questions, thank you for looking into it! Unfortunately, I don't know about glibc.

Re enabling gix, this will not get rid of the libgit2 dependency. Pushing and pulling is only implemented with libgit2 in jj currently.

@NobodyXu
Copy link

NobodyXu commented Jun 9, 2024

Pushing and pulling is only implemented with libgit2 in jj currently.

Not sure about pushing, but pulling using gix is definitely doable.

@yuja
Copy link
Collaborator

yuja commented Jun 9, 2024

The binary downloaded from https://github.com/cargo-bins/cargo-quickinstall/releases/tag/jj-cli-0.18.0 appears to link some of the libc functions statically.

For example, git_config_add_file_ondisk calls statically-linked stat64->fstatat64. OTOH, it refers to dynamically-linked errno.

(gdb) disas git_config_add_file_ondisk
Dump of assembler code for function git_config_add_file_ondisk:
   ...
   0x0000000000ee4aee <+46>:	mov    rbp,rdi
   0x0000000000ee4af1 <+49>:	lea    rsi,[rsp+0x10]
   0x0000000000ee4af6 <+54>:	mov    rdi,rbx
   0x0000000000ee4af9 <+57>:	call   0x1274b70 <stat64>
   0x0000000000ee4afe <+62>:	test   eax,eax
   0x0000000000ee4b00 <+64>:	js     0xee4b8e <git_config_add_file_ondisk+206>
   ...
   0x0000000000ee4b8e <+206>:	call   0x1274ca0 <__errno_location@plt>
   0x0000000000ee4b93 <+211>:	mov    eax,DWORD PTR [rax]
   0x0000000000ee4b95 <+213>:	cmp    eax,0x2
   0x0000000000ee4b98 <+216>:	je     0xee4b06 <git_config_add_file_ondisk+70>
   0x0000000000ee4b9e <+222>:	cmp    eax,0x14
   0x0000000000ee4ba1 <+225>:	je     0xee4b06 <git_config_add_file_ondisk+70>
   0x0000000000ee4ba7 <+231>:	lea    rsi,[rip+0xffffffffff399803]        # 0x27e3b1
   0x0000000000ee4bae <+238>:	mov    edi,0x7
   0x0000000000ee4bb3 <+243>:	mov    rdx,rbx
   0x0000000000ee4bb6 <+246>:	xor    eax,eax
   0x0000000000ee4bb8 <+248>:	call   0xeed080 <git_error_set>
   0x0000000000ee4bbd <+253>:	jmp    0xee4b79 <git_config_add_file_ondisk+185>
(gdb) disas fstatat64
Dump of assembler code for function fstatat64:
   0x0000000001274bf0 <+0>:	push   rbp
   0x0000000001274bf1 <+1>:	mov    rbp,rsp
   0x0000000001274bf4 <+4>:	mov    eax,0x106
   0x0000000001274bf9 <+9>:	mov    r10d,ecx
   0x0000000001274bfc <+12>:	syscall
   0x0000000001274bfe <+14>:	xor    ecx,ecx
   0x0000000001274c00 <+16>:	cmp    eax,0xfffff001
   0x0000000001274c05 <+21>:	jb     0x1274c18 <fstatat64+40>
   0x0000000001274c07 <+23>:	neg    eax
   0x0000000001274c09 <+25>:	mov    rcx,0xfffffffffffffff8
   0x0000000001274c10 <+32>:	mov    DWORD PTR fs:[rcx],eax
   0x0000000001274c13 <+35>:	mov    ecx,0xffffffff
   0x0000000001274c18 <+40>:	mov    eax,ecx
   0x0000000001274c1a <+42>:	pop    rbp
   0x0000000001274c1b <+43>:	ret

(errno = -ret is inlined?)

I think that's the source of the problem.

@NobodyXu
Copy link

NobodyXu commented Jun 9, 2024

Quickinstall uses cargo-zigbuild to cross compile to glibc 2.17, perhaps that's the souce of the error?

@mgattozzi
Copy link
Contributor

Like @senekor I've confirmed that if I install via cargo install jj-cli --locked this problem does not exist. My initial problem was with cargo binstall which will install the binaries from the release and if there are none default to cargo install

@thoughtpolice
Copy link
Collaborator

thoughtpolice commented Jun 12, 2024

Two questions:

  • If we created our own dynamically linked glibc builds, and published them in a release, would cargo binstall use those instead of the upstream zig version that's apparently broken?
  • Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?

I ask because it would be nice of us to mitigate this for our users ASAP, so they don't run into it. We can/should also update our docs I suppose in the meantime, but I think it's important to stop the bleeding, so to speak.

In either case, I would prefer if cargo binstall would only install our binaries from our release archives by default, because that's what we CI and test against, and it would be really nice if that could work OOTB without any extra flags — so that users can get working results even if they mistype something or just run cargo binstall blindly without looking too hard.

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 12, 2024

If we created our own dynamically linked glibc builds, and published them in a release, would cargo binstall use those instead of the upstream zig version that's apparently broken?

I believe so. Of course, then it would be us to make a glibc-linked binary that works on many linux systems, which I don't have much experience with, but you might.

Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?
In either case, I would prefer if cargo binstall would only install our binaries from our release archibes by default, because that's what we CI and test against, and it would be really nice if that could work OOTB without any extra flags — so that users can get working results even if they mistype something or just run cargo binstall blindly without looking too hard.

AFAIU, there's no way currently, but this is now tracked in cargo-bins/cargo-binstall#1721.

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 13, 2024

@NobodyXu
Copy link

@thoughtpolice

If we created our own dynamically linked glibc builds, and published them in a release, would cargo binstall use those instead of the upstream zig version that's apparently broken?

Yes.

Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?

You can use --disable-strategies quick-install

@senekor
Copy link
Contributor Author

senekor commented Jun 13, 2024

Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?

You can use --disable-strategies quick-install

I think the question is whether we can do this from the project's side. Asking every user to do this manually isn't reliable. I'm personally so used to binstall working well that I often don't check a projects installation documentation and just binstall it.

I do wonder: Can't you disable building jj and delete the broken release on the quickinstall side? Is it setup in a way that this is impossible or difficult?

@ilyagr
Copy link
Collaborator

ilyagr commented Jun 13, 2024

We actually have recommended the correct command in https://martinvonz.github.io/jj/latest/install-and-setup/#cargo-binstall for a long time, but I expect binstall users are less likely to read that page than an average user.

@NobodyXu
Copy link

I think the question is whether we can do this from the project's side. Asking every user to do this manually isn't reliable. I'm personally so used to binstall working well that I often don't check a projects installation documentation and just binstall it.

It's tracked in cargo-bins/cargo-binstall#1721 , though I'm quite busy recently.

I do wonder: Can't you disable building jj and delete the broken release on the quickinstall side? Is it setup in a way that this is impossible or difficult?

Yes, I can remove the broken release, though it might be built again.
I think it might be a bug in rust-cross/cargo-zigbuild#255 or ziglang, since cargo-quickinstall use it to build for glibc 2.17

@senekor
Copy link
Contributor Author

senekor commented Jun 13, 2024

Now that this PR is merged, the problem should go away with the next release of cargo-binstall. Of course, people will be running outdated versions for some time, so we might want to keep this issue pinned for a little longer.

@NobodyXu
Copy link

cargo-binstall 1.7.1 has released, cargo-bins/cargo-binstall#1741 which fixed this issue is released in 1.7.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants