WSL pins opened directories #1529

Closed
therealkenc opened this issue Dec 21, 2016 · 20 comments

@therealkenc
Collaborator

therealkenc commented Dec 21, 2016

Pulling this out of #1492. It is possibly a dup of #1420; but I'm thinking maybe not because that issue has no lingering fds. This hits in spades with nfs-ganesha, because the protocol is stateless and file descriptors are cached for a while before being released. Windows Explorer traipses all over thousands of files, causing lots of directory fds to remain open. This in turn causes unexpected mv and rm operations on seemingly random directories on the WSL side to fail.

This will probably also show itself when people start doing web development scenarios, because http servers have a tendency to do the same sort of fd caching.

// silly-rename.c
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, const char* argv[]) {
  const char msg[] = "amy's parent morton renamed to maura\n";
  int fd, ret;

  mkdir("/tmp/morton", 0777);
  fd = open("/tmp/morton/amy", O_CREAT|O_RDWR, 0666);
  ret = rename("/tmp/morton/", "/tmp/maura/");
  if (ret < 0) {
    perror("rename failed: ");
  } else {
    write(fd, msg, sizeof(msg) - 1);
  }
  close(fd);
}

strace on WSL:

mkdir("/tmp/morton", 0777)              = 0
open("/tmp/morton/amy", O_RDWR|O_CREAT, 0666) = 3
rename("/tmp/morton/", "/tmp/maura/")   = -1 EACCES (Permission denied)

strace on Ubuntu:

mkdir("/tmp/morton", 0777)              = 0
open("/tmp/morton/amy", O_RDWR|O_CREAT, 0666) = 3
rename("/tmp/morton/", "/tmp/maura/")   = 0
@sunilmut
Member

sunilmut commented Jul 7, 2017

@therealkenc - Apologies for the delay here. I am sure there are other neglected issues, and I personally would like to get through them too.

Anyways, I think this stems from the NTFS limitation that prevents renaming directories which have a handle open to anything below. @SvenGroot to confirm.

@seffyroff
Copy link

Other than 'try to not open lots of files' is there any advice on avoiding this issue, given that a fix seems to be escaping us? Is there a way to monitor and close file handles? Is there a more suitable filesystem for hosting WSL filesystems that would avoid hitting this? Is there a node setting we can flag to ignore these errors, given that they're apparently false positives?

@bitcrazed
Contributor

bitcrazed commented Dec 19, 2018

Assigned to @SvenGroot and @tara-raj to take a look, though with Xmas fast approaching, may be a couple of weeks until this gets looked into.

Interested party here: https://twitter.com/rainabba/status/1075192299791908864

@therealkenc
Collaborator Author

therealkenc commented Dec 19, 2018

There have been chirping crickets since 2016, it is fairly safe to assume, mostly because the problem is understood and there isn't much for Sven or Ben or Brian to add. The problem is caused by underlying limitations in NTFS and the NT APIs you have to work with. I posted an outline of a possible (albeit nontrivial) solution elsewhere, of which they've almost certainly been aware since 2016 as well. There isn't anything new to "look into" here.

@therealkenc
Collaborator Author

Tell you what you can do, if you are looking for something constructive. Right now it is impossible to differentiate between EACCES caused by WSL failing to respect Unix filesystem semantics and EACCES from folks who simply have their permissions wrong (EUSERCONFUSED). Up in the rename(2) implementation you know the difference, because you know whether the mode bits (what we tend to call metadata) are okay or not. In the case where you can't rename because NTFS won't play ball, return a nonstandard EBUSY instead of EACCES, which will at least stick out like a sore thumb. That way reported issues triggering the known problem can be summarily duped here instead of guessing. This was asked for about a year ago, but the request could easily have been buried rather than rejected. Just a thought.
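
Until the kernel side reports a distinct errno, a caller can approximate the same distinction in user space. The following is a minimal sketch, assuming that a writable and searchable parent directory on both sides means the mode bits are probably not the cause of the failure. The names `parent_is_writable` and `rename_with_hint` are hypothetical and are not part of WSL or libc. The check is only a heuristic: access(2) uses the real UID and does not account for sticky bits or ACLs.

```c
// rename-diagnose.c: user-space heuristic, not WSL's implementation.
#include <errno.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Returns 1 if the caller may write and search the parent directory of
// `path` according to access(2), 0 if not, -1 if the path is empty or
// too long for the local buffer.
int parent_is_writable(const char *path) {
  char buf[4096];
  size_t len = strlen(path);
  if (len == 0 || len >= sizeof(buf)) return -1;
  memcpy(buf, path, len + 1);
  // dirname() may modify its argument, so it works on the private copy.
  return access(dirname(buf), W_OK | X_OK) == 0 ? 1 : 0;
}

// Hypothetical wrapper: behaves like rename(2), but prints a hint when an
// EACCES failure is not explained by the parents' mode bits.
int rename_with_hint(const char *from, const char *to) {
  int ret = rename(from, to);
  if (ret < 0 && errno == EACCES) {
    if (parent_is_writable(from) == 1 && parent_is_writable(to) == 1) {
      fprintf(stderr,
              "rename: EACCES although both parents look writable; "
              "an open handle below %s may be pinning it\n", from);
    }
    errno = EACCES;  // the checks above may have overwritten errno
  }
  return ret;
}
```

Calling `rename_with_hint("/tmp/morton/", "/tmp/maura/")` in place of the plain `rename` in silly-rename.c would leave the return value and errno unchanged and only add the hint on stderr.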

@onomatopellan

@Coder-256 it's fixed in WSL2.

@therealkenc
Collaborator Author

therealkenc commented Jun 25, 2019

> @Coder-256 it's fixed in WSL2.

For a good swath of folks' major pain points, yes. The underlying issue remains, however. With minor string edits to OP:

// silly-rename.c
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, const char* argv[]) {
  const char msg[] = "amy's parent morton renamed to maura\n";
  int fd, ret;

  mkdir("/mnt/c/Users/there/morton", 0777);
  fd = open("/mnt/c/Users/there/morton/amy", O_CREAT|O_RDWR, 0666);
  ret = rename("/mnt/c/Users/there/morton/", "/mnt/c/Users/there/maura/");
  if (ret < 0) {
    perror("rename failed: ");
  } else {
    write(fd, msg, sizeof(msg) - 1);
  }
  close(fd);
}

and the strace for pedantry:

mkdir("/mnt/c/Users/there/morton", 0777) = -1 EEXIST (File exists)
openat(AT_FDCWD, "/mnt/c/Users/there/morton/amy", O_RDWR|O_CREAT, 0666) = 3
rename("/mnt/c/Users/there/morton/", "/mnt/c/Users/there/maura/") = -1 EACCES (Permission denied)
[...]
write(4, "rename failed: : Permission deni"..., 35) = 35

Which would seem academic, except that it affects everyone using WSL for WSL's raison d'être, which is interop with Windows.

@snevs

snevs commented Aug 28, 2019

Why is [Install Remote WSL fail with mv - permission denied #109](https://github.com/microsoft/vscode-remote-release/issues/109) even closed? I have the same issue with mv.

@slikts

slikts commented Dec 27, 2019

> The problem is caused by underlying limitations in NTFS and the NT APIs you have to work with.

So it's another facet of the same root problem as #873 and why WSL2 is being added.

> … it's fixed in WSL2.

It's not as much fixed in WSL2 as WSL2 is a fundamentally different tool with its own set of significant limitations (deriving from virtualization). Specifically, Hyper-V is not compatible with at least a couple of tools I use, and it also adds significant overhead by running Windows as a guest.

@DemiMarie

DemiMarie commented Dec 27, 2019 via email

@amanhigh

Facing Similar Issue

go get -u github.com/PuerkitoBio/goquery
go: extracting github.com/PuerkitoBio/goquery v1.5.1
go get: rename /home/aman/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.5.1.tmp-274661216 /home/aman/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.5.1: permission denied

Permissions

~/go:  ls
total 0
drwxr-xr-x 1 aman aman 512 Mar 15 02:43 ..
drwxrwxrwx 1 aman aman 512 Mar 15 02:46 .
drwxrwxrwx 1 aman aman 512 Mar 15 02:46 src
drwxrwxrwx 1 aman aman 512 Mar 15 09:53 pkg

@clshortfuse

clshortfuse commented Mar 23, 2020

I have a hypothesis that may or may not help illuminate the issue.

I have a machine that uses WSL v1 (no issues on v2). I can replicate EACCES pretty easily by trying to install a package with VSCode. I believe there's some sort of access limit or open handles.

Basically, I know VSCode is monitoring the folder I'm working in. If I install a package with VSCode open, I'll get EACCES errors. If I close VSCode and install the packages over the Ubuntu bash, no errors.

So, to create this issue, I believe you just have to:

  • Initiate some sort of file system watcher on a folder (VSCode)
  • Create and delete files within the same folder very quickly (npm install)

I think what's causing the EACCES issue is that VSCode isn't handling the file system changes as fast as npm can create or delete them. That's why this only happens sometimes and on some machines. The machine I'm working on currently is REALLY slow in general and I get these errors consistently if I try installing a package with VSCode open. WSLv2 may have fixed this issue, or it's just so much faster than WSLv1 that it's rare for the collision to occur. The fact that the issue goes away when I close VSCode suggests that the handle is being kept open by the file system watcher.

I'm sure somebody smarter than I can create some sort of test script that creates and deletes a bunch of files with a watcher in the background. You might find the collision. I just don't know where in WSL this issue might occur. Perhaps inotify is bugged somewhere?

Edit: Worth noting, I don't use /mnt/* so it probably has little to do with the Windows drives.
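
The test script asked for above could look something like the following sketch. It is not a confirmed reproducer: it assumes an in-process inotify watch is a reasonable stand-in for an external watcher such as VSCode, and it never reads the queued events, which imitates a watcher that cannot keep up. `churn_under_watch` is a hypothetical name.

```c
// watch-churn.c: hypothetical stress sketch, not a confirmed reproducer.
#include <errno.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates, renames, and removes `rounds` directories under `base` while an
// inotify watch is active on `base`. Returns how many calls failed with
// EACCES, or -1 if the watch could not be set up.
int churn_under_watch(const char *base, int rounds) {
  char from[4096], to[4096];
  int eacces = 0;

  int ifd = inotify_init1(IN_NONBLOCK);
  if (ifd < 0) return -1;
  if (inotify_add_watch(ifd, base, IN_CREATE | IN_DELETE | IN_MOVE) < 0) {
    close(ifd);
    return -1;
  }

  for (int i = 0; i < rounds; i++) {
    snprintf(from, sizeof(from), "%s/morton-%d", base, i);
    snprintf(to, sizeof(to), "%s/maura-%d", base, i);
    if (mkdir(from, 0777) < 0) {
      if (errno == EACCES) eacces++;
      continue;
    }
    if (rename(from, to) < 0) {
      if (errno == EACCES) eacces++;
      rmdir(from);  // best-effort cleanup of the directory that stayed put
      continue;
    }
    if (rmdir(to) < 0 && errno == EACCES) eacces++;
  }

  close(ifd);  // closing the inotify fd also drops the watch
  return eacces;
}
```

On a native Linux filesystem the expected count is zero. A nonzero count on WSL1 with no other watcher running would point at the watch itself rather than at VSCode.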

stefangraber added a commit to stefangraber/asdf-maven that referenced this issue Mar 26, 2020
ManasJayanth added a commit to esy/esy that referenced this issue Jan 6, 2022
Retry rename on EACCES

This PR retries `rename` upon getting `EACCES`. I've included data about how many retries are likely necessary.

On my system (WSL1 Ubuntu 20.04, omitting hardware details), the `EACCES` issue makes it impossible to use esy to install any of the Dream examples or use Dream's quick start. As the data below shows, every installation is expected to fail with `EACCES`, if it is not worked around.

The underlying `EACCES` issue seems to be a long-standing problem on WSL1 (microsoft/WSL#1529, microsoft/WSL#3395), and I think we do have to work around it in esy. I'm not sure what is causing the `EACCES` exactly. I think there are two main classes of possibilities:

- Self-interaction between esy's opened file descriptors and `rename`. I think the self-interaction is due to WSL rather than Lwt or another library. Since I compiled esy on WSL, it is using Lwt's Unix (rather than Windows) C code. Since the Unix code seems to work fine on Linux and Mac, this suggests a WSL issue.
- Interaction between esy and file indexers or other processes running on the system. I'm not sure if that's a WSL issue or not, but I've never had to be aware of such processes when doing renames in Cygwin or elsewhere.

I built esy with this patch under WSL and ran clean `esy install`s in Dream's [full-stack ReScript](https://github.com/aantron/dream/tree/03e4d37cb5f5f638707479cd46105e2ee2b1df0e/example/w-fullstack-rescript#readme) example, using this script:

```sh
#!/bin/bash

export PATH="/home/antron/code/attic/esy/_build/install/default/bin:$PATH"
export ESY__PREFIX="/home/antron/code/dream/dream/example/w-fullstack-rescript/esy-prefix"
export OCAMLRUNPARAM=b
RUN=1

while true
do
  rm -rf esy-prefix _esy esy.lock lib node_modules/ package-lock.json
  echo
  echo "RUN $RUN"
  which esy
  esy install # --verbosity debug
  if [ $? != 0 ]
  then
    exit
  fi
  RUN=$((RUN+1))
done
```

The example was checked out into NTFS. The system was freshly restarted, and VSCode (or anything similar) was not running.

I used a version of this patch with a print showing the number of attempts before `rename` succeeds, and got the following results from 5 runs:

```
1 attempt:   802
2 attempts:   52
3 attempts:   12
4 attempts:    3
5 attempts:    1
total:       870
```

Based on this, I naively estimated that if a `rename` needs more than 1 attempt, the number of attempts needed decays by a factor of 4 at each step. I set the limit on the number of attempts naively to 8, thus expecting one failure in about 500 `esy install` attempts of the Dream ReScript example, under all these simplified assumptions.

The delay between attempts is (over) one second, so this means that upon legitimate `EACCES`, users will have to wait eight seconds to get an error message. I think there are two ways to address this:

- Fall back to recursive copy rather than retrying `rename` when `rename` fails. Do we have a recursive copy available in esy or its dependencies? Is it fine to leave the source directory intact?
- Detect WSL and retry only on WSL. Waiting for 8 seconds is still a much better user experience than failure to install at all, so the PR will still be an improvement, without, in this case, harming Linux or Mac users. We could also add a message, shown in case we finally fail with `EACCES` on WSL, giving users a hint about potential VSCode or other watchers, and what else they can try to solve the problem.
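
The bounded retry described above can be sketched in C, the language of the reproducer in this thread (esy itself is not written in C). `rename_retry` and its parameters are illustrative names. The limit of 8 attempts and the delay of about one second come from the description, and the choice to retry only on `EACCES` is an assumption about the intended behavior.

```c
// rename-retry.c: illustrative sketch of a bounded retry, not esy's code.
#include <errno.h>
#include <stdio.h>
#include <time.h>

// Calls rename(2) up to `max_attempts` times, sleeping `delay_ms`
// milliseconds between attempts, and retries only when errno is EACCES.
// Returns the number of attempts used on success, or -1 on failure with
// errno left as set by the last rename(2) call.
int rename_retry(const char *from, const char *to,
                 int max_attempts, long delay_ms) {
  for (int attempt = 1;; attempt++) {
    if (rename(from, to) == 0) return attempt;
    // Any other error (ENOENT, EXDEV, ...) is reported immediately.
    if (errno != EACCES || attempt >= max_attempts) return -1;
    struct timespec ts = {
      .tv_sec = delay_ms / 1000,
      .tv_nsec = (delay_ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
  }
}
```

A call such as `rename_retry(tmp_path, final_path, 8, 1000)` would match the limits discussed here. Returning the attempt count makes it easy to collect a histogram like the one above.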

Closes #1363.
Probably fixes #1097, some of the reports after the first one.
Probably fixes #1083.
Probably fixes #593, but I haven't looked into non-WSL Windows yet.
Probably fixes aantron/dream#63.

cc @bryphe, @rizo, @jordwalke, @iMplode-nZ, @a-c-sreedhar-reddy, @srirajshukla, @andreypopp
Contributor

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!
