
Alpine Linux Cannot Navigate Azure Files Mounts Reliably (SMB client issue in Alpine image) #1325

Closed
GuyPaddock opened this issue Nov 19, 2019 · 20 comments


@GuyPaddock

When using Azure Files with Alpine-Linux-based containers on AKS, applications may behave strangely when navigating folders that contain more than 62 files. For example, running rm -rf from the CLI fails with rm: can't remove 'test': Directory not empty.

Much more information (including repro steps, environment, etc.) is available here:
https://gitlab.alpinelinux.org/alpine/aports/issues/10960

I'm posting a link to this issue here for two reasons:

  1. To serve as a reference for other AKS users.
  2. To see if there is anything that Azure can do on the kernel side of things to address this issue in case musl does not.

Our nodes are currently running the following kernel version: 4.15.0-1063-azure #68-Ubuntu SMP Fri Nov 8 09:30:20 UTC 2019 x86_64 Linux.

With this version of Kubernetes:

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.7", GitCommit:"8fca2ec50a6133511b771a11559e24191b1aa2b4", GitTreeState:"clean", BuildDate:"2019-09-18T14:47:22Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.12", GitCommit:"524c3a1238422529d62f8e49506df658fa9c8b8c", GitTreeState:"clean", BuildDate:"2019-11-14T05:26:24Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
@andyzhangx
Contributor

Could you provide more details about the Azure Files setup? What is the Azure Files storage class? Have you tried using a Premium file share?

@GuyPaddock
Author

I'd refer you to the linked thread for more details.

We're using Azure Files standard. Azure Files Premium breaks our cost structure due to the 100 TiB minimum quota requirement.

Per the musl team in the thread I linked to, the issue I'm reporting does not appear in Ubuntu containers using the GNU C library, but does appear in Alpine-based containers using the musl C library. The root cause appears to be somewhere in the kernel: it is caching data in a way that produces unexpected results when iterating over and deleting 64+ files on a mounted NFS or SMB share.

The issue does not happen with the GNU C library because it uses much larger reads for directory iteration, which seems to effectively prevent the kernel from buffering the directory listing.
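To see the difference concretely, one option (a hedged diagnostic sketch; the path, package names, and exact buffer sizes are assumptions, not values from this issue) is to trace the getdents64 syscalls that rm issues under each libc:

apk add strace    # on Alpine; use apt-get install strace on Ubuntu
strace -f -e trace=getdents64 rm -rf /var/www/html/data/test 2>&1 | grep getdents64
# A glibc-based rm typically asks the kernel for large buffers (tens of KiB),
# fetching the whole listing in one or two calls; a musl-based rm issues many
# small getdents64 calls, so the directory scan is interleaved with the unlink()s.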

@smfrench

smfrench commented May 7, 2020

This would be useful to get more information about. Many performance optimizations for metadata went in after the 4.18 kernel (especially around the 5.0 kernel) for SMB3 queries, but even on 4.15 there are a few obvious things worth trying, and it is also possible that this bug has been fixed in the last three years (and is in more recent kernels). In the meantime, can you try some potential workarounds? Have you tried setting the mount option "actimeo=0" (to disable caching and see if the "can't remove directory" issue goes away)? The reverse, caching directory entries for longer periods of time (SMB3 defaults to a short cache lifetime of only 1 second for dentries), e.g. setting actimeo=60 instead of the default of one second, would also be useful to see how it affects your workload.

In addition, there are valid cases where "rm -rf" would fail. For example, if the application left open one of the files in that directory tree, then it cannot be deleted from the server until the file is closed (although it can be marked as to-be-deleted on close). The NFS client on Linux can work around this with a strategy called 'silly rename', so if you do turn out to have this problem (i.e. the application forgot to close one or more of the files before deleting them), there may be a 'silly rename' strategy we can add to the SMB3 client (cifs.ko) to work around the application problem in a similar way. One way to check whether "application forgot to close a file" is related to your problem is to run "lsof +D <dir>" before "rm -rf" to ensure no files are open in that directory tree.
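For reference, a minimal sketch of those experiments and the lsof check (the account, share, mount point, and password below are placeholders, not values from this issue):

# 1) disable attribute/dentry caching entirely
sudo mount -t cifs //<account>.file.core.windows.net/<share> /mnt/test \
    -o vers=3.0,username=<account>,password=<key>,actimeo=0
# 2) or cache dentries longer than the 1 second default
sudo mount -t cifs //<account>.file.core.windows.net/<share> /mnt/test \
    -o vers=3.0,username=<account>,password=<key>,actimeo=60
# 3) before rm -rf, confirm nothing still holds files open under the tree
lsof +D /mnt/test/<directory>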

@GuyPaddock
Author

> In addition, there are valid cases where "rm -rf" would fail. For example, if the application left open one of the files in that directory tree, then it cannot be deleted from the server until the file is closed (although it can be marked as to-be-deleted on close).

Per the info in the linked issue, we are able to demonstrate rm -rf failing in a single bash script that uses no file locks and no concurrency:
https://gitlab.alpinelinux.org/alpine/aports/issues/10960

@GuyPaddock
Author

I tried actimeo=0 back in October, and all it did was cause massive throttling from Azure Files that eventually made the mount drop (similar to #1587).

@smfrench

smfrench commented May 7, 2020

> In addition, there are valid cases where "rm -rf" would fail. For example, if the application left open one of the files in that directory tree, then it cannot be deleted from the server until the file is closed (although it can be marked as to-be-deleted on close).
>
> Per the info in the linked issue, we are able to demonstrate rm -rf failing in a single bash script that uses no file locks and no concurrency:
> https://gitlab.alpinelinux.org/alpine/aports/issues/10960

I tried an experiment just now with this old Ubuntu kernel (4.15) mounted over SMB3 to Azure and didn't see a problem with either 256 or 2048 files (see below), using the test script mentioned earlier in the thread. Reproducing this problem may require a more complex setup with containers (or perhaps a very old kernel missing some fixes?).

root@smf-old-ubuntu:/mnt/smftestshares# ~/test.sh 256
Creating '256' test files...

Trying to delete test files...
DELETED: 257 BEFORE: 256 AFTER: 0

root@smf-old-ubuntu:/mnt/smftestshares# ls
root@smf-old-ubuntu:/mnt/smftestshares# ~/test.sh 2048
Creating '2048' test files...

Trying to delete test files...
DELETED: 2049 BEFORE: 2048 AFTER: 0

root@smf-old-ubuntu# uname -a
Linux smf-old-ubuntu 4.15.0-1082-azure #92~16.04.1-Ubuntu SMP Tue Apr 14 22:28:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

@GuyPaddock
Author

@smfrench: As I mentioned, the bug is not reproducible on Ubuntu because it uses the GNU C library (glibc) instead of musl. glibc works around the kernel bug by doing large reads, which avoids the caching problem; musl does small reads.

You would need to try this on an Alpine container.


@andyzhangx
Contributor

I tried on an AKS 1.17.3 cluster and could not repro, using the exact same steps as https://gitlab.alpinelinux.org/alpine/aports/issues/10960

# k get no -o wide
NAME                                STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-agentpool-22126781-vmss000002   Ready    agent   21m   v1.17.3   10.240.0.4    <none>        Ubuntu 16.04.6 LTS   4.15.0-1077-azure   docker://3.0.10+azure
aks-agentpool-22126781-vmss000003   Ready    agent   21m   v1.17.3   10.240.0.5    <none>        Ubuntu 16.04.6 LTS   4.15.0-1077-azure   docker://3.0.10+azure

# k get po test-shares-bdf6d7956-x9mcd -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP           NODE                                NOMINATED NODE   READINESS GATES
test-shares-bdf6d7956-x9mcd   1/1     Running   0          3m15s   10.244.3.3   aks-agentpool-22126781-vmss000002   <none>           <none>

# k exec -it test-shares-bdf6d7956-x9mcd sh
/ # vi test.sh
/ # chmod 0755 test.sh
/ # ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 129  BEFORE: 128  AFTER: 0

/ # ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 129  BEFORE: 128  AFTER: 0

/ # ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 129  BEFORE: 128  AFTER: 0

I was wrong, it can repro; I forgot to cd /var/www/html/data/ in the previous experiment:

# k exec -it test-shares-bdf6d7956-skq8r sh
/ # cd /var/www/html/data/
/var/www/html/data # vi test.sh
/var/www/html/data # chmod 0755 test.sh
/var/www/html/data # ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 66  BEFORE: 128  AFTER: 62
DELETED: 63  BEFORE: 62  AFTER: 0

/var/www/html/data # ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 66  BEFORE: 128  AFTER: 62
DELETED: 63  BEFORE: 62  AFTER: 0

@andyzhangx
Contributor

andyzhangx commented May 8, 2020

Here is how to repro this issue in your local environment; it is directly related to an SMB client issue (an AKS cluster is not needed):

mkdir /tmp/test
sudo mount -t cifs //accountname.file.core.windows.net/test /tmp/test -o vers=3.0,username=accountname,password=…,dir_mode=0777,file_mode=0777,cache=strict,actimeo=30

wget -O /tmp/test/test.sh https://raw.githubusercontent.com/andyzhangx/demo/master/debug/test.sh
docker run -it -v /tmp/test:/var/www/html/data/ --name alpine alpine:3.10 sh

# cd /var/www/html/data/
/var/www/html/data # ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 66  BEFORE: 128  AFTER: 62
DELETED: 63  BEFORE: 62  AFTER: 0
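For readers who do not follow the link: a hedged sketch of what a script like test.sh plausibly does, reconstructed only from the output shown in this thread (the real script at the URL above may differ):

#!/bin/sh
# create N files, then repeatedly try to delete the whole directory,
# reporting how many entries each rm pass removed
COUNT="${1:-128}"
DIR=test

echo "Creating '$COUNT' test files..."
mkdir -p "$DIR"
i=1
while [ "$i" -le "$COUNT" ]; do
    touch "$DIR/file-$i"
    i=$((i + 1))
done

echo ""
echo "Trying to delete test files..."
while [ -d "$DIR" ]; do
    BEFORE=$(find "$DIR" -type f | wc -l)
    DELETED=$(rm -rfv "$DIR" 2>/dev/null | wc -l)   # removed files plus the directory itself
    AFTER=$(find "$DIR" -type f 2>/dev/null | wc -l)
    echo "DELETED: $DELETED  BEFORE: $BEFORE  AFTER: $AFTER"
done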

We have already looped in SMB experts to take a look at this issue.

Also, the same result occurs on an AKS node (Ubuntu 18.04, kernel 5.0.0-1036-azure) running the alpine:3.10 image:

./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 66  BEFORE: 128  AFTER: 62
DELETED: 63  BEFORE: 62  AFTER: 0

@andyzhangx andyzhangx changed the title Alpine Linux Cannot Navigate Azure Files Mounts Reliably to Alpine Linux Cannot Navigate Azure Files Mounts Reliably (SMB client issue in Alpine image) May 8, 2020
@andyzhangx
Contributor

andyzhangx commented May 8, 2020

Meanwhile, using the ubuntu:16.04 image, it works as expected:

# docker run -it -v /tmp/test:/var/www/html/data/ --name ubuntu ubuntu:16.04 sh
# cd /var/www/html/data/
# ./test.sh 128
Creating '128' test files...

Trying to delete test files...
DELETED: 129  BEFORE: 128  AFTER: 0


@ghost

ghost commented Jul 26, 2020

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Jul 26, 2020
@palma21
Member

palma21 commented Jul 27, 2020

@VybavaRamadoss @RenaShahMSFT could you help?

@smfrench

smfrench commented Jul 27, 2020

When we debugged this back in May, didn't it show the bug to be in the Alpine version of ls, not in the network fs client(s)? The SMB3 client (and presumably the NFS client as well) was returning the expected files, and delete worked fine, but the Alpine C library (unlike the glibc called by ls on other distros) had a bug. It seemed to be related to the Alpine library not restarting the search properly after the directory contents changed from removing some of the files in the middle of a directory search.

@GuyPaddock
Author

@smfrench No, the issue is that there is a kernel bug that the GNU C library avoids by doing greedy/large reads. Alpine does smaller reads of directory listings to limit memory consumption. So it is more accurate to say that Alpine does not work around the kernel bug while the GNU C library does. But it is hard to say whether the GNU developers were aware that they were working around the bug or whether it was just coincidental.

@palma21 palma21 removed Needs Attention 👋 Issues needs attention/assignee/owner action-required labels Aug 6, 2020
@palma21
Member

palma21 commented Aug 6, 2020

Could you confirm whether this issue should still be open, then, if it's specific to Alpine?

@ghost ghost added the stale Stale issue label Oct 5, 2020
@ghost

ghost commented Oct 5, 2020

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

@ghost ghost closed this as completed Oct 21, 2020
@ghost

ghost commented Oct 21, 2020

This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. @GuyPaddock, feel free to comment again within the next 7 days to reopen, or open a new issue after that time if you still have a question/issue or suggestion.

@sprasad-microsoft

sprasad-microsoft commented Oct 21, 2020

I did some digging into this issue and also discussed this in the linux-cifs mailing list.

The readdir documentation (https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir_r.html) says:

> If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.

So the different filesystems are left free to choose their own behaviour when this happens.
cifs.ko (the Linux SMB client) makes sure that it's not returning stale data, at the cost of missing some entries for this particular use case. It just so happens that the way ext4 handles this is to reposition the dir offset back to 0. However, with that, ext4 could end up emitting duplicate entries during successive readdirs, in case of changes to the directory. Either way, there can be issues.

This issue is seen quite often in Alpine, because it uses musl libc, which seems to send much smaller buffers down to VFS to read the dirents into.

However, the main issue here is the implementation of rm being used (I don't know whether this is the default GNU coreutils version). It depends on unspecified behaviour of the Linux VFS, where it should not. When doing recursive readdirs (where it knows that the directory has changed), it should rewind back to position 0 and start the next readdir again. That would fix the problem, and to me it sounds like the right way to fix it.
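At the shell level, the same idea can be illustrated with a hedged workaround sketch (not the proposed fix to rm itself; the path is a placeholder): retrying rm -rf restarts the directory scan from offset 0, so entries missed by an interrupted pass are picked up on the next one.

dir=/var/www/html/data/test
# keep retrying until the directory is actually gone; each retry re-reads
# the directory from the beginning instead of relying on a single pass
while [ -d "$dir" ]; do
    rm -rf "$dir"
done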

@ghost ghost removed the stale Stale issue label Oct 21, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Nov 20, 2020