azurefile is slow on read/write of large amounts of small files #223
I see my work! 👏 Anyway - it's slow, I agree, but I'm not sure that's an AKS problem so much as the underlying service being slow 😢
If it is a general problem with Azure Files then please put a big red warning in the AKS azurefile docs. Suggesting an alternative that would allow spreading pods over multiple nodes would also be nice - unfortunately the only thing I can think of is setting up
Thanks for the feedback. /cc @neilpeterson for any doc guidance on Azure Files perf.
Just a heads up: Azure Files premium doesn't fix this.
The current Azure premium file is still slow when reading/writing a large number of small files; the Azure Files team is working on this. I will update this issue as soon as WordPress can run well on Azure premium file.
Is this with many small files?
@4c74356b41 yes, WordPress has more than 1K small files, and both the mariadb and wordpress pods can run on Azure premium file now - see the example above. The Azure Files team is also improving perf beyond this premium file feature. It's quite promising.
What about a reasonable share size? 5 TB for a WordPress site seems a bit overkill?
Also, 100 GB was the smallest size back in November, which is ridiculous as well.
Does anyone know if the storage account version makes any difference?
@moebius87 No, I don't think so - but azurefile on the Premium_LRS storage account type is faster.
@andyzhangx K8s cluster 1.13.x is still not available for an upgrade on AKS. How did you make it work? I tried Premium_LRS on a 1.11 K8s; basically it will not work, because this link states that
@jtasipit you can use this approach: https://4c74356b41.com/post5779
@4c74356b41 I tried your suggested approach but encountered the same error. The prestaged premium SA is not being used by the PVC when created, for no known reason.
You probably did something wrong; it worked fine for me.
Azure premium file dynamic provisioning support is available from k8s v1.13.0.
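For anyone looking for the concrete shape of that, here is a minimal sketch, assuming the in-tree kubernetes.io/azure-file provisioner on k8s >= 1.13; the class name is illustrative:

# Minimal sketch (assumes k8s >= 1.13 and the in-tree azure-file provisioner).
# skuName Premium_LRS makes dynamically provisioned shares land on a premium
# (FileStorage) storage account; the class name below is illustrative.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Premium_LRS
EOF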
This problem also exists with the Premium SKU. It's a little bit faster, but not usable as pod storage. So AKS is not useful for small workloads, due to the limit on data disks (4 or 8 for small nodes) and the slow Azure Files service.
Thanks for the clarification @chris3081. I'm really sad that your support ticket didn't get you the answer more promptly. Glad to hear that Rook Ceph is working for you.

For your info, the main thing I'm focusing my own attention on in this thread is the Azure NetApp Files option - simply because it seems the most promising option for those customers who do need the things that @jabbera mentioned. If we can find answers to the Azure NetApp Files issues raised above, then I think we'll be in a much better position, because users who need faster small-file perf will have two choices: (a) manage their own Rook Ceph solution (or similar) like you are doing, or (b) use Azure NetApp Files.

(For completeness, there's actually a third solution in some cases, which is to structure your application logic to access multiple files in parallel - see the sketch after this comment. That gets better performance out of Azure Files than dealing with the files sequentially one at a time. But that's not an option for everyone. Also, I can't recall exactly how fast you can get with this type of solution; my guess would be "noticeably faster than the sequential approach to Azure Files, but not as fast as Rook Ceph or Azure NetApp Files". And, as a fourth option, simply moving to Azure Premium Files will be enough in some cases.)

For customers such as yourself, where Rook Ceph is suitable, I think the main thing we need to do is make that solution more discoverable.

Finally, you wrote "Our other issues are all around the CNI." Can you refresh my memory - are those issues covered in this thread? In some other GitHub issue?
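To make the parallel-access idea above concrete, here is a minimal shell sketch, assuming GNU coreutils; the source directory and the Azure Files mount point at /mnt/azure are placeholders:

# Copy many small files with up to 16 concurrent cp processes instead of one
# sequential cp -R; parallelism hides the per-file round-trip latency of Azure Files.
cd ./wp && find . -type f -print0 | xargs -0 -P 16 -I{} cp --parents {} /mnt/azure/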
@JohnRusk we did review the NetApp solution too, but we felt there was a lot of waste when you have a minimum deployment size of 4 TB. Our deployments span multiple regions, so this gets expensive quickly; otherwise it was a promising solution. That said, I think the Rook Ceph solution could be made AKS-managed and integrated into AKS deployments.
Agree with @chris3081, the 4 TB minimum kills NetApp Files for me. @JohnRusk, do you know why this minimum exists? Is it purely a business decision?
I don't know, @mahgo. I'm not involved in Azure NetApp Files. Before working in AKS, I used to work in data transfer to Storage (both Blob and File), but that was before Azure NetApp Files existed.
FYI there is a great guide for finding the root cause of the latency here: https://docs.microsoft.com/en-us/azure/storage/files/storage-troubleshooting-files-performance#cause-3-single-threaded-application

Azure Files is a storage account service limited by several factors, and the final application is one of them. For example, a single-threaded application (one WordPress server reading files) is a very different case from a multi-threaded application, like the one in the comment quoted above.

Another factor is the IOPS throughput of your disk, which is related to the disk size: more GB means more IOPS. Just a reflection... because it's also possible to use alternative storage-as-a-service options like Data Lake Storage Gen2: https://docs.microsoft.com/en-us/azure/storage/blobs/upgrade-to-data-lake-storage-gen2

I am not in favor of always selecting a premium storage service if you can find the right balance with a minimal architecture, but depending on the criticality for your business it is worth considering whether improving your architecture (more disk size to obtain more IOPS, more VMs, VMs with better CPUs) is adequate, or whether you need a better storage service together with an upgrade to the data-access procedure (parallel reads/writes, indexed data...).
I've gone as big as they will let me to get the maximum IOPS. The performance pales in comparison to a native NFS share.
Have you tested with a Data Lake Storage Gen2 storage account?
No. It's not the example given in the documentation. I can give it a go, but it's not easy to mount a Gen2 account on a Windows machine. (We need to access these files outside of the cluster.)
For AKS the better option is: https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv

Latency is acceptable when writing from different pods on the same PVC (sample output from a latency probe):
4 KiB <<< /mnt/azure (cifs //f850d0538e84d46c1a7178f.file.core.windows.net/pvc-cd2968cb-3090-4fbd-aef6-30e472d304cb): request=11 time=3.37 ms (fast)

All pods can use the same PVC. It's similar to a storage account with a file share, but you can create it dynamically with kubectl using the new StorageClass described in the article above. You only lose the ability to browse files via a graphical interface (Azure Storage Explorer), but it is a normal PVC, so you can browse files via the terminal.

Another alternative in Kubernetes is to use hostPath to share a folder across different pods/deployments. This option has lower latency (microseconds) but becomes a bottleneck when you write from different pods to the same hostPath (latency peaks in seconds instead of microseconds).
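A minimal sketch of the dynamic-provisioning flow from that article, assuming the azurefile storage class that AKS creates by default; the PVC name and size are illustrative:

# PVC against the built-in azurefile StorageClass; ReadWriteMany is what lets
# every pod mount the same share, as described above.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-content
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 100Gi
EOF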
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
Not stale.
Same issue here. Trying to set up a cluster for CMS-based SaaS software (Joomla): lots of small PHP files. Despite having raised the premium Azure file share to 5 TB (and therefore increased IOPS), and having activated SMB Multichannel, pages still take 10 to 20 seconds to show up. Disappointing...
I think the next option is an Ultra disk shared across different VMs. Or use Kubernetes dynamic provisioning: Ultra SSD disks have a DiskIOPSReadWrite parameter to select the performance in transactions per second, so even if you only select the storage size, these disks start with more IOPS per GB (see the sketch below).
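A minimal sketch of that, assuming the Azure disk CSI driver (disk.csi.azure.com) is installed; the class name and the IOPS/throughput values are illustrative:

# StorageClass for dynamically provisioned Ultra disks; DiskIOPSReadWrite and
# DiskMBpsReadWrite provision performance independently of disk size.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ultra-disk-sc
provisioner: disk.csi.azure.com
parameters:
  skuName: UltraSSD_LRS
  DiskIOPSReadWrite: "4000"
  DiskMBpsReadWrite: "200"
volumeBindingMode: WaitForFirstConsumer
EOF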
Agree. It might be a working solution which deserves a try. However, I'll keep reading docs and testimonials. Maybe the solution lies in an Azure file share AND Azure File Sync, to keep a locally cached copy of static files.
@jmorcar what kind of file system would be used on a shared Ultra SSD? I don't know many cluster-capable file systems that are k8s compatible.
Here is an example with Pacemaker in containers... So I agree with you, I think Pacemaker is better suited to a VM architecture than k8s, but just to test an architecture of 30,000 IOPS per disk, shared via Ultra disk... it could be interesting.
Azure File Sync will not work in our case to speed up file loading times over a network share, as it is intended to be used on Windows Server.
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
Not stale.
Trying to implement shared storage for web hosting, which Azure Premium Files is recommended for in the link below, we have found it completely unusable. Single-large-file performance is good; small files you can forget about. https://learn.microsoft.com/en-us/azure/storage/files/storage-files-planning
For the WordPress-like scenario, please try with
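The suggestion above is truncated in this thread. As an assumption only, the kind of tuning usually recommended for many-small-file CIFS workloads looks like the mount options below (attribute caching via actimeo cuts metadata round-trips); this is an illustration, not necessarily what was originally proposed:

# Assumed example (not the original comment's content): an azurefile CSI
# StorageClass with mount options that reduce per-file metadata round-trips.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-tuned
provisioner: file.csi.azure.com
parameters:
  skuName: Premium_LRS
mountOptions:
  - actimeo=30
  - mfsymlinks
  - nosharesock
EOF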
I will close this issue; let me know if you have any other questions.
This isn't a viable solution. Because you can't use ZFS outside of a subscription (presumably because ZFS has no security), you can't create failover regions that share data files etc. on ZFS. We need a solution that allows k8s clusters to be in multiple regions across multiple subscriptions for availability, and still be able to share storage.
While attempting to set up an AKS cluster for a number of WordPress sites I've noticed that azurefile is unbearably slow when working with many files. Installing the chart stable/wordpress takes up to 20 minutes due to copying wp-content with themes and a few plugins into the persistent volume (with the default configuration Kubernetes will kill the pod before the install finishes). When using azuredisk everything is done in a few minutes. This doesn't affect only installation, but also working with WP - for instance, installing most new plugins takes several minutes. WP is not the only one suffering: http://unethicalblogger.com/2017/12/01/aks-storage-research.html
From the tests below it seems that the issue occurs when working with many files, whereas when just writing one large file the speed difference is acceptable.
If this is a hard technical limitation, it should be noted in the documentation.
# Many small files: download and extract the WordPress tarball (thousands of small files).
time ( wget -qO- https://wordpress.org/latest.tar.gz | tar xvz -C ./wp )
# Many small files: recursively copy the extracted tree.
time ( cp -R --verbose wp wpcopy )
# One large file: sequential 100 MiB write, flushed to disk.
sync; time ( dd if=/dev/zero of=testfile bs=100k count=1k && sync ); rm testfile