
blobfuse2 on AKS 1.25.5 does not support files with Content-Encoding: gzip #1100

Closed
eyedvabny opened this issue Mar 30, 2023 · 5 comments · Fixed by #1104

@eyedvabny

Which version of blobfuse was used?

blobfuse2 version 2.0.1

Which OS distribution and version are you using?

AKS 1.25.5 running AKSUbuntu-2204gen2containerd-2023.02.15 node image

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

If relevant, please share your mount command.

blobfuse2 mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/40774c1cc34ceb8f8428e46f2de90f65dcd5f007fd997f8fc8e2f8c8ff924f60/globalmount --file-cache-timeout-in-seconds=120 -o negative_timeout=120 --use-attr-cache=true --virtual-directory=true --cache-size-mb=1000 -o allow_other --log-level=LOG_WARNING -o attr_timeout=120 -o entry_timeout=120 --pre-mount-validate=true --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path= --container-name= --ignore-open-flags=true

What was the issue encountered?

blobfuse2 seems to be confused by files which have a Content-Encoding of gzip.
[screenshot]

The files show up in the directory just fine, but any attempt to copy or move them results in an IO error:
[screenshot]

This only impacts blobfuse-mounted directories; the same file downloaded fine from Azure via Storage Explorer and was perfectly valid.

Have you found a mitigation/solution?

Removing or changing Content-Encoding seems to fix the behavior. No value other than gzip seems to trigger this issue; we tried deflate, utf-8, base64, and even asdf. We ended up setting the Content-Type to application/x-gzip, which is what Azure Storage Explorer sets when uploading a gzip file, and not setting Content-Encoding at all.
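In case it helps anyone else, the property change can be scripted. A rough sketch with the az CLI (account, container, and blob names below are placeholders, and you would add whatever auth flags you normally use):

```bash
# Clear Content-Encoding (empty value) and set Content-Type on an existing blob.
# Names are placeholders, not our real account/container/blob.
az storage blob update \
  --account-name mystorageaccount \
  --container-name mycontainer \
  --name reports/data.csv.gz \
  --content-type application/x-gzip \
  --content-encoding ""
```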

Please share logs if available.

Can't find any logs corresponding to the IO error. /var/log/blobfuse2 is clean (apart from not being able to find the du command, which is a known issue on AKS), and there's nothing related to IO in dmesg.

@vibhansa-msft
Member

Because the content encoding is set to 'gzip', the SDK dependency that we have tries to validate that the file has valid 'gzip' headers, and throws an error if they do not match. Since the file is a 'csv' but the encoding is set to 'gzip', it fails. You can either skip setting this encoding or use 'x-gzip' as you have suggested.

vibhansa-msft self-assigned this Apr 3, 2023
vibhansa-msft added this to the V2-2.0.3 milestone Apr 3, 2023
@eyedvabny
Author

@vibhansa-msft The file is a valid gzip. The only "invalid" thing was the extension (it should have been .csv.gz instead of .csv), but I've reproduced it with a .gz-extensioned file too.
[screenshot]
There is no content-type set, just content-encoding. As soon as I unset the content-encoding, both the file command and cp start working:
[screenshot]

What does the underlying SDK use to determine that a file is a valid gzip? Per the RFC that should just be the first two bytes, 0x1f and 0x8b (a direct check against the blob is sketched at the end of this comment). Ironically, when Content-Encoding is set to gzip I can't read any bytes of the file:
[screenshot]
But when I remove the content encoding (the only change, no change to the file at all):
[screenshot]

Content-Encoding: gzip is a valid encoding as long as the file itself is a valid gzip (the intention being for the browser to auto-decompress), so this seems like a library error in blobfuse2.

FWIW it's also somewhat size-dependent. blobfuse2 doesn't exhibit this problem for any gzip file with Content-Encoding: gzip smaller than 140 bytes.
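For completeness, here's roughly how I verified the stored blob really does start with the gzip magic bytes, going straight to the blob endpoint and bypassing the mount (placeholder URL; assumes a SAS token or public read access):

```bash
# Fetch only the first two bytes of the blob over HTTPS and hex-dump them.
# A valid gzip must start with 1f 8b per RFC 1952.
curl -s -r 0-1 "https://<account>.blob.core.windows.net/<container>/file.csv.gz?<sas>" | xxd
# expected output starts with: 00000000: 1f8b
```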

@vibhansa-msft
Member

We are looking at this, and there is some misunderstanding here.
Can you share why you need to store the content-encoding here? Our observation is that if it is set to gzip and the file is read through a browser or curl, the data is decoded before being delivered to the application. However, the Content-Length reported in the HTTP headers is still the length of the compressed data (as stored in the container), while the actual body returned is much larger (since it is the uncompressed data). This might be the issue we are hitting: the local file is created based on the Content-Length, so it is smaller, and we then try to write the much larger content into it.
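To illustrate the mismatch (placeholder URL; assumes the blob is readable via SAS or public access):

```bash
URL="https://<account>.blob.core.windows.net/<container>/file.csv.gz?<sas>"

# HEAD request: Content-Length is the size of the stored (compressed) data,
# and Content-Encoding: gzip is echoed back from the blob properties.
curl -sI "$URL" | grep -iE 'content-(length|encoding)'

# With --compressed, curl sends Accept-Encoding and transparently gunzips the
# response body, so this byte count is larger than the Content-Length above.
curl -s --compressed "$URL" | wc -c
```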

@eyedvabny
Author

There's no requirement to use Content-Encoding; it just caught us off guard. One of our teams had created the gzipped files with a Content-Encoding of gzip and a Content-Type of text/csv expressly for the reasons you describe, so that they automatically unpack when downloaded through a browser. But when we tried to serve those same files through a blobfuse-mounted directory, we hit the IO errors.

We've since switched to the Content-Type of application/x-gzip I mentioned when opening the issue. We don't get the auto-gunzip, but it works with all of our file delivery methods, including email and blobfuse. I raised this issue primarily to bring it to your attention, since I haven't seen the Content-Encoding limitation documented anywhere in the repo. Thank you for addressing it so quickly.

@vibhansa-msft
Member

Thanks for clarifying this. We had a debug session yesterday to dig deep into this. By default, Accept-Encoding is enabled on the HTTP requests that blobfuse sends. This means that if the server has done any transport-layer compression, the SDK will decompress the data and deliver it back to the application. In this case the server is not doing any transport-layer compression, but because content-encoding is set on the blob, the server forwards the same header in the REST response. The SDK assumes the server performed the compression and tries to decompress the data. Since no compression was actually done at the transport layer, the content length we get for the blob is the size of the zipped file (x-ms-content-length), and we try to read only that much data from the SDK. Because the SDK decompressed it, the resulting data is much larger, which breaks the read and makes us assume the data is corrupted.
One solution to this problem is to stop sending Accept-Encoding from blobfuse and assume the SDK will never decompress the data, since we are asking the server not to do any transport-layer compression. This will work the way you want: blobfuse will not decompress the data and will deliver the gzip as stored, while browsers will decompress it and deliver the original data. As our default blobfuse offering always sends the encoding header, I am adding a new CLI parameter with which you can disable this auto-decompression; running blobfuse with that option should solve your issue.
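Conceptually, the new behavior matches what a plain HTTP client does when it does not advertise Accept-Encoding: the stored gzip bytes come back untouched and their size matches Content-Length, and any decompression happens downstream. Roughly (same placeholder URL as above):

```bash
# No Accept-Encoding sent: the body is the raw stored gzip, so its size
# matches the Content-Length header exactly.
curl -s "$URL" | wc -c

# Consumers that want the original data decompress it themselves.
curl -s "$URL" | gunzip | head
```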
