
blobfuse2 on AKS 1.25.5 does not support files with Content-Encoding: gzip #1100

Closed
eyedvabny opened this issue Mar 30, 2023 · 5 comments · Fixed by #1104

@eyedvabny

Which version of blobfuse was used?

blobfuse2 version 2.0.1

Which OS distribution and version are you using?

AKS 1.25.5 running AKSUbuntu-2204gen2containerd-2023.02.15 node image

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"

If relevant, please share your mount command.

blobfuse2 mount /var/lib/kubelet/plugins/kubernetes.io/csi/blob.csi.azure.com/40774c1cc34ceb8f8428e46f2de90f65dcd5f007fd997f8fc8e2f8c8ff924f60/globalmount --file-cache-timeout-in-seconds=120 -o negative_timeout=120 --use-attr-cache=true --virtual-directory=true --cache-size-mb=1000 -o allow_other --log-level=LOG_WARNING -o attr_timeout=120 -o entry_timeout=120 --pre-mount-validate=true --use-https=true --cancel-list-on-mount-seconds=10 --empty-dir-check=false --tmp-path= --container-name= --ignore-open-flags=true

What was the issue encountered?

blobfuse2 seems to be confused by files which have a Content-Encoding of gzip.
[screenshot]

The files show up in the directory just fine, but any attempt to copy or move them results in an IO error:
[screenshot]

This only impacts blobfuse-mounted directories; the same file downloaded fine from Azure via Storage Explorer and was perfectly valid.

Have you found a mitigation/solution?

Removing or changing Content-Encoding seems to fix the behavior. No value other than gzip seems to trigger this issue; we tried deflate, utf-8, base64, and even asdf. We ended up setting the Content-Type to application/x-gzip, which is what Azure Storage Explorer sets when uploading a gzip file, and not setting Content-Encoding at all.
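In case it helps anyone else, the property change can be scripted. A rough sketch with the az CLI (account, container, and blob names below are placeholders, and you would add whatever auth flags you normally use):

```bash
# Clear Content-Encoding (empty value) and set Content-Type on an existing blob.
# Names are placeholders, not our real account/container/blob.
az storage blob update \
  --account-name mystorageaccount \
  --container-name mycontainer \
  --name reports/data.csv.gz \
  --content-type application/x-gzip \
  --content-encoding ""
```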

Please share logs if available.

Can't find any logs corresponding to the IO error. /var/log/blobfuse2 is clean (apart from not being able to find the du command, which is a known issue on AKS), and there's nothing related to IO in dmesg.

@vibhansa-msft
Member

Because the content encoding is set to 'gzip', the SDK dependency that we have tries to validate that the file has valid 'gzip' headers, and throws an error if they do not match. Since the file is a 'csv' but the encoding is set to 'gzip', it fails. You can either skip setting this encoding or use 'x-gzip' as you have suggested.

vibhansa-msft self-assigned this Apr 3, 2023
vibhansa-msft added this to the V2-2.0.3 milestone Apr 3, 2023
@eyedvabny
Author

@vibhansa-msft The file is a valid gzip. The only "invalid" thing was the extension (it should have been .csv.gz instead of .csv), but I've reproduced it with a .gz-extensioned file too.
[screenshot]
There is no content-type set, just content-encoding. As soon as I unset the content-encoding, both the file command and cp start working:
[screenshot]

What does the underlying SDK use to determine that a file is a valid gzip? Per the RFC that should just be the first two bytes, 0x1f and 0x8b (a direct check against the blob is sketched at the end of this comment). Ironically, when Content-Encoding is set to gzip I can't read any bytes of the file:
[screenshot]
But when I remove the content encoding (the only change, no change to the file at all):
[screenshot]

Content-Encoding: gzip is a valid encoding as long as the file itself is a valid gzip (the intention being for the browser to auto-decompress), so this seems like a library error in blobfuse2.

FWIW it's also somewhat size-dependent. blobfuse2 doesn't exhibit this problem for any gzip file with Content-Encoding: gzip smaller than 140 bytes.
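For completeness, here's roughly how I verified the stored blob really does start with the gzip magic bytes, going straight to the blob endpoint and bypassing the mount (placeholder URL; assumes a SAS token or public read access):

```bash
# Fetch only the first two bytes of the blob over HTTPS and hex-dump them.
# A valid gzip must start with 1f 8b per RFC 1952.
curl -s -r 0-1 "https://<account>.blob.core.windows.net/<container>/file.csv.gz?<sas>" | xxd
# expected output starts with: 00000000: 1f8b
```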

@vibhansa-msft
Member

We are looking at this, and there is some misunderstanding here.
Can you share why you need to store the content-encoding here? Our observation is that if it is set to gzip and the file is read through a browser or curl, the data is decoded before being delivered to the application. However, the Content-Length reported in the HTTP headers is still the length of the compressed data (as stored in the container), while the actual body returned is much larger (since it is the uncompressed data). This might be the issue we are hitting: the local file is created based on the Content-Length, so it is smaller, and we then try to write the much larger content into it.
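To illustrate the mismatch (placeholder URL; assumes the blob is readable via SAS or public access):

```bash
URL="https://<account>.blob.core.windows.net/<container>/file.csv.gz?<sas>"

# HEAD request: Content-Length is the size of the stored (compressed) data,
# and Content-Encoding: gzip is echoed back from the blob properties.
curl -sI "$URL" | grep -iE 'content-(length|encoding)'

# With --compressed, curl sends Accept-Encoding and transparently gunzips the
# response body, so this byte count is larger than the Content-Length above.
curl -s --compressed "$URL" | wc -c
```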

@eyedvabny
Author

There's no requirement to use Content-Encoding; it just caught us off guard. One of our teams had created the gzipped files with a Content-Encoding of gzip and a Content-Type of text/csv expressly for the reasons you describe, so that they automatically unpack when downloaded through a browser. But when we tried to serve those same files through a blobfuse-mounted directory, we hit the IO errors.

We've since switched to the Content-Type of application/x-gzip I mentioned when opening the issue. We don't get the auto-gunzip, but it works with all of our file delivery methods, including email and blobfuse. I raised this issue primarily to bring it to your attention, since I haven't seen the Content-Encoding limitation documented anywhere in the repo. Thank you for addressing it so quickly.

@vibhansa-msft
Member

Thanks for clarifying this. We had a debug session yesterday to dig deep into this. By default, Accept-Encoding is enabled on the HTTP requests that blobfuse sends. This means that if the server has done any transport-layer compression, the SDK will decompress the data and deliver it back to the application. In this case the server is not doing any transport-layer compression, but because content-encoding is set on the blob, the server forwards the same header in the REST response. The SDK assumes the server performed the compression and tries to decompress the data. Since no compression was actually done at the transport layer, the content length we get for the blob is the size of the zipped file (x-ms-content-length), and we try to read only that much data from the SDK. Because the SDK decompressed it, the resulting data is much larger, which breaks the read and makes us assume the data is corrupted.
One solution to this problem is to stop sending Accept-Encoding from blobfuse and assume the SDK will never decompress the data, since we are asking the server not to do any transport-layer compression. This will work the way you want: blobfuse will not decompress the data and will deliver the gzip as stored, while browsers will decompress it and deliver the original data. As our default blobfuse offering always sends the encoding header, I am adding a new CLI parameter with which you can disable this auto-decompression; running blobfuse with that option should solve your issue.
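Conceptually, the new behavior matches what a plain HTTP client does when it does not advertise Accept-Encoding: the stored gzip bytes come back untouched and their size matches Content-Length, and any decompression happens downstream. Roughly (same placeholder URL as above):

```bash
# No Accept-Encoding sent: the body is the raw stored gzip, so its size
# matches the Content-Length header exactly.
curl -s "$URL" | wc -c

# Consumers that want the original data decompress it themselves.
curl -s "$URL" | gunzip | head
```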
