Add support for being able to detect a brotli-compressed file? #867

Closed
juj opened this issue Nov 10, 2020 · 5 comments

Comments

@juj

juj commented Nov 10, 2020

Identifying a gzip-compressed file is super easy: check whether the file starts with the magic bytes 1F 8B (ignoring false positives etc.).
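For comparison with the Brotli detection further down, the gzip check can be sketched in a couple of lines. The function name looksLikeGzip is hypothetical, and the check only inspects the two magic bytes, so it can still produce false positives on arbitrary binary data.

```javascript
// Minimal sketch: report whether a byte array starts with the gzip magic
// bytes 1F 8B. `looksLikeGzip` is a hypothetical helper name.
function looksLikeGzip(data) {
  return data.length >= 2 && data[0] === 0x1F && data[1] === 0x8B;
}
```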

It has been discussed before that Brotli is a stream that does not have such magic bytes. That has kept other projects from moving forward: for example, the file and libmagic tools cannot detect Brotli-compressed content (#727).

There was a proposal for a Brotli framing format at https://github.com/madler/brotli/blob/master/br-format-v3.txt , but if I understand its latest status correctly, it was just a proposal, with no actual plan or possibility to move forward, as it would break binary compatibility with existing Brotli compressed content? (so e.g. Chrome, Firefox and Safari would need to add support for this new framed format?) Is that accurate?

It seems based on #462 and #298 that the request to add a magic number or an official framing was dropped, with no plans to actually go forward with either?

If so, and adding magic numbers and/or official framing is not on the table anymore, I would like to propose an alternative feature that would at least work for some users' use cases, and retain the existing binary compatibility.

Unity is using a mechanism that enables embedding an uncompressed comment string into the Brotli binary. See the local code modification here: Unity-Technologies@5a6d5d9

With that change, the brotli compressor accepts a string such as --comment "UnityWeb Compressed Content (brotli)" on the command line and embeds it into the stream. Unity then uses the following JavaScript code in a browser context to detect .br compressed files:

hasUnityMarker: function (data) {
  var expectedComment = "UnityWeb Compressed Content (brotli)";
  if (!data.length)
    return false;
  // WBITS is encoded in 1, 4 or 7 bits at the very start of the stream.
  var WBITS_length = (data[0] & 0x01) ? (data[0] & 0x0E) ? 4 : 7 : 1,
      WBITS = data[0] & ((1 << WBITS_length) - 1),
      // Number of bytes needed to store (comment length - 1).
      MSKIPBYTES = 1 + ((Math.log(expectedComment.length - 1) / Math.log(2)) >> 3),
      // Header size in bits (WBITS, ISLAST, MNIBBLES, reserved, MSKIPBYTES, MSKIPLEN), rounded up to whole bytes.
      commentOffset = (WBITS_length + 1 + 2 + 1 + 2 + (MSKIPBYTES << 3) + 7) >> 3;
  // 0x11 is the bit pattern 0010001, the one invalid WBITS encoding.
  if (WBITS == 0x11 || commentOffset > data.length)
    return false;
  // Expected header bits: ISLAST = 0, MNIBBLES = 3 (metadata block), reserved = 0, MSKIPBYTES, MSKIPLEN - 1.
  var expectedCommentPrefix = WBITS + (((3 << 1) + (MSKIPBYTES << 4) + ((expectedComment.length - 1) << 6)) << WBITS_length);
  for (var i = 0; i < commentOffset; i++, expectedCommentPrefix >>>= 8) {
    if (data[i] != (expectedCommentPrefix & 0xFF))
      return false;
  }
  return String.fromCharCode.apply(null, data.subarray(commentOffset, commentOffset + expectedComment.length)) == expectedComment;
},

Would it be possible for this to become an officially supported feature in the command line executable brotli compressor? E.g. if one runs

brotli -9k --comment "this is brotli compressed" file.txt -o output.br

that would emit the given comment into the output file. Users would then be able to identify at least their own generated .br files. And if a compressor is used within a certain community (say among Emscripten developers), tooling within that scope could agree on a comment and produce detectable .br files.
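To make the byte layout concrete, here is a rough sketch of the prefix such a comment would occupy, assuming it is stored as a metadata meta-block (MNIBBLES = 3) directly after the WBITS stream header, which is the layout the detection code above checks for. The function buildCommentPrefix and its parameters are hypothetical, it assumes a single-byte (ASCII) comment of 1 to 65536 characters, and it is not how the brotli CLI is implemented. The result is only the start of a stream: the compressor still has to emit the actual compressed meta-blocks after it with the same WBITS, so this is not a way to tag an existing .br file by concatenation.

```javascript
// Sketch only: build the stream header plus a metadata meta-block that
// carries `comment`. Bits are written least-significant-bit first.
// wbitsCode / wbitsLength: the WBITS bit pattern and its width (1, 4 or 7).
function buildCommentPrefix(wbitsCode, wbitsLength, comment) {
  if (comment.length < 1 || comment.length > 0x10000)
    throw new RangeError("comment must be 1..65536 characters in this sketch");
  var bits = [];
  function put(value, count) {
    for (var i = 0; i < count; i++) bits.push((value >>> i) & 1);
  }
  var lenMinus1 = comment.length - 1;         // stored as MSKIPLEN - 1
  var mskipbytes = lenMinus1 > 0xFF ? 2 : 1;  // bytes used for that field
  put(wbitsCode, wbitsLength);                // stream header: WBITS
  put(0, 1);                                  // ISLAST = 0
  put(3, 2);                                  // MNIBBLES = 3: metadata block
  put(0, 1);                                  // reserved, must be zero
  put(mskipbytes, 2);                         // MSKIPBYTES
  put(lenMinus1, mskipbytes * 8);             // MSKIPLEN - 1
  while (bits.length % 8) bits.push(0);       // zero fill to a byte boundary
  var headerBytes = bits.length / 8;
  var out = new Uint8Array(headerBytes + comment.length);
  for (var b = 0; b < bits.length; b++) out[b >> 3] |= bits[b] << (b & 7);
  for (var c = 0; c < comment.length; c++)
    out[headerBytes + c] = comment.charCodeAt(c) & 0xFF;
  return out;
}
```

For example, buildCommentPrefix(0, 1, "UnityWeb Compressed Content (brotli)") yields the two header bytes AC 11 followed by the 36 comment bytes, which is the prefix hasUnityMarker expects when WBITS uses the 1-bit encoding.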

That way people who oppose adding magic numbers or framing would not have their .br file size increase, but people would still be able to identify their generated .br files.

@polarathene

AFAIK Dropbox has a port, rust-brotli, that added a magic number feature in Sep 2018.

It's meant to be a C-compatible drop-in replacement for this project, AFAIK, but I'm not sure whether it has kept in sync, so it might have diverged a bit. It has a few other additions too.

@eustas
Collaborator

eustas commented Jan 4, 2023

Nice idea. Going to add a comment feature soon.

@eustas eustas added the feature label Jan 6, 2023
@Artoria2e5

Comment (or any variable-length part really) is also useful for HTB, the mitigation used to add randomness to packet sizes and mess with the BREACH exploit.

@juj
Author

juj commented May 10, 2023

Hey @eustas, I wonder if that comment feature got added? Your post above suggests that it would be something you'd be in favor of carrying upstream?

@eustas
Collaborator

eustas commented Jan 11, 2024

Done. See --comment= / -C CLI options.
