
RUMM-1679 Compress HTTP body using deflate (IETF RFC 1950) #626

Merged
merged 6 commits into master from maxep/RUMM-1679/http-body-compression on Oct 12, 2021

Conversation

maxep
Member

@maxep maxep commented Oct 5, 2021

What and why?

Apply HTTP body compression using the deflate data format, as described in IETF RFC 1950.

How?

Using the zlib structure (defined in IETF RFC 1950) with the DEFLATE compression algorithm (defined in IETF RFC 1951).

+  Header   +     Raw DEFLATE     +   Checksum    +
+-----+-----+=====================+---+---+---+---+
| CMF | FLG |...compressed data...|    ADLER32    |
+-----+-----+=====================+---+---+---+---+

The DEFLATE implementation uses compression_encode_buffer(_:_:_:_:_:_:) from the Compression framework, allocating a destination buffer of the source size and copying the result into a Data structure. In the worst case, where compression expands the data, the destination buffer becomes too small and deflation returns nil.
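A minimal sketch of that approach (illustrative helper name and shape, assuming only the Compression framework; not the SDK's exact code):

import Compression
import Foundation

// Deflate `data` into a destination buffer of the same size as the source.
// If the compressed output doesn't fit (i.e. the data expanded), return nil.
func rawDeflate(_ data: Data) -> Data? {
    return data.withUnsafeBytes { (source: UnsafeRawBufferPointer) -> Data? in
        guard let sourcePtr = source.bindMemory(to: UInt8.self).baseAddress else {
            return nil
        }
        let destination = UnsafeMutablePointer<UInt8>.allocate(capacity: data.count)
        defer { destination.deallocate() }
        // COMPRESSION_ZLIB produces the raw DEFLATE stream (RFC 1951), without the zlib header or checksum.
        let compressedSize = compression_encode_buffer(
            destination, data.count,
            sourcePtr, data.count,
            nil, COMPRESSION_ZLIB
        )
        // compression_encode_buffer returns 0 when the destination buffer is too small.
        guard compressedSize > 0 else { return nil }
        return Data(bytes: destination, count: compressedSize)
    }
}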

The Adler-32 checksum is computed using adler32(_:_:_:) from the zlib library.

After successful compression, RequestBuilder sets the request's httpBody to the deflated data and sets the Content-Encoding header value to deflate. In case of compression failure, the httpBody falls back to the uncompressed data.
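Roughly, that request-building step behaves like this (a sketch; deflate(_:) stands for the zlib-framing helper, and url / body are placeholders):

var request = URLRequest(url: url)
request.httpMethod = "POST"
if let deflated = deflate(body) {
    // Compression succeeded: send header + raw DEFLATE + Adler-32 and advertise it.
    request.httpBody = deflated
    request.setValue("deflate", forHTTPHeaderField: "Content-Encoding")
} else {
    // Compression failed (e.g. the data expanded): fall back to the uncompressed payload.
    request.httpBody = body
}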

Unit Tests

ServerMock is now provided with unzip and inflate methods to automatically decompress httpBody during tests.
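For illustration, such an inflate helper could look roughly like this (a sketch assuming the zlib framing shown above, not the actual ServerMock code):

import Compression
import Foundation

// Strip the 2-byte zlib header and the 4-byte Adler-32 trailer, then decode
// the remaining raw DEFLATE stream with the Compression framework.
func inflate(_ data: Data, capacity: Int = 10_000_000) -> Data? {
    let raw = data.dropFirst(2).dropLast(4)
    return raw.withUnsafeBytes { (source: UnsafeRawBufferPointer) -> Data? in
        guard let sourcePtr = source.bindMemory(to: UInt8.self).baseAddress else { return nil }
        let destination = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)
        defer { destination.deallocate() }
        let decodedSize = compression_decode_buffer(
            destination, capacity,
            sourcePtr, raw.count,
            nil, COMPRESSION_ZLIB
        )
        guard decodedSize > 0 else { return nil }
        return Data(bytes: destination, count: decodedSize)
    }
}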

Integration Tests

The http-server-mock Python server now inflates the request body when necessary.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference

@maxep maxep requested a review from a team as a code owner October 5, 2021 17:54
@maxep maxep force-pushed the maxep/RUMM-1679/http-body-compression branch from 9e949eb to cbdb5c8 Compare October 5, 2021 19:01
@maxep maxep self-assigned this Oct 5, 2021
@maxep maxep marked this pull request as draft October 6, 2021 06:57
// |CMF|FLG|
// +---+---+
// ref. https://datatracker.ietf.org/doc/html/rfc1950#section-2.2
let header = Data([0x78, 0x5e])
Member

Are these flags randomly generated?

Member Author

This comes from mw99/DataCompression, which uses the same algorithm defined in Apple's Compression framework: COMPRESSION_ZLIB
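For reference, those two bytes decode to a valid zlib header per RFC 1950 §2.2 (this breakdown is general RFC knowledge, not something specific to this PR):

let cmf: UInt32 = 0x78 // CM = 8 (deflate), CINFO = 7 (32 KB window)
let flg: UInt32 = 0x5e // FDICT = 0, FLEVEL = 1, FCHECK = 30
// RFC 1950 requires CMF * 256 + FLG to be a multiple of 31.
assert((cmf * 256 + flg) % 31 == 0) // 0x785e = 30814 = 31 * 994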

return nil
}

var bytes = UInt32(checksum).bigEndian
Member

Why is this needed? Is adler32 using littleEndian?

Member Author
@maxep maxep Oct 6, 2021

ARM is little-endian. In any case, .bigEndian only reverses the bytes when necessary, and this is needed to convert this UInt32 to network byte order.
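As a small illustration (assumed shape; deflated and checksum are placeholders for the values built earlier):

// RFC 1950 stores the Adler-32 value in network (big-endian) byte order at the end of the stream.
// `.bigEndian` is a no-op on big-endian hosts and a byte swap on little-endian ones (ARM, x86).
var bytes = UInt32(checksum).bigEndian
deflated.append(Data(bytes: &bytes, count: MemoryLayout<UInt32>.size))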

Member
@mariusc83 mariusc83 left a comment

LGTM but I am not the expert here :)

@maxep maxep force-pushed the maxep/RUMM-1679/http-body-compression branch 2 times, most recently from c5d1d6d to d0c7231 Compare October 6, 2021 12:54
@maxep maxep force-pushed the maxep/RUMM-1679/http-body-compression branch from d0c7231 to 90b101c Compare October 6, 2021 13:00
@maxep maxep marked this pull request as ready for review October 6, 2021 13:35

// The Adler-32 checksum should be initialized to 1 as described in
// https://datatracker.ietf.org/doc/html/rfc1950#section-8
return zlib.adler32(1, ptr, uInt(data.count))
Contributor

This method seems to be macOS-only.
How does it work on iOS? Where does it come from? 🤔

Member Author

This documentation link is for libkernel, which also includes an implementation of zlib.
Here, I'm using the adler32 method from the zlib library, which is documented here. Both have the same signature, and probably the same implementation.
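For context, the call in question looks roughly like this (a sketch assuming the zlib module import used in this PR):

import Foundation
import zlib

func adler32Checksum(of data: Data) -> uLong {
    return data.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) -> uLong in
        let ptr = buffer.bindMemory(to: Bytef.self).baseAddress
        // RFC 1950 §8: the Adler-32 running value starts at 1, not 0.
        return zlib.adler32(1, ptr, uInt(data.count))
    }
}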

Member
@ncreated ncreated left a comment

Great job, very well explained and super clear 👌. I left a few comments, with the test Data randomness being the most important one.

Out of curiosity, do we have any benchmarks on the compression stuff? What could be the estimated time impact on building the request?

Sources/Datadog/Core/Upload/DataCompression.swift (outdated review thread, resolved)
return mockRepeating(byte: 0x41, times: Int(size))
}

static func mockRandom<Size>(ofSize size: Size) -> Data where Size: BinaryInteger {
return mockRepeating(byte: .random(in: 0x00...0xFF), times: Int(size))
Member

The "randomness" of this Data is very low as it just repeats a single, random byte given number of times. It doesn't seem enough to reliably test compression/decompression flow. How about having something more fancy here?

Member Author

Definitely, good catch!

Member Author

@ncreated, I've added better randomisation using SecRandomCopyBytes, BUT all compressions started failing... So I read more about the subject and discovered that random data (especially cryptographically strong random numbers) cannot be compressed. In this Wikipedia article you can read:

The primary encoding algorithms used to produce bit sequences are Huffman coding (also used by the deflate algorithm) and arithmetic coding. Arithmetic coding achieves compression rates close to the best possible for a particular statistical model, which is given by the information entropy, whereas Huffman compression is simpler and faster but produces poor results for models that deal with symbol probabilities close to 1.

In particular, files of random data cannot be consistently compressed by any conceivable lossless data compression algorithm; indeed, this result is used to define the concept of randomness in Kolmogorov complexity.[18]
It is probably impossible to create an algorithm that can losslessly compress any data

In conclusion, a good randomised array of bytes won't be compressible.

Contributor

We can create such random strings by selecting randomElement()s from a pool of keywords, along these lines:

let keywords = ["foo", "bar", ...]
let randomString = (1...100).map { _ in keywords.randomElement()! }.joined(separator: " ") // this should be much longer than keywords.count
let compressed = zip(randomString)
// do assertions

That's also closer to the actual payloads, where the same keys are usually repeated multiple times.

Member

Nice finding @maxep, I didn't know that at all 🙂. And I agree with @buranmert - we can significantly improve it by just picking a random byte N times, instead of picking a random byte and repeating it N times.

We have some existing basic character sets we can reuse if we want to focus on Strings.

Member Author

picking a random byte N times, instead of picking a random byte and repeating it N times.

Yes, but that still fails most of the time. Instead, I've added an object to encode with random properties. This performs good randomisation with enough redundancy to allow compression.
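For instance (a hypothetical mock, not the exact type added in this PR), encoding a collection of objects with random property values keeps the bytes unpredictable while the repeated JSON keys give deflate enough redundancy:

import Foundation

struct CompressibleMock: Codable {
    let id: UUID
    let value: Double
    let flag: Bool
}

let objects = (0..<100).map { _ in
    CompressibleMock(id: UUID(), value: .random(in: 0...1), flag: .random())
}
// Random values, but the keys "id", "value" and "flag" repeat 100 times each.
let payload = try! JSONEncoder().encode(objects)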

@maxep maxep force-pushed the maxep/RUMM-1679/http-body-compression branch from a7ddc95 to 745f83b Compare October 8, 2021 09:25
@maxep
Member Author

maxep commented Oct 8, 2021

@ncreated Regarding a benchmark, we have 2 metrics to consider: the compression ratio and the compression speed.

In any case, deflate is faster than gzip because of its faster checksum calculation (Adler-32). Both formats use the same DEFLATE algorithm for compression.

The DEFLATE implementation tuned by Apple's Compression framework is optimised for both ratio and speed. If we want to tune it differently, we will have to use the zlib interface directly.

I wasn't able to find more info from Apple; the only resource I found is this WWDC 2015 session where they introduce the Compression framework. At around 4 minutes in, they compare algorithms with zlib as a reference point.

@ncreated
Member

I was thinking about some basic stuff, like measuring the time to create a request - just for sanity-checking that it doesn't take alarmingly more time now 🙂. Here I ran TrackingConsentStartGrantedScenario in the Example app to measure it while randomly tapping buttons:

without compression:
⚡️ Creating request took: 0.001726984977722168s
⚡️ Creating request took: 0.0004640817642211914s
⚡️ Creating request took: 0.0006459951400756836s
⚡️ Creating request took: 0.00028192996978759766s
⚡️ Creating request took: 0.0004030466079711914s

with compression:
⚡️ Creating request took: 0.0031260251998901367s
⚡️ Creating request took: 0.0020869970321655273s
⚡️ Creating request took: 0.0014499425888061523s
⚡️ Creating request took: 0.0014690160751342773s
⚡️ Creating request took: 0.0013800859451293945s

I can see it's up to 5x slower - seems reasonable 👍.

Sources/Datadog/Core/Upload/RequestBuilder.swift (two outdated review threads, resolved)
@maxep maxep force-pushed the maxep/RUMM-1679/http-body-compression branch from b907a4f to 5cbb1e3 Compare October 11, 2021 12:36
@maxep maxep requested a review from ncreated October 11, 2021 12:50
Member
@ncreated ncreated left a comment

🚀 💯

@maxep maxep merged commit 0b89539 into master Oct 12, 2021
@maxep maxep deleted the maxep/RUMM-1679/http-body-compression branch October 12, 2021 09:22
@ncreated ncreated mentioned this pull request Oct 12, 2021