Report profiling data in v2.4 intake format; compress files #53

ivoanjo · 2022-09-22T15:12:07Z

What does this PR do?

This PR changes libdatadog to report profiling data using the v2.4 intake format. It also compresses the files (excluding the new event.json file) using lz4.

This was tested with both the Ruby (agent and agentless modes) and the PHP profilers.

This also introduces two breaking changes:

- Breaking API change: the `ddog_ProfileExporter_new` / `ProfileExporter::new` functions now take two additional arguments: the profiling library name and version.
- Breaking behavior change: files are now automatically compressed with lz4 by libdatadog, so libraries mustn't do their own compression (Ruby is an example of a library that did).

We expect that other than the API change, using the v2.4 intake format and the added compression will be transparent to the libdatadog users.

Thanks to @morrisonlevi for pairing with me on this.

Note that this does not (yet) include support for including attributes in the reporting data. I'll leave that for a separate PR.

Motivation

Get libdatadog users to use intake v2.4, and lay the groundwork for including attributes in the reported data.

Also save some bytes over the wire for files. A few stats:
- PHP CLI: running composer create-project symfony/symfony-demo resulted in a 40016 byte pprof which compressed to 18139 bytes in 141661 nanoseconds. That's a compression ratio of 2.2061 and a space savings of 54.67%.
- Ruby github relenv test app:

| File | Uncompressed | Gzip (current Ruby profiler) | LZ4 (libdatadog) |
| --- | --- | --- | --- |
| code-provenance.json | 49K | 6K | 9.5K |
| rubyprofile.pprof | 402K | 48K | 76K |
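For anyone who wants to reproduce the ratio and savings figures quoted for the PHP CLI run, here is a small sketch of the arithmetic. The function names are made up for illustration and are not part of libdatadog; the only inputs are the two byte counts from the stats above.

```rust
/// Compression ratio: uncompressed size divided by compressed size.
/// (Hypothetical helper, not a libdatadog API.)
fn compression_ratio(uncompressed: u64, compressed: u64) -> f64 {
    uncompressed as f64 / compressed as f64
}

/// Space savings as a fraction of the original size.
/// (Hypothetical helper, not a libdatadog API.)
fn space_savings(uncompressed: u64, compressed: u64) -> f64 {
    1.0 - compressed as f64 / uncompressed as f64
}

fn main() {
    // Byte counts from the PHP CLI measurement quoted above.
    let (raw, lz4) = (40_016u64, 18_139u64);

    // Prints "ratio: 2.2061"
    println!("ratio: {:.4}", compression_ratio(raw, lz4));
    // Prints "savings: 54.67%"
    println!("savings: {:.2}%", space_savings(raw, lz4) * 100.0);
}
```

The same two helpers can be applied to the Ruby table by plugging in the sizes from each column.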

Additional Notes

(Nothing)

How to test the change?

Validate that data can still be reported to the backend, in both agent as well as agentless modes.

@morrisonlevi

This was tested with both the Ruby (agent and agentless modes) and the PHP profilers. This also introduces a breaking API change: the `ddog_ProfileExporter_build` / `ProfileExporter::build` functions now take two additional arguments -- the profiling library name and version. Other than that change, using the v2.4 intake format is transparent to the libdatadog users. Thanks to @morrisonlevi for pairing with me on this. Note that this does not (yet) include support for including attributes in the reporting data. I'll leave that for a separate PR.


ivoanjo · 2022-09-23T15:09:35Z

@morrisonlevi I was able to test your lz4 changes successfully with the Ruby profiler as well. I had, of course, to disable my own compression -- funnily enough the intake would still accept profiles that were gzipped by Ruby + lz4'd by libdatadog but they never showed up on the profile list ;)

r1viollet · 2022-09-23T15:56:28Z

profiling/src/exporter/mod.rs

-            )
+            let mut encoder = FrameEncoder::new(Vec::new());
+            encoder.write_all(file.bytes)?;
+            form.add_reader_file(file.name, Cursor::new(encoder.finish()?), file.name)


How is the compression format defined in the payload?

AFAIK the intake just looks for magic bits to distinguish if an uploaded file is [gzip|lz4] compressed or not. Otherwise no changes needed.

Ivo is right, according to Florian:

Levi: How does it know what compression format to use? Inspect magic bytes?
Florian: yes
Florian: I was looking at java profiler and it uses LZ4 by default
Florian: Jaroslav's experiments have shown that it's a good space/CPU tradeoff for the profiler

It's also why I picked lz4. I didn't test or verify these claims beyond "works for me."

Thanks. There are indeed magic values at the start of the payload that you can use.
As a Datadog agnostic library, this is not super friendly.

> As a Datadog agnostic library, this is not super friendly.

Do you mean, if other teams at Datadog want to reuse this, or if non-datadog people want to reuse this? I'd like to understand your concerns :)
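For readers following this thread, the magic-byte detection being discussed can be sketched roughly as below. This is not the intake's actual code, which is not shown in this PR; the enum and function names are hypothetical. The only facts relied on are the well-known format signatures: gzip streams start with `1f 8b`, and LZ4 frames start with the little-endian magic number `0x184D2204` (bytes `04 22 4d 18`).

```rust
/// Outcome of sniffing the first bytes of an uploaded file.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq, Eq)]
enum Compression {
    Gzip,
    Lz4Frame,
    NoneDetected,
}

/// Guess the compression format from the leading magic bytes.
/// Assumption: the receiver only looks at the outermost layer.
fn sniff_compression(bytes: &[u8]) -> Compression {
    // gzip member header: ID1 = 0x1f, ID2 = 0x8b.
    const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
    // LZ4 frame magic number 0x184D2204, stored little-endian.
    const LZ4_FRAME_MAGIC: [u8; 4] = [0x04, 0x22, 0x4d, 0x18];

    if bytes.starts_with(&LZ4_FRAME_MAGIC) {
        Compression::Lz4Frame
    } else if bytes.starts_with(&GZIP_MAGIC) {
        Compression::Gzip
    } else {
        Compression::NoneDetected
    }
}

fn main() {
    let lz4_like = [0x04, 0x22, 0x4d, 0x18, 0x60];
    let gzip_like = [0x1f, 0x8b, 0x08];
    println!("{:?}", sniff_compression(&lz4_like));
    println!("{:?}", sniff_compression(&gzip_like));
    println!("{:?}", sniff_compression(b"{}"));
}
```

Note that a sniffer like this only sees the outermost layer, so a payload that was gzipped and then lz4-compressed would be classified as an LZ4 frame.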

morrisonlevi · 2022-09-24T14:23:51Z

I'm clarifying with Florian whether it's important that name and filename are like this:

Content-Disposition: form-data; name="time"; filename="time.pprof"

Or if this is fine:

Content-Disposition: form-data; name="time.pprof"; filename="time.pprof"

morrisonlevi

I'm still waiting on Florian for a question about name vs filename. Aside from this, with fresh eyes, I think we should set the profiling library name and version at the same place we set family, which is when we build the exporter, not the request, as this information is unlikely to change.

I also don't know how many people are re-using an exporter vs just making a new one so... it may be a moot point.

morrisonlevi · 2022-09-22T16:29:25Z

profiling/src/exporter/mod.rs

@@ -120,26 +122,46 @@ impl ProfileExporter {
    }

    /// Build a Request object representing the profile information provided.
+    #[allow(clippy::too_many_arguments)]


IMO this is fine for now, but we should probably work on breaking this up when we add support for the extra attributes feature.

morrisonlevi · 2022-09-24T23:40:16Z

profiling/src/exporter/mod.rs

-            )
+            let mut encoder = FrameEncoder::new(Vec::new());
+            encoder.write_all(file.bytes)?;
+            form.add_reader_file(file.name, Cursor::new(encoder.finish()?), file.name)


Ivo is right, according to Florian:

Levi: How does it know what compression format to use? Inspect magic bytes?
Florian: yes
Florian: I was looking at java profiler and it uses LZ4 by default
Florian: Jaroslav's experiments have shown that it's a good space/CPU tradeoff for the profiler

It's also why I picked lz4. I didn't test or verify these claims beyond "works for me."

ivoanjo · 2022-09-27T07:35:05Z

> I'm still waiting on Florian for a question about name vs filename. Aside from this, with fresh eyes, I think we should set the profiling library name and version at the same place we set family, which is when we build the exporter, not the request, as this information is unlikely to change.

Right, that makes sense. I'll probably not have time today, but I've made a note and will come back soon-ish™️

morrisonlevi · 2022-09-27T10:01:46Z

> I'm clarifying with Florian whether it's important that name and filename are like this:
> Content-Disposition: form-data; name="time"; filename="time.pprof"
> Or if this is fine:
> Content-Disposition: form-data; name="time.pprof"; filename="time.pprof"

I confirmed that intake doesn't care about the form field name for files other than event.json, so we can reuse the filename as the form field name 👍🏻
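To make the outcome concrete, here is a minimal sketch of a part header where the filename doubles as the form field name. The helper names are hypothetical and this is not how libdatadog builds its multipart body (the diff in this PR uses `form.add_reader_file`); the sketch also assumes names contain no characters that need escaping.

```rust
/// Format a multipart part header line.
/// (Hypothetical helper; assumes no quotes or newlines in the inputs.)
fn content_disposition(field_name: &str, filename: &str) -> String {
    format!(
        "Content-Disposition: form-data; name=\"{}\"; filename=\"{}\"",
        field_name, filename
    )
}

/// Reuse the filename as the form field name, as agreed for files
/// other than event.json.
fn file_part_header(filename: &str) -> String {
    content_disposition(filename, filename)
}

fn main() {
    // Prints: Content-Disposition: form-data; name="time.pprof"; filename="time.pprof"
    println!("{}", file_part_header("time.pprof"));
}
```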

morrisonlevi

This looks good to me, but then again I wrote or paired for all of it, so that's expected ^_^. I'll let someone else review it.


r1viollet · 2022-09-27T11:15:57Z

profiling/src/exporter/mod.rs

-    ) -> anyhow::Result<ProfileExporter> {
+    ) -> anyhow::Result<ProfileExporter>
+    where
+        F: Into<Cow<'static, str>>,


Why do we need to alias to different types?

Also, the `Cow<'static, str>` lifetime seems strange to me.

Instead of thinking Cow<str> as a string that gets cloned on write, think of it instead as either a Borrowed string, or an Owned one. In this case, we can borrow a 'static string because that will, by definition, live long enough. Doing so allows us to save some memory allocations if we can show the lifetime is static (such as PHP profiler calling this from Rust). However, if we can't prove it, then we need an Owned version that copies it; this is what will happen across the C FFI.
The reason for taking an Into<Cow<_>> is so that you can pass a &str or a String or anything else that the compiler knows how to convert via into or from, making it nicer for the caller since they don't have to wrap it into a Cow<_>. The reason for having it repeated 3 times is so each parameter can independently do this -- if we had only one type IntoCow: Into<Cow<'static, str>> that they all used, then they'd all have to be the same type, which isn't so nice. For instance, maybe the library name is a &str but the version is a String.
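A small self-contained sketch of the pattern described above, for illustration. The `Exporter` struct and its field names are hypothetical stand-ins, not the real `ProfileExporter` signature; only `std::borrow::Cow` and its standard `From<&str>` / `From<String>` conversions are relied on.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an exporter that stores three strings.
struct Exporter {
    family: Cow<'static, str>,
    library_name: Cow<'static, str>,
    library_version: Cow<'static, str>,
}

impl Exporter {
    /// Three independent type parameters, so each argument can be a
    /// different type (`&'static str`, `String`, or a `Cow`).
    fn new<F, N, V>(family: F, library_name: N, library_version: V) -> Self
    where
        F: Into<Cow<'static, str>>,
        N: Into<Cow<'static, str>>,
        V: Into<Cow<'static, str>>,
    {
        Exporter {
            family: family.into(),
            library_name: library_name.into(),
            library_version: library_version.into(),
        }
    }
}

fn main() {
    // A version string built at runtime has to be owned.
    let version: String = format!("{}.{}.{}", 1, 2, 3);

    // Placeholder values; a string literal is `&'static str`, so it is
    // borrowed without allocating.
    let exporter = Exporter::new("php", "example-profiler", version);

    assert!(matches!(exporter.family, Cow::Borrowed(_)));
    assert!(matches!(exporter.library_name, Cow::Borrowed(_)));
    assert!(matches!(exporter.library_version, Cow::Owned(_)));
    println!(
        "{} {} {}",
        exporter.family, exporter.library_name, exporter.library_version
    );
}
```

With a single shared type parameter, the call in `main` would not compile, because `&'static str` and `String` are different types.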

r1viollet

LGTM !
Happy to integrate this in native !

ivoanjo requested review from a team as code owners September 22, 2022 15:12

morrisonlevi added 3 commits September 22, 2022 18:25

Adjust profiler_tags encoding

87ad687

Remove data[] from the name of the files

148a2b6

This isn't a thing in v2.4

Add lz4 compression to files

abcb2a4

morrisonlevi changed the title ~~Report profiling data in v2.4 intake format~~ Report profiling data in v2.4 intake format; compress files Sep 22, 2022

Don't compress event.json

61ccef9

morrisonlevi force-pushed the ivoanjo/intake-v24-v2 branch from c2ac732 to 61ccef9 Compare September 22, 2022 18:40

Rename profile_library_[name|version] to profiling_library_[name|version]

dc8b1dd

r1viollet reviewed Sep 23, 2022


Test for DD-EVP-ORIGIN*

c031a06

morrisonlevi reviewed Sep 24, 2022


morrisonlevi added 2 commits September 27, 2022 03:08

Move profiling_library_{name,version} to constructor

1759e6f

Document some intake details

8225e83

morrisonlevi requested a review from r1viollet September 27, 2022 10:02

morrisonlevi reviewed Sep 27, 2022


ivoanjo commented Sep 27, 2022


Improve comments

a32ba62

r1viollet reviewed Sep 27, 2022


r1viollet approved these changes Sep 27, 2022


morrisonlevi merged commit 5b5a120 into main Sep 27, 2022

morrisonlevi deleted the ivoanjo/intake-v24-v2 branch September 27, 2022 15:33
