RUM-1843 Method Call telemetry #1723

maciejburda · 2024-03-13T11:19:17Z

What and why?

This PR adds telemetry for Method Called (internal).

This telemetry allows measuring the execution time of methods and adds call-side sampling for heavily called functions.

Along with these interfaces, we implement the first function to utilise this: SessionReplay.Recorder.captureNextRecord().

How?

It mostly reuses the existing strategy for Batch metrics, adding a few convenience methods for measuring execution time.
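A minimal sketch of the start/stop pattern described above, assuming a simplified telemetry facade. `SketchTelemetry`, `MethodCallMetric`, `stopMethodCalled` and the `"capture_record"` operation name are hypothetical names used for illustration, not the SDK's actual types; only the `startMethodCalled(operationName:callerClass:samplingRate:)` shape is taken from the diff quoted later in this PR.

```swift
import Foundation

/// Hypothetical handle returned when a call is sampled.
struct MethodCalledTrace {
    let operationName: String
    let callerClass: String
    let startTime: Date
}

/// Hypothetical record of one measured call.
struct MethodCallMetric {
    let operationName: String
    let callerClass: String
    let executionTime: TimeInterval
}

/// Hypothetical telemetry facade (assumption, not the SDK's type).
final class SketchTelemetry {
    private(set) var recorded: [MethodCallMetric] = []

    /// Returns a trace, or `nil` when the call is not sampled.
    /// `samplingRate` is a percentage between 0.0 and 100.0.
    func startMethodCalled(operationName: String, callerClass: String, samplingRate: Double) -> MethodCalledTrace? {
        // Uniform draw in [0, 100); the call is sampled when it falls below the rate.
        guard Double.random(in: 0..<100) < samplingRate else { return nil }
        return MethodCalledTrace(operationName: operationName, callerClass: callerClass, startTime: Date())
    }

    /// Records the elapsed time; a `nil` (unsampled) trace is a no-op.
    func stopMethodCalled(_ trace: MethodCalledTrace?) {
        guard let trace = trace else { return }
        let elapsed = Date().timeIntervalSince(trace.startTime)
        recorded.append(MethodCallMetric(operationName: trace.operationName, callerClass: trace.callerClass, executionTime: elapsed))
    }
}

let sketchTelemetry = SketchTelemetry()
let sampledTrace = sketchTelemetry.startMethodCalled(operationName: "capture_record", callerClass: "Recorder", samplingRate: 100)
// ... the measured work would run here ...
sketchTelemetry.stopMethodCalled(sampledTrace)
let skippedTrace = sketchTelemetry.startMethodCalled(operationName: "capture_record", callerClass: "Recorder", samplingRate: 0)
sketchTelemetry.stopMethodCalled(skippedTrace)
```

Returning an optional trace keeps the unsampled path cheap: no timestamp is taken and nothing is recorded when the call is not sampled.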

Review checklist

Feature or bugfix MUST have appropriate tests (unit, integration)
Make sure each commit and the PR mention the Issue number or JIRA reference
Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

Run unit tests for Core, RUM, Trace, Logs, CR and WVT
Run unit tests for Session Replay
Run integration tests
Run smoke tests
Run tests for tools/

datadog-datadog-prod-us1 · 2024-03-13T15:24:54Z

Datadog Report

Branch report: maciey/RUM-1843-capture-next-record-telemetry
Commit report: 1d9837f
Test service: dd-sdk-ios

✅ 0 Failed, 2821 Passed, 0 Skipped, 13m 7.38s Wall Time
🔻 Test Sessions change in coverage: 7 decreased, 7 increased

🔻 Code Coverage Decreases vs Default Branch (7)

This report shows up to 5 code coverage decreases.

test DatadogTraceTests tvOS 49.74% (-0.22%) - Details
test DatadogTraceTests iOS 49.71% (-0.17%) - Details
test DatadogLogsTests tvOS 45.70% (-0.16%) - Details
test DatadogLogsTests iOS 45.64% (-0.16%) - Details
test DatadogCrashReportingTests iOS 28.91% (-0.12%) - Details

ncreated

🚀 Great to see it! Well done 👍, I left a few minor remarks (non-blocking). Let's have more opinions on the sampling value 🙏 .

DatadogInternal/Sources/Telemetry/Telemetry.swift

DatadogRUM/Sources/Integrations/TelemetryReceiver.swift

ncreated · 2024-03-14T11:59:59Z

DatadogSessionReplay/Sources/Recorder/Recorder.swift

+        let methodCalledTrace = telemetry.startMethodCalled(
+            operationName: MethodCallConstants.captureRecordOperationName,
+            callerClass: MethodCallConstants.className,
+            samplingRate: methodCallTelemetrySamplingRate // Effectively 3% * 15% = 0.45% of calls


question/ The sampling here is crucial, so it may be worth having more opinions.

Given the SR interval of 100ms, this equals (60s / 100ms) * 0.45%, so 2.7 metrics per minute. With the "only 100 telemetry events per session" limit, SR will exhaust it after 37 minutes (100 / 2.7), assuming no other telemetry is sent. It sounds reasonable to me, but I wouldn't mind even lower sampling, let's say 2x lower (15 → 7.5, so 0.225% effectively). Thoughts @maciejburda @plousada @maxep @ganeshnj ?

It can definitely be lower. Even 5% seems enough (effectively 0.15%).
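The arithmetic in this exchange can be reproduced with a short sketch. It assumes the figures quoted in the thread (20% telemetry sampling, 15% metric sampling, a 100ms SR interval, a limit of 100 telemetry events per session) and, as in the comment above, that no other telemetry is sent. `SamplingBudget` and `samplingBudget` are hypothetical names.

```swift
/// Sampling stages named in the thread, as percentages.
let telemetrySamplingPercent = 20.0   // default telemetry sampling
let metricSamplingPercent = 15.0      // metric events sampling (hardcoded)

/// Hypothetical result type for the estimate.
struct SamplingBudget {
    let effectivePercent: Double   // share of calls that end up as sent metrics
    let metricsPerMinute: Double   // expected metrics per minute
    let minutesToLimit: Double     // expected minutes until the session limit is hit
}

/// Estimates the budget for a given call-side sampling rate (in percent).
func samplingBudget(callSidePercent: Double, intervalMs: Double = 100, sessionLimit: Double = 100) -> SamplingBudget {
    let effectiveFraction = (telemetrySamplingPercent / 100) * (metricSamplingPercent / 100) * (callSidePercent / 100)
    let callsPerMinute = 60_000 / intervalMs
    let metricsPerMinute = callsPerMinute * effectiveFraction
    return SamplingBudget(
        effectivePercent: effectiveFraction * 100,
        metricsPerMinute: metricsPerMinute,
        minutesToLimit: sessionLimit / metricsPerMinute
    )
}

let current = samplingBudget(callSidePercent: 15)
let halved = samplingBudget(callSidePercent: 7.5)
let lowest = samplingBudget(callSidePercent: 5)
```

Under these assumptions, 15% gives 0.45% effective sampling, 2.7 metrics per minute and about 37 minutes to the limit; 7.5% gives 0.225%, 1.35 per minute and about 74 minutes; 5% gives 0.15%, 0.9 per minute and about 111 minutes.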

As discussed - maybe all telemetries should be sampled on the call side instead. This gives more control for future metrics, and removes the need to remember the math of 20% * 15% * Y% = Desired%.

Right now whatever % we choose for the call side sampling will have to unnecessarily travel through the message bus to be rejected by top level sampling.
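One way to picture the call-side idea is to multiply the stage rates into a single rate and decide once, before anything is put on the message bus. This is only a sketch of that suggestion, not how the SDK samples today; `CountingBus`, `combinedRate` and `sendIfSampled` are hypothetical, and the uniform draw is passed in so the behaviour is deterministic.

```swift
/// Hypothetical message bus that only counts what it is asked to carry.
final class CountingBus {
    private(set) var messagesSent = 0
    func send(_ event: String) { messagesSent += 1 }
}

/// Multiplies per-stage percentages into one fraction in 0...1.
func combinedRate(_ stagePercents: [Double]) -> Double {
    return stagePercents.reduce(1.0) { $0 * ($1 / 100.0) }
}

/// Sends the event only when `draw` (uniform in 0..<1) falls below `rate`,
/// so rejected events never reach the bus.
func sendIfSampled(_ event: String, rate: Double, draw: Double, bus: CountingBus) {
    if draw < rate {
        bus.send(event)
    }
}

let bus = CountingBus()
let rate = combinedRate([20, 15, 15])   // 20% * 15% * 15%
sendIfSampled("method_called", rate: rate, draw: 0.001, bus: bus)  // below the rate: sent
sendIfSampled("method_called", rate: rate, draw: 0.5, bus: bus)    // above the rate: dropped at the call site
```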

Let's create a backlog item 🙏; we can accept it for now.

https://datadoghq.atlassian.net/browse/RUM-3572

My bigger concern about such telemetry is turning it off.

Once this code is out in the app, we will continue getting this telemetry even when we no longer need it, whether because the concern has been addressed, because it is way too much data, etc.

This brings the topic of remote configs but we are not there yet.

DatadogSessionReplay/Sources/Recorder/Recorder.swift

ganeshnj · 2024-03-15T13:10:20Z

DatadogInternal/Sources/Telemetry/Telemetry.swift

+    ///   - samplingRate: The sampling rate of the method call. Value between `0.0` and `100.0`, where `0.0` means NO event will be processed and `100.0` means ALL events will be processed. Note that this value is multiplied by telemetry sampling (by default 20%) and metric events sampling (hardcoded to 15%), making it effectively a 3% sampling rate for sending events when this value is set to `100`.
+    ///
+    /// - Returns: A `MethodCalledTrace` instance to be used to stop the method call and measure its execution time. It can be `nil` if the method call is not sampled.
+    func startMethodCalled(


Identifying a method with just its name and class seems naive, but it is probably okay for current requirements.

For example, the RequestBuilder protocol we have uses the same implementation name everywhere.

There is the concept of a fully qualified name, which gives the unique name of the method in the current context.

Thanks! Some good points there

operation_name is platform-agnostic because we can measure the same "behaviour" on both platforms, even though they might have totally different function signatures. For the class name we currently do this:

static let className = { String(reflecting: Recorder.self) }()

Which produces: "DatadogSessionReplay.Recorder"
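For comparison, a small sketch of how `String(reflecting:)` differs from `String(describing:)` for a type. The `Recorder` and `MethodCallConstants` declared here are stand-ins for illustration; the module prefix depends on where the code is compiled, so it is only `DatadogSessionReplay` inside that module.

```swift
/// Stand-in type; the real Recorder lives in the DatadogSessionReplay module.
final class Recorder {}

/// Stand-in for the constants holder referenced in the diff.
enum MethodCallConstants {
    /// Module-qualified type name, e.g. "<ModuleName>.Recorder".
    static let className = { String(reflecting: Recorder.self) }()
    /// Unqualified type name, shown here only for contrast.
    static let shortClassName = String(describing: Recorder.self)
}

let qualifiedName = MethodCallConstants.className
let shortName = MethodCallConstants.shortClassName
```

The module-qualified form helps disambiguate types with the same name across modules, although it still does not identify an individual method.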

Capture Record is indeed the first and simplistic example of Method Called, but if needed we can expand this model for future use cases.

maciejburda added 3 commits March 12, 2024 15:13

RUM-1843 Add telemetry metric sketch

59ee4b4

RUM-1843 Add Method Call metric for record capture

8903af3

RUM-1843 Minor tweaks

3d9523a

maciejburda force-pushed the maciey/RUM-1843-capture-next-record-telemetry branch from 59427c9 to 3d9523a Compare March 13, 2024 11:22

maciejburda added 2 commits March 13, 2024 14:22

RUM-1843 Add tests

a512db8

RUM-1843 Add tests

d154d36

maciejburda requested a review from ncreated March 13, 2024 15:26

maciejburda marked this pull request as ready for review March 13, 2024 16:14

maciejburda requested review from a team as code owners March 13, 2024 16:14

ncreated previously approved these changes Mar 14, 2024

RUM-1843 PR fixes

1d9837f

maciejburda dismissed ncreated’s stale review via 1d9837f March 14, 2024 13:07

ncreated approved these changes Mar 14, 2024

ganeshnj approved these changes Mar 15, 2024

maciejburda merged commit 6acd10b into develop Mar 15, 2024
8 checks passed

maciejburda deleted the maciey/RUM-1843-capture-next-record-telemetry branch March 15, 2024 14:15

maciejburda mentioned this pull request Mar 15, 2024

Dogfood recent changes #1727

Merged

8 tasks

maciejburda mentioned this pull request Mar 27, 2024

RUM-1844: Add Method Call Telemetry DataDog/dd-sdk-android#1940

Merged

3 tasks
