RUM-1660 chore: Send "RUM Session Ended" telemetry #1866

ncreated · 2024-05-29T09:53:39Z

What and why?

📦 🔭 Following the spec (internal), this PR adds "RUM Session Ended" telemetry.

The "RUM Session Ended" telemetry marks the conclusion of a RUM session and provides meta parameters that describe the session's quality and diagnostic details.

How?

The goal of this implementation is to better understand the possible states of a RUM session by making minimal assumptions about how sessions are tracked in RUM scopes. For instance, it can track multiple sessions simultaneously, which, though unlikely, is observed in some telemetry data. The design aims to be loosely coupled with the RUM architecture, allowing flexible and convenient access to the metric object rather than restricting its lifecycle to RUM scopes.

The high-level design of this telemetry consists of two main components:

SessionEndedMetricController: Manages the lifecycle of metric objects.
SessionEndedMetric: Tracks the state of a RUM session and exports metric attributes.

Both are thoroughly covered in unit tests. Integration tests ensure the proper integration of the metric with RUM.

The interface of SessionEndedMetricController:

internal final class SessionEndedMetricController {
    // Starts a new metric for a given session.
    func startMetric(sessionID: RUMUUID, precondition: RUMSessionPrecondition?, context: DatadogContext)

    // 3 APIs tracking session state for given session ID (or `nil` for latest session):
    func track(view: RUMViewEvent, in sessionID: RUMUUID?)
    func track(sdkErrorKind: String, in sessionID: RUMUUID?)
    func trackWasStopped(sessionID: RUMUUID?)

    // Ends the metric for a given session, sending it to telemetry and removing it from pending metrics.
    func endMetric(sessionID: RUMUUID)
}

The interface of SessionEndedMetric:

internal struct SessionEndedMetric {
    mutating func track(view: RUMViewEvent)
    mutating func track(sdkErrorKind: String)
    mutating func trackWasStopped()
    func asMetricAttributes() -> [String: Encodable]
}

Review checklist

Feature or bugfix MUST have appropriate tests (unit, integration)
Make sure each commit and the PR mention the Issue number or JIRA reference
Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

Run unit tests for Core, RUM, Trace, Logs, CR and WVT
Run unit tests for Session Replay
Run integration tests
Run smoke tests
Run tests for tools/

datadog-datadog-prod-us1 · 2024-05-31T13:52:06Z

Datadog Report

Branch report: ncreated/RUM-1660/send-rum-session-ended-telemetry
Commit report: 7fae496
Test service: dd-sdk-ios

✅ 0 Failed, 2998 Passed, 0 Skipped, 2m 1.36s Total Time
🔻 Test Sessions change in coverage: 7 decreased, 5 increased

🔻 Code Coverage Decreases vs Default Branch (7)

This report shows up to 5 code coverage decreases.

test DatadogCoreTests tvOS 22.32% (-57.16%) - Details
test DatadogTraceTests tvOS 54.88% (-0.24%) - Details
test DatadogLogsTests tvOS 46.02% (-0.08%) - Details
test DatadogCrashReportingTests iOS 27.19% (-0.08%) - Details
test DatadogLogsTests iOS 45.97% (-0.06%) - Details

Datadog/IntegrationUnitTests/RUM/SDKMetrics/RUMSessionEndedMetricIntegrationTests.swift

maciejburda · 2024-06-04T11:47:16Z

DatadogInternal/Sources/Context/BundleType.swift

    case iOSAppExtension
+
+    public init(bundle: Bundle) {


maciejburda · 2024-06-04T13:01:10Z

DatadogRUM/Sources/SDKMetrics/SessionEndedMetric.swift

+    /// Tracks the view event that occurred during the session.
+    func track(view: RUMViewEvent) {
+        guard view.session.id == sessionID else {
+            return // sanity check, unexpected


maybe worth telemetry?

Indeed 👍. I added an error so we will see if views are tracked in wrong sessions (although unexpected, it is possible from upstream code).
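To illustrate the change discussed above, here is a minimal sketch of a guard that reports a telemetry error instead of returning silently. All types here (`ViewEventStub`, `TelemetryStub`, `RecordingTelemetry`, `SessionMetricSketch`) are hypothetical stand-ins, not the SDK's actual API, and the error message is made up for the example.

```swift
import Foundation

// Hypothetical stand-in for a RUM view event; only the fields needed here.
struct ViewEventStub {
    let sessionID: String
    let viewID: String
}

// Hypothetical telemetry interface (assumption, not the SDK's `Telemetry` protocol).
protocol TelemetryStub {
    func error(_ message: String)
}

final class RecordingTelemetry: TelemetryStub {
    private(set) var errors: [String] = []
    func error(_ message: String) { errors.append(message) }
}

struct SessionMetricSketch {
    let sessionID: String
    private(set) var trackedViewIDs: Set<String> = []

    init(sessionID: String) { self.sessionID = sessionID }

    /// Tracks a view; a view from another session is reported, not silently dropped.
    mutating func track(view: ViewEventStub, telemetry: TelemetryStub) {
        guard view.sessionID == sessionID else {
            telemetry.error("Tracked view from session \(view.sessionID) in metric for session \(sessionID)")
            return
        }
        trackedViewIDs.insert(view.viewID)
    }
}
```

With this shape, a view tracked in the wrong session leaves the metric untouched but becomes visible in telemetry.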

maxep

Looks great, thanks for the test coverage!
I have left a comment re. usage of locks, LMKWYT.

maxep · 2024-06-05T06:52:47Z

DatadogRUM/Sources/Integrations/TelemetryInterecptor.swift

+import DatadogInternal
+
+/// Intercepts telemetry events sent through message bus.
+internal struct TelemetryInterecptor: FeatureMessageReceiver {


Suggested change

internal struct TelemetryInterecptor: FeatureMessageReceiver {

internal struct TelemetryInterceptor: FeatureMessageReceiver {

Given the simplicity of this receiver, it would make sense to define the sessionEndedMetric in the TelemetryReceiver where we already handle all telemetry.

I'd say it always starts simple at the beginning. The TelemetryReceiver already covers multiple concerns, I didn't want to introduce yet another one. For that reason I've chosen composition and I'd prefer to stay with this unless this is a change request.

While it can be easily moved to TelemetryReceiver, adding unit tests will be way harder as it operates on much higher abstractions than the TelemetryInterceptor we introduce here. LMKWDYT @maxep

Yes, I don't have strong opinion on this.
But the typos still persist ;)

But the typos still persist ;)

Oops! Fixed ✅

maxep · 2024-06-05T11:06:52Z

DatadogRUM/Sources/SDKMetrics/SessionEndedMetricController.swift

+    /// Dictionary to keep track of pending metrics, keyed by session ID.
+    @ReadWriteLock
+    private var metricsBySessionID: [String: SessionEndedMetric] = [:]
+    /// Array to keep track of pending metrics in their start order.
+    @ReadWriteLock
+    private var metrics: [SessionEndedMetric] = []


/discussion Mutating a metric property will acquire at least 2 locks, 1 in the controller and one in the metric. The perf penalty will be minimal because we don't expect many concurrent calls. But I want to highlight that the @ReadWriteLock wrapper uses a fair pthread lock which is not cheap.

Moreover, the metric.track(*_:) methods are acquiring the property lock multiple times, which could be prevented by calling the mutate closure.

I think it could be redesigned to use a single lock in the controller and perform all mutations at the controller level, something like the following where SessionEndedMetric could be a struct:

internal final class SessionEndedMetricController {
    // Starts a new metric for a given session.
    func startMetric(sessionID: String, precondition: RUMSessionPrecondition?, context: DatadogContext)

    // Tracks the view event that occurred during the session.
    func track(view: RUMViewEvent, for sessionID: String)

    // Tracks the kind of SDK error that occurred during the session.
    func track(sdkErrorKind: String, for sessionID: String)

    // Signals that the session was stopped with `stopSession()` API.
    func trackWasStopped(for sessionID: String)

    // Ends the metric for a given session, sending it to telemetry and removing it from pending metrics.
    func endMetric(sessionID: String)

    // Retrieves the last started metric.
    var latestMetric: SessionEndedMetric? { get }
}

What do you think?

Note: the @ReadWriteLock is convenient but shouldn't be our only lock. We could provide a lock object as well (similar to NSLock) so we can acquire a lock for operations and not only for property access. We could also provide unfair lock such as os_unfair_lock which are cheap.
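As a rough illustration of the point about repeated acquisitions, the sketch below compares per-property-access locking with a single `mutate` call. `LockedSketch` is a hypothetical, simplified stand-in for the `@ReadWriteLock` wrapper. It is backed by `NSLock` rather than a pthread read-write lock, and the `acquisitions` counter exists only to make the difference observable.

```swift
import Foundation

/// Simplified stand-in for a lock-protected property wrapper (assumption, not the SDK's code).
@propertyWrapper
final class LockedSketch<Value> {
    private var value: Value
    private let lock = NSLock()
    /// Number of lock acquisitions so far; read without locking, for demonstration only.
    private(set) var acquisitions = 0

    init(wrappedValue: Value) { self.value = wrappedValue }

    var wrappedValue: Value {
        get {
            lock.lock(); defer { lock.unlock() }
            acquisitions += 1
            return value
        }
        set {
            lock.lock(); defer { lock.unlock() }
            acquisitions += 1
            value = newValue
        }
    }

    /// Runs the whole read-modify-write under one acquisition.
    func mutate(_ block: (inout Value) -> Void) {
        lock.lock(); defer { lock.unlock() }
        acquisitions += 1
        block(&value)
    }
}

final class CounterSketch {
    @LockedSketch var counts: [String: Int] = [:]

    /// Each property access takes the lock separately; the update is not atomic as a whole.
    func incrementNaive(_ key: String) {
        counts[key] = (counts[key] ?? 0) + 1
    }

    /// One acquisition covers the entire update.
    func incrementAtomic(_ key: String) {
        _counts.mutate { $0[key, default: 0] += 1 }
    }

    var acquisitions: Int { _counts.acquisitions }
}
```

The naive variant takes the lock at least twice (a read, then a write) and another thread can interleave between them, while `mutate` keeps the update under a single acquisition.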

👍 I had this concern about double locking, but ignored it, assuming minimal impact and trying to keep the solution simple. With this feedback I'm happy to improve this 👌.

✅ Addressed 👍. I made the SessionEndedMetric a struct with mutating methods. The controller is now a class which mutates the right SEM using a single RW lock 👍.
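A minimal sketch of that arrangement, under stated assumptions: `MetricSketch` and `ControllerSketch` are hypothetical, trimmed-down types, session IDs are plain strings, and `NSLock` stands in for the read-write lock. The actual controller in this PR differs in detail.

```swift
import Foundation

/// Hypothetical value-type metric with mutating trackers.
struct MetricSketch {
    let sessionID: String
    var viewsCount = 0
    var wasStopped = false

    mutating func trackView() { viewsCount += 1 }
    mutating func trackWasStopped() { wasStopped = true }
}

/// Hypothetical controller that owns the only lock; every mutation goes through `update`.
final class ControllerSketch {
    private let lock = NSLock() // stand-in for a read-write lock
    private var metricsBySessionID: [String: MetricSketch] = [:]

    func startMetric(sessionID: String) {
        lock.lock(); defer { lock.unlock() }
        metricsBySessionID[sessionID] = MetricSketch(sessionID: sessionID)
    }

    func trackView(in sessionID: String) { update(sessionID) { $0.trackView() } }
    func trackWasStopped(in sessionID: String) { update(sessionID) { $0.trackWasStopped() } }

    /// Removes and returns the metric; a real controller would send it to telemetry here.
    func endMetric(sessionID: String) -> MetricSketch? {
        lock.lock(); defer { lock.unlock() }
        return metricsBySessionID.removeValue(forKey: sessionID)
    }

    /// Applies a mutation to the stored struct under one lock acquisition.
    private func update(_ sessionID: String, _ mutation: (inout MetricSketch) -> Void) {
        lock.lock(); defer { lock.unlock() }
        guard var metric = metricsBySessionID[sessionID] else { return }
        mutation(&metric)
        metricsBySessionID[sessionID] = metric
    }
}
```

Because the metric is a struct with no lock of its own, a tracking call costs exactly one acquisition, taken in the controller.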

ambushwork · 2024-06-10T11:06:09Z

DatadogRUM/Sources/SDKMetrics/SessionEndedMetricController.swift

+    }
+
+    private func updateMetric(for sessionID: RUMUUID?, _ mutation: (inout SessionEndedMetric?) throws -> Void) {
+        guard let sessionID = (sessionID ?? pendingSessionIDs.last) else {


I have a question about pendingSessionIDs: it seems that this array is only queried for its last element, so why not just keep the last session ID? What's the purpose of creating the array?

@ambushwork The pendingSessionIDs is more of a history of pending sessions. It is true that we query only for the last element, but it might change from one query to another if endMetric(sessionID:) is called in between (we remove the ID from the array as part of that call). If we used a single property to store the "last ID", we wouldn't know what to change it to after endMetric(sessionID:), as the previous ID would be lost.

Also, even though sessions are expected to end in the same order as they were started, some iOS telemetry shows rare but existing consistency problems in session management. For that reason, we prefer to not introduce too many assumptions in "Session Ended" metric, which is another reason why we track the whole array of IDs rather than only the "latest" one.

Thanks, quite clear for me now!
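The reasoning in this thread can be sketched as a small data structure. `PendingSessionsSketch` is a hypothetical illustration with string IDs; it leaves out locking and the metrics themselves.

```swift
/// Hypothetical history of pending session IDs, kept in start order.
struct PendingSessionsSketch {
    private(set) var pendingSessionIDs: [String] = []

    init() {}

    mutating func start(_ id: String) { pendingSessionIDs.append(id) }

    /// Ending a session removes it from the history, wherever it sits in the array.
    mutating func end(_ id: String) { pendingSessionIDs.removeAll { $0 == id } }

    /// Uses the explicit ID if given, otherwise the most recently started pending session.
    func resolve(_ id: String?) -> String? { id ?? pendingSessionIDs.last }
}
```

After `start("A")`, `start("B")`, `end("B")`, the fallback resolves to `"A"` again. A single "last ID" property could not recover that, and the array also copes with sessions ending out of order.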

ambushwork · 2024-06-10T14:22:02Z

DatadogRUM/Sources/RUMMonitor/Scopes/RUMApplicationScope.swift

+            // process(command:context:writer) returned false, so the scope will be deallocated at the end of
+            // this execution context. End the "RUM Session Ended" metric:
+            defer { dependencies.sessionEndedMetric.endMetric(sessionID: scope.sessionUUID) }
+


If I understand correctly, endMetric can be triggered only when a command is processed and returns false; otherwise it cannot be triggered by itself, even if the session's duration or timeout has passed the limit?

That is correct 👍. Every state change in RUM is caused by a command. Even in the case of "15min timeout" or "4h max duration", the RUMApplicationScope only recognizes these upon receiving the next command.
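A minimal sketch of this command-driven behaviour, under assumptions: `SessionScopeSketch` is hypothetical, the two limits are taken from the figures quoted above, and treating every command as an interaction that resets the inactivity timer is a simplification.

```swift
import Foundation

/// Hypothetical session scope: expiry is evaluated only when a command arrives.
struct SessionScopeSketch {
    static let inactivityTimeout: TimeInterval = 15 * 60   // "15min timeout"
    static let maxDuration: TimeInterval = 4 * 60 * 60     // "4h max duration"

    let startTime: Date
    private(set) var lastInteractionTime: Date

    init(startTime: Date) {
        self.startTime = startTime
        self.lastInteractionTime = startTime
    }

    /// Returns `false` when the session has expired and the scope should be closed.
    mutating func process(commandTime: Date) -> Bool {
        let timedOut = commandTime.timeIntervalSince(lastInteractionTime) >= Self.inactivityTimeout
        let tooLong = commandTime.timeIntervalSince(startTime) >= Self.maxDuration
        if timedOut || tooLong {
            return false // the caller would end the "RUM Session Ended" metric at this point
        }
        lastInteractionTime = commandTime
        return true
    }
}
```

Nothing runs between commands in this model. A session idle for an hour is only recognized as expired, and its metric only ended, when the next command is processed.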

maxep

Looks great! I left a minor suggestion on removing the extra lock on the pendingSessionIDs property.

maxep · 2024-06-11T08:06:51Z

DatadogRUM/Sources/Integrations/TelemetryInterecptor.swift

+import DatadogInternal
+
+/// Intercepts telemetry events sent through message bus.
+internal struct TelemetryInterecptor: FeatureMessageReceiver {


Yes, I don't have strong opinion on this.
But the typos still persist ;)

maxep · 2024-06-11T08:15:07Z

DatadogRUM/Sources/SDKMetrics/SessionEndedMetricController.swift

+    @ReadWriteLock
+    private var pendingSessionIDs: [RUMUUID] = []


/suggestion As this is private, and tightly linked to metricsBySessionID, it can be mutated while acquiring the lock of metricsBySessionID. Starting a metric would look like:

_metricsBySessionID.mutate { metrics in
    metrics[sessionID] = SessionEndedMetric(sessionID: sessionID, precondition: precondition, context: context)
    pendingSessionIDs.append(sessionID)
}

As this is private and tightly linked (...)

Fair call, done ✅

maxep

Well done, it looks great!!

maciejburda · 2024-06-12T08:41:06Z

Datadog/Datadog.xcodeproj/project.pbxproj

@@ -8987,7 +8987,7 @@
 				D29A9F8929DD85BB005C54A4 /* VitalRefreshRateReader.swift in Sources */,
 				D29A9F6929DD85BB005C54A4 /* UIKitRUMUserActionsHandler.swift in Sources */,
 				D29A9F5229DD85BB005C54A4 /* RUMUUIDGenerator.swift in Sources */,
-				61DCC84E2C071DCD00CB59E5 /* TelemetryInterecptor.swift in Sources */,
+				61DCC84E2C071DCD00CB59E5 /* TelemetryInterceptor.swift in Sources */,


ncreated self-assigned this May 29, 2024

ncreated force-pushed the ncreated/RUM-1660/send-rum-session-ended-telemetry branch from bce9791 to 3244358 Compare May 31, 2024 10:48

ncreated marked this pull request as ready for review May 31, 2024 14:29

ncreated requested review from a team as code owners May 31, 2024 14:29

maciejburda requested a review from ambushwork June 4, 2024 11:38

maciejburda reviewed Jun 4, 2024

maxep reviewed Jun 5, 2024

ncreated requested review from maxep and maciejburda June 10, 2024 09:38

ambushwork reviewed Jun 10, 2024

maxep previously approved these changes Jun 11, 2024

ncreated dismissed maxep’s stale review via fa4f3f4 June 11, 2024 12:25

ncreated requested a review from maxep June 11, 2024 12:26

maxep approved these changes Jun 11, 2024

ncreated mentioned this pull request Jun 11, 2024

RUM-1660 chore: Enhance RUM session debugging in Example app #1894

Closed

8 tasks

maciejburda approved these changes Jun 12, 2024

ncreated added 11 commits June 12, 2024 10:52

RUM-1660 Add BundleType to DatadogContext

f11d2a1

so it can be available for each product.

RUM-1660 Define SessionEndedMetric and DI controller

83285c0

so it can be easily injected to different parts of RUM.

RUM-1660 Inject Session Ended Metric into RUM

792e094

RUM-1660 Add tests for Session Ended Metric Controller

e90f149

RUM-1660 Track "RUM Session Ended" attributes in RUM

c3918d3

RUM-1660 Add tests for "RUM Session Ended" metric spec

37a8536

RUM-1660 Fix lint

180ab41

RUM-1660 Add more tests

5954c06

RUM-1660 Fix lint

1e65da5

RUM-1660 CR feedback - reduce number of RW locks used to track session state

35c8096

RUM-1660 CR feedback - simplify tests setup

85eb323

ncreated added 3 commits June 12, 2024 10:52

RUM-1660 CR feedback - send telemetry on tracking view in foreign session

297b1ac

RUM-1660 CR feedback - fix typo

68aa7b7

RUM-1660 CR feedback - use single lock for tracking session state and recent IDs

7fae496

ncreated force-pushed the ncreated/RUM-1660/send-rum-session-ended-telemetry branch from fa4f3f4 to 7fae496 Compare June 12, 2024 08:52

ncreated merged commit 15f7989 into develop Jun 12, 2024
8 checks passed

ncreated deleted the ncreated/RUM-1660/send-rum-session-ended-telemetry branch June 12, 2024 09:34

maciejburda mentioned this pull request Jun 12, 2024

Release 2.13.0 #1899

Merged

8 tasks

This was referenced Jun 12, 2024

RUM-1660 chore: Enhance RUM session debugging in Example app #1902

Merged

RUM-4591 chore: Add diagnostic attributes to "RUM Session Ended" telemetry #1904

Merged

maciejburda mentioned this pull request Jun 13, 2024

Dogfood recent changes #1905

Merged

8 tasks
