Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-1660 chore: Send "RUM Session Ended" telemetry #1866

Merged
merged 14 commits into from
Jun 12, 2024

Conversation

ncreated
Copy link
Member

@ncreated ncreated commented May 29, 2024

What and why?

📦 🔭 Following the spec (internal), this PR adds "RUM Session Ended" telemetry.

The "RUM Session Ended" telemetry marks the conclusion of a RUM session and provides meta parameters that describe the session's quality and diagnostic details.

How?

The goal of this implementation is to better understand the possible states of a RUM session by making minimal assumptions about how sessions are tracked in RUM scopes. For instance, it can track multiple sessions simultaneously, which, though unlikely, is observed in some telemetry data. The design aims to be loosely coupled with the RUM architecture, allowing flexible and convenient access to the metric object rather than restricting its lifecycle to RUM scopes.

The high-level design of this telemetry consists of two main components:

  • SessionEndedMetricController: Manages the lifecycle of metric objects.
  • SessionEndedMetric: Tracks the state of a RUM session and exports metric attributes.

Both are thoroughly covered in unit tests. Integration tests ensure the proper integration of the metric with RUM.

The interface of SessionEndedMetricController:

internal final class SessionEndedMetricController {
    // Starts a new metric for a given session.
    func startMetric(sessionID: RUMUUID, precondition: RUMSessionPrecondition?, context: DatadogContext)

    // 3 APIs tracking session state for given session ID (or `nil` for latest session):
    func track(view: RUMViewEvent, in sessionID: RUMUUID?)
    func track(sdkErrorKind: String, in sessionID: RUMUUID?)
    func trackWasStopped(sessionID: RUMUUID?)

    // Ends the metric for a given session, sending it to telemetry and removing it from pending metrics.
    func endMetric(sessionID: String)
}

The interface of SessionEndedMetric:

internal final struct SessionEndedMetric {
    mutating func track(view: RUMViewEvent)
    mutating func track(sdkErrorKind: String)
    mutating func trackWasStopped()
    func asMetricAttributes() -> [String: Encodable]
}

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests for Core, RUM, Trace, Logs, CR and WVT
  • Run unit tests for Session Replay
  • Run integration tests
  • Run smoke tests
  • Run tests for tools/

@ncreated ncreated self-assigned this May 29, 2024
@ncreated ncreated force-pushed the ncreated/RUM-1660/send-rum-session-ended-telemetry branch from bce9791 to 3244358 Compare May 31, 2024 10:48
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented May 31, 2024

Datadog Report

Branch report: ncreated/RUM-1660/send-rum-session-ended-telemetry
Commit report: 7fae496
Test service: dd-sdk-ios

✅ 0 Failed, 2998 Passed, 0 Skipped, 2m 1.36s Total Time
🔻 Test Sessions change in coverage: 7 decreased, 5 increased

🔻 Code Coverage Decreases vs Default Branch (7)

This report shows up to 5 code coverage decreases.

  • test DatadogCoreTests tvOS 22.32% (-57.16%) - Details
  • test DatadogTraceTests tvOS 54.88% (-0.24%) - Details
  • test DatadogLogsTests tvOS 46.02% (-0.08%) - Details
  • test DatadogCrashReportingTests iOS 27.19% (-0.08%) - Details
  • test DatadogLogsTests iOS 45.97% (-0.06%) - Details

@ncreated ncreated marked this pull request as ready for review May 31, 2024 14:29
@ncreated ncreated requested review from a team as code owners May 31, 2024 14:29
@maciejburda maciejburda requested a review from ambushwork June 4, 2024 11:38
case iOSAppExtension

public init(bundle: Bundle) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

/// Tracks the view event that occurred during the session.
func track(view: RUMViewEvent) {
guard view.session.id == sessionID else {
return // sanity check, unexpected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth telemetry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed 👍. I added an error so we will see if views are tracked in wrong sessions (although unexpected, it is possible from upstream code).

Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the test coverage!
I have left a comment re. usage of locks, LMKWYT.

import DatadogInternal

/// Intercepts telemetry events sent through message bus.
internal struct TelemetryInterecptor: FeatureMessageReceiver {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal struct TelemetryInterecptor: FeatureMessageReceiver {
internal struct TelemetryInterceptor: FeatureMessageReceiver {

Given the simplicity of this receiver, it would make sense to define the sessionEndedMetric in the TelemetryReceiver where we already handle all telemetry.

Copy link
Member Author

@ncreated ncreated Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it always starts simple at the beginning. The TelemetryReceiver already covers multiple concerns, I didn't want to introduce yet another one. For that reason I've chosen composition and I'd prefer to stay with this unless this is a change request.

While it can be easily moved to TelemetryReceiver, adding unit tests will be way harder as it operates on much higher abstractions than the TelemetryInterceptor we introduce here. LTMKWDYT @maxep

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't have strong opinion on this.
But the typos still persist ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the typos still persist ;)

Ops! Fixed ✅

Comment on lines 12 to 17
/// Dictionary to keep track of pending metrics, keyed by session ID.
@ReadWriteLock
private var metricsBySessionID: [String: SessionEndedMetric] = [:]
/// Array to keep track of pending metrics in their start order.
@ReadWriteLock
private var metrics: [SessionEndedMetric] = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/discussion Mutating a metric property will acquire at least 2 locks, 1 in the controller and one in the metric. The perf penalty will be minimal because we don't expect many concurrent calls. But I want to highlight that the @ReadWriteLock wrapper uses a fair pthread lock which is not cheap.

Moreover, the metric.track(*_:) methods are acquiring the property lock multiple times, which could be prevented by calling the mutate closure.

I think it could be redesigned to use a single lock in the controller and perform all mutations at the controller level, something like the following where SessionEndedMetric could be a struct:

internal final class SessionEndedMetricController {
    // Starts a new metric for a given session.
    func startMetric(sessionID: String, precondition: RUMSessionPrecondition?, context: DatadogContext)

    // Tracks the view event that occurred during the session.
    func track(view: RUMViewEvent, for sessionID: String)

    // Tracks the kind of SDK error that occurred during the session.
    func track(sdkErrorKind: String, for sessionID: String)

    // Signals that the session was stopped with `stopSession()` API.
    func trackWasStopped(for sessionID: String)

    // Ends the metric for a given session, sending it to telemetry and removing it from pending metrics.
    func endMetric(sessionID: String)

    // Retrieves the last started metric.
    var latestMetric: SessionEndedMetric? { get }
}

What do you think?

Note: the @ReadWriteLock is convenient but shouldn't be our only lock. We could provide a lock object as well (similar to NSLock) so we can acquire a lock for operations and not only for property access. We could also provide unfair lock such as os_unfair_lock which are cheap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I had this concern on double locking, but ignored it with assuming minimal impact and trying to keep the solution simple. With this feedback I'm happy to improve this 👌.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Addressed 👍. I made the SessionEndedMetric a struct with mutable methods. The controller is now a class which mutates the right SEM using single RW lock 👍.

@ncreated ncreated requested review from maxep and maciejburda June 10, 2024 09:38
}

private func updateMetric(for sessionID: RUMUUID?, _ mutation: (inout SessionEndedMetric?) throws -> Void) {
guard let sessionID = (sessionID ?? pendingSessionIDs.last) else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question about pendingSessionIDs, it seems that these array is only queried for the last element, so why not just keep the last session id? what's the purpose to create the array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ambushwork The pendingSessionIDs is more a history of pending sessions. It is true that we query only for the last element, but it might change from one query to another if endMetric(sessionID:) is called in between (we remove the ID from array as part of that call). If using single property to store the "last ID", we won't know how to change it after endMetric(sessionID:) as we wouldn't know the previous ID.

Also, even though sessions are expected to end in the same order as they were started, some iOS telemetry shows rare but existing consistency problems in session management. For that reason, we prefer to not introduce too many assumptions in "Session Ended" metric, which is another reason why we track the whole array of IDs rather than only the "latest" one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, quite clear for me now !

// proccss(command:context:writer) returned false, so the scope will be deallocated at the end of
// this execution context. End the "RUM Session Ended" metric:
defer { dependencies.sessionEndedMetric.endMetric(sessionID: scope.sessionUUID) }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the endMetric can be triggered only when there is a command to process and it returns false, otherwise it can not be triggered it self even if the duration and timeout has passed the limit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct 👍. Every state change in RUM is caused by command. Even in case of "15min timeout" or "4h max duration", the RUMApplicationScope only recognizes these upon receiving next command.

maxep
maxep previously approved these changes Jun 11, 2024
Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I left a minor suggestion on removing the extra lock on the pendingSessionIDs property.

import DatadogInternal

/// Intercepts telemetry events sent through message bus.
internal struct TelemetryInterecptor: FeatureMessageReceiver {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't have strong opinion on this.
But the typos still persist ;)

@ReadWriteLock
private var pendingSessionIDs: [RUMUUID] = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/suggestion As this is private, and tightly linked to metricsBySessionID, it can be mutated while acquiring the lock of metricsBySessionID. Starting a metric would look like:

_metricsBySessionID.mutate { metrics in
    metrics[sessionID] = SessionEndedMetric(sessionID: sessionID, precondition: precondition, context: context)
    pendingSessionIDs.append(sessionID)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is private and tightly linked (...)

Fair call, done ✅

Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, it looks great!!

@@ -8987,7 +8987,7 @@
D29A9F8929DD85BB005C54A4 /* VitalRefreshRateReader.swift in Sources */,
D29A9F6929DD85BB005C54A4 /* UIKitRUMUserActionsHandler.swift in Sources */,
D29A9F5229DD85BB005C54A4 /* RUMUUIDGenerator.swift in Sources */,
61DCC84E2C071DCD00CB59E5 /* TelemetryInterecptor.swift in Sources */,
61DCC84E2C071DCD00CB59E5 /* TelemetryInterceptor.swift in Sources */,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎖️

@ncreated ncreated force-pushed the ncreated/RUM-1660/send-rum-session-ended-telemetry branch from fa4f3f4 to 7fae496 Compare June 12, 2024 08:52
@ncreated ncreated merged commit 15f7989 into develop Jun 12, 2024
8 checks passed
@ncreated ncreated deleted the ncreated/RUM-1660/send-rum-session-ended-telemetry branch June 12, 2024 09:34
@maciejburda maciejburda mentioned this pull request Jun 12, 2024
8 tasks
@maciejburda maciejburda mentioned this pull request Jun 13, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants