-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUM-1660 chore: Send "RUM Session Ended" telemetry #1866
RUM-1660 chore: Send "RUM Session Ended" telemetry #1866
Conversation
bce9791
to
3244358
Compare
Datadog ReportBranch report: ✅ 0 Failed, 2998 Passed, 0 Skipped, 2m 1.36s Total Time 🔻 Code Coverage Decreases vs Default Branch (7)
|
Datadog/IntegrationUnitTests/RUM/SDKMetrics/RUMSessionEndedMetricIntegrationTests.swift
Outdated
Show resolved
Hide resolved
Datadog/IntegrationUnitTests/RUM/SDKMetrics/RUMSessionEndedMetricIntegrationTests.swift
Outdated
Show resolved
Hide resolved
case iOSAppExtension | ||
|
||
public init(bundle: Bundle) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
/// Tracks the view event that occurred during the session. | ||
func track(view: RUMViewEvent) { | ||
guard view.session.id == sessionID else { | ||
return // sanity check, unexpected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth telemetry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed 👍. I added an error so we will see if views are tracked in wrong sessions (although unexpected, it is possible from upstream code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the test coverage!
I have left a comment re. usage of locks, LMKWYT.
import DatadogInternal | ||
|
||
/// Intercepts telemetry events sent through message bus. | ||
internal struct TelemetryInterecptor: FeatureMessageReceiver { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal struct TelemetryInterecptor: FeatureMessageReceiver { | |
internal struct TelemetryInterceptor: FeatureMessageReceiver { |
Given the simplicity of this receiver, it would make sense to define the sessionEndedMetric
in the TelemetryReceiver
where we already handle all telemetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it always starts simple at the beginning. The TelemetryReceiver
already covers multiple concerns, I didn't want to introduce yet another one. For that reason I've chosen composition and I'd prefer to stay with this unless this is a change request.
While it can be easily moved to TelemetryReceiver
, adding unit tests will be way harder as it operates on much higher abstractions than the TelemetryInterceptor
we introduce here. LTMKWDYT @maxep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't have strong opinion on this.
But the typos still persist ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the typos still persist ;)
Ops! Fixed ✅
/// Dictionary to keep track of pending metrics, keyed by session ID. | ||
@ReadWriteLock | ||
private var metricsBySessionID: [String: SessionEndedMetric] = [:] | ||
/// Array to keep track of pending metrics in their start order. | ||
@ReadWriteLock | ||
private var metrics: [SessionEndedMetric] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/discussion Mutating a metric property will acquire at least 2 locks, 1 in the controller and one in the metric. The perf penalty will be minimal because we don't expect many concurrent calls. But I want to highlight that the @ReadWriteLock
wrapper uses a fair pthread lock which is not cheap.
Moreover, the metric.track(*_:)
methods are acquiring the property lock multiple times, which could be prevented by calling the mutate
closure.
I think it could be redesigned to use a single lock in the controller and perform all mutations at the controller level, something like the following where SessionEndedMetric
could be a struct
:
internal final class SessionEndedMetricController {
// Starts a new metric for a given session.
func startMetric(sessionID: String, precondition: RUMSessionPrecondition?, context: DatadogContext)
// Tracks the view event that occurred during the session.
func track(view: RUMViewEvent, for sessionID: String)
// Tracks the kind of SDK error that occurred during the session.
func track(sdkErrorKind: String, for sessionID: String)
// Signals that the session was stopped with `stopSession()` API.
func trackWasStopped(for sessionID: String)
// Ends the metric for a given session, sending it to telemetry and removing it from pending metrics.
func endMetric(sessionID: String)
// Retrieves the last started metric.
var latestMetric: SessionEndedMetric? { get }
}
What do you think?
Note: the @ReadWriteLock
is convenient but shouldn't be our only lock. We could provide a lock object as well (similar to NSLock) so we can acquire a lock for operations and not only for property access. We could also provide unfair lock such as os_unfair_lock
which are cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I had this concern on double locking, but ignored it with assuming minimal impact and trying to keep the solution simple. With this feedback I'm happy to improve this 👌.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Addressed 👍. I made the SessionEndedMetric
a struct
with mutable methods. The controller is now a class which mutates the right SEM
using single RW lock 👍.
} | ||
|
||
private func updateMetric(for sessionID: RUMUUID?, _ mutation: (inout SessionEndedMetric?) throws -> Void) { | ||
guard let sessionID = (sessionID ?? pendingSessionIDs.last) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question about pendingSessionIDs
, it seems that these array is only queried for the last element, so why not just keep the last session id? what's the purpose to create the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ambushwork The pendingSessionIDs
is more a history of pending sessions. It is true that we query only for the last element, but it might change from one query to another if endMetric(sessionID:)
is called in between (we remove the ID from array as part of that call). If using single property to store the "last ID", we won't know how to change it after endMetric(sessionID:)
as we wouldn't know the previous ID.
Also, even though sessions are expected to end in the same order as they were started, some iOS telemetry shows rare but existing consistency problems in session management. For that reason, we prefer to not introduce too many assumptions in "Session Ended" metric, which is another reason why we track the whole array of IDs rather than only the "latest" one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, quite clear for me now !
// proccss(command:context:writer) returned false, so the scope will be deallocated at the end of | ||
// this execution context. End the "RUM Session Ended" metric: | ||
defer { dependencies.sessionEndedMetric.endMetric(sessionID: scope.sessionUUID) } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the endMetric
can be triggered only when there is a command to process and it returns false, otherwise it can not be triggered it self even if the duration and timeout has passed the limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct 👍. Every state change in RUM is caused by command. Even in case of "15min timeout" or "4h max duration", the RUMApplicationScope
only recognizes these upon receiving next command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I left a minor suggestion on removing the extra lock on the pendingSessionIDs
property.
import DatadogInternal | ||
|
||
/// Intercepts telemetry events sent through message bus. | ||
internal struct TelemetryInterecptor: FeatureMessageReceiver { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't have strong opinion on this.
But the typos still persist ;)
@ReadWriteLock | ||
private var pendingSessionIDs: [RUMUUID] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/suggestion As this is private, and tightly linked to metricsBySessionID
, it can be mutated while acquiring the lock of metricsBySessionID
. Starting a metric would look like:
_metricsBySessionID.mutate { metrics in
metrics[sessionID] = SessionEndedMetric(sessionID: sessionID, precondition: precondition, context: context)
pendingSessionIDs.append(sessionID)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is private and tightly linked (...)
Fair call, done ✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done, it looks great!!
@@ -8987,7 +8987,7 @@ | |||
D29A9F8929DD85BB005C54A4 /* VitalRefreshRateReader.swift in Sources */, | |||
D29A9F6929DD85BB005C54A4 /* UIKitRUMUserActionsHandler.swift in Sources */, | |||
D29A9F5229DD85BB005C54A4 /* RUMUUIDGenerator.swift in Sources */, | |||
61DCC84E2C071DCD00CB59E5 /* TelemetryInterecptor.swift in Sources */, | |||
61DCC84E2C071DCD00CB59E5 /* TelemetryInterceptor.swift in Sources */, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎖️
so it can be available for each product.
so it can be easily inejcted to different parts of RUM.
fa4f3f4
to
7fae496
Compare
What and why?
📦 🔭 Following the spec (internal), this PR adds "RUM Session Ended" telemetry.
The "RUM Session Ended" telemetry marks the conclusion of a RUM session and provides meta parameters that describe the session's quality and diagnostic details.
How?
The goal of this implementation is to better understand the possible states of a RUM session by making minimal assumptions about how sessions are tracked in RUM scopes. For instance, it can track multiple sessions simultaneously, which, though unlikely, is observed in some telemetry data. The design aims to be loosely coupled with the RUM architecture, allowing flexible and convenient access to the metric object rather than restricting its lifecycle to RUM scopes.
The high-level design of this telemetry consists of two main components:
SessionEndedMetricController
: Manages the lifecycle of metric objects.SessionEndedMetric
: Tracks the state of a RUM session and exports metric attributes.Both are thoroughly covered in unit tests. Integration tests ensure the proper integration of the metric with RUM.
The interface of
SessionEndedMetricController
:The interface of
SessionEndedMetric
:Review checklist
Custom CI job configuration (optional)
tools/