RUM-3588 fix: Always update crash context with RUM attributes #1834
Datadog Report
Branch report: ✅ 0 Failed, 3188 Passed, 0 Skipped, 2m 18.87s Total Time 🔻 Code Coverage Decreases vs Default Branch (9)
5a893cd to 13d7263
💡 I revamped the way we test CrashContextCoreProvider to pay down tech debt taken on during the V2 migration. The debt was in testing the provider by sending messages to a mocked core.
Full rationale (from my Slack thread):
The way we manage CrashContext updates is super fragile 🤹. We replace the whole context on every DatadogContext update. SDK context updates are very frequent, so this happens often. To get it right, a contributor needs a deep understanding of the following:
- The custom Equatable implementation for CrashContext must be updated whenever a new value is added.
- The convenience CrashContext.init() must be updated to carry over the previous value, otherwise the previous state is erased by the DatadogContext overwrite.
- The initial CrashContext is nil until the first DatadogContext arrives. This was not visible in previous tests because they installed the provider inside a core mock, which accidentally guarantees that the first DatadogContext is available.
With the testing convention I introduced in this file, all of the above criteria are now covered.
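To make the three points above concrete, here is a minimal sketch. SDKContext, CrashContext and CrashContextProvider below are simplified, hypothetical stand-ins written for illustration; the SDK's real types carry many more fields and are not reproduced here.

```swift
// Hypothetical, simplified stand-ins; not the SDK's actual types.
struct SDKContext: Equatable {
    var userID: String?
}

struct CrashContext: Equatable {
    var userID: String?
    var lastRUMAttributes: [String: String]?

    // Convenience init used on every SDK context update.
    // Values that do not come from `sdkContext` must be carried over from `previous`,
    // otherwise each SDK context update erases them.
    init(sdkContext: SDKContext, previous: CrashContext?) {
        self.userID = sdkContext.userID
        self.lastRUMAttributes = previous?.lastRUMAttributes
    }

    // A hand-written `==` must compare every field, including newly added ones.
    static func == (lhs: CrashContext, rhs: CrashContext) -> Bool {
        return lhs.userID == rhs.userID
            && lhs.lastRUMAttributes == rhs.lastRUMAttributes
    }
}

final class CrashContextProvider {
    // There is no crash context until the first SDK context arrives.
    private(set) var current: CrashContext?

    func update(sdkContext: SDKContext) {
        current = CrashContext(sdkContext: sdkContext, previous: current)
    }

    func update(rumAttributes: [String: String]) {
        // In this sketch the update is a no-op while `current` is still nil.
        current?.lastRUMAttributes = rumAttributes
    }
}
```

A test written against the provider directly, rather than through a mocked core, can exercise each case: an attribute update before the first SDK context, and an SDK context update arriving after attributes were set.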
    )
}

func testGivenContextWithLastLogttributesSet_whenItGetsEncoded_thenTheValueIsPreservedAfterDecoding() throws {
💡 Adding coverage for encoding and decoding global Log attributes, as it was missing.
let baggageSent = try XCTUnwrap(featureScope.messagesSent().lastBaggage(withKey: RUMBaggageKeys.sessionState))
let sessionStateSent: RUMSessionState = try baggageSent.decode()
XCTAssertEqual(sessionStateSent, expectedSessionState, "It must send 'session state' message")
let actualSessionState = try XCTUnwrap(fatalErrorContext.sessionState)
XCTAssertEqual(actualSessionState, expectedSessionState)
💡 This change in RUMSessionScopeTests asserts the update on FatalErrorContext instead of inspecting the baggage sent over the message bus. The baggage is now tested in FatalErrorContextTests, which streamlines things and better separates concerns.
func testWhenScopeEnded_itDoesNotSendViewResetMessage() {
    let featureScope = FeatureScopeMock()

    // Given
    let scope: RUMSessionScope = .mockWith(
        parent: parent,
        startTime: Date(),
        dependencies: .mockWith(featureScope: featureScope)
    )

    let startViewCommand = RUMStartViewCommand.mockWith(time: Date(), identity: .mockViewIdentifier())
    _ = scope.process(command: startViewCommand, context: context, writer: writer)
    let startResourceCommand = RUMStartResourceCommand.mockWith(time: Date())
    _ = scope.process(command: startResourceCommand, context: context, writer: writer)

    // When
    _ = scope.process(command: RUMStopSessionCommand.mockWith(time: Date()), context: context, writer: writer)
    let stopResourceCommand = RUMStopResourceCommand.mockWith(resourceKey: startResourceCommand.resourceKey, time: Date())
    _ = scope.process(command: stopResourceCommand, context: context, writer: writer)

    // Then
    let viewResetMessages = featureScope.messagesSent().filter { $0.asBaggage?.key == RUMBaggageKeys.viewReset }
    XCTAssertEqual(viewResetMessages.count, 1, "It must send only one 'view reset' message")
}
💡 I removed this test as it covers non-existent behaviour. Once RUMStopSessionCommand is received, the RUM monitor deallocates its scope, so the scope can no longer receive further commands (RUMStopResourceCommand in this case).
let baggageSent = try XCTUnwrap(featureScope.messagesSent().firstBaggage(withKey: RUMBaggageKeys.viewEvent))
let rumViewSent: RUMViewEvent = try baggageSent.decode()
DDAssertReflectionEqual(rumViewSent, rumViewWritten, "It must sent written event over message bus")
let rumViewInFatalErrorContext = try XCTUnwrap(fatalErrorContext.view)
DDAssertReflectionEqual(rumViewWritten, rumViewInFatalErrorContext, "It must update fatal error context with the view event written")
💡 This change in RUMViewScopeTests asserts the update on FatalErrorContext instead of inspecting the baggage sent over the message bus. The baggage is now tested in FatalErrorContextTests, which streamlines things and better separates concerns.
Looks good! Great coverage!
Some minor comments
private func updateRUMAttributes(with baggage: FeatureBaggage, to core: DatadogCoreProtocol) {
    queue.async { [weak core] in
        do {
            self.rumAttributes = try baggage.decode(type: GlobalRUMAttributes.self)
shouldn't we weakify self as well? 🤔
👍 Good point, ✅ updated in the whole file
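For reference, a minimal sketch of the weak capture discussed here. AttributesReceiver is a hypothetical stand-in, not the provider class from this PR, and its queue label and method names are assumptions made for illustration.

```swift
import Foundation

// Hypothetical stand-in for a provider that updates its state on a private queue.
final class AttributesReceiver {
    private let queue = DispatchQueue(label: "receiver.queue") // serial by default
    private var attributes: [String: String] = [:]

    func update(with newAttributes: [String: String]) {
        // Capture `self` weakly so a pending block does not extend the receiver's lifetime.
        queue.async { [weak self] in
            guard let self = self else {
                return // the receiver was deallocated before the block ran
            }
            self.attributes = newAttributes
        }
    }

    func currentAttributes() -> [String: String] {
        // On a serial queue, this runs after all previously enqueued updates.
        return queue.sync { attributes }
    }
}
```

With a strong capture the block would keep the receiver alive until it ran; with the weak capture the update is simply skipped if the receiver is already gone.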
import Foundation

public struct GlobalRUMAttributes: Codable, PassthroughAnyCodable {
Would it make sense to write an extension for [AttributeKey: AttributeValue] instead?
Extending a standard type would bring much less clarity to the whole concept. GlobalRUMAttributes is a shared type (it is defined in DatadogInternal) that stands for the contract between RUM and Crash Reporting. Whenever we come across this type in code, we know it is meant for message bus communication. The same could not be achieved with a ubiquitous standard type.
Also, the type defines two aspects that could not be supported by a dictionary:
- Codable, so we can encode and decode it ([String: Encodable] can't be Codable);
- PassthroughAnyCodable, to avoid marshalling the type when passing it over the message bus ([String: Encodable] can't be managed this way as we use it transitively in multiple other types exchanged over the message bus, e.g. for encoding user attributes in the RUM view event).
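A rough sketch of the Codable point, under stated assumptions: AttributeValue and SharedRUMAttributes are hypothetical names, attribute values are limited to strings and numbers, and PassthroughAnyCodable is left out entirely. The real GlobalRUMAttributes in DatadogInternal may look quite different.

```swift
import Foundation

// Hypothetical simplification: attribute values are limited to strings and numbers.
enum AttributeValue: Codable, Equatable {
    case string(String)
    case number(Double)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let number = try? container.decode(Double.self) {
            self = .number(number)
        } else {
            self = .string(try container.decode(String.self))
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .string(let value): try container.encode(value)
        case .number(let value): try container.encode(value)
        }
    }
}

// A dedicated type names the contract between the two features and can be Codable.
// A bare `[String: Encodable]` cannot, because the `Encodable` existential
// does not itself conform to `Encodable` or `Decodable`.
struct SharedRUMAttributes: Codable, Equatable {
    let attributes: [String: AttributeValue]
}

// Encodes and decodes the value, as both ends of the message bus would.
func roundTrip(_ value: SharedRUMAttributes) throws -> SharedRUMAttributes {
    let data = try JSONEncoder().encode(value)
    return try JSONDecoder().decode(SharedRUMAttributes.self, from: data)
}
```

The dedicated type also gives a single place to search for when tracing what crosses the boundary between the two features.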
but do not send view update event (to avoid performance penalties such as adding backpressure to writer queue, increasing RUM directory size with more event writes and effectively extending the backlog of uploadable files for RUM)
An alternative approach was to send attributes as part of the last RUMView; however, causing view updates on every attribute change produces significant backpressure on the RUM queue.
5b17904 to 99b72b8
It looks great! Thanks for tackling the debt, it is way better now.
I have left a small suggestion on naming but it can be merged as is.
/// It tracks value changes and notifies updates on message bus.
internal final class FatalErrorContextNotifier {
internal final class FatalErrorContextNotifier: FatalErrorContextNotifying {
/suggestion I think the names should reflect responsibility (interface) vs. behavior (implementation). Here it would be sending the context vs. sending on the bus.
internal final class FatalErrorContextBus: FatalErrorContextSender
That's true for the strategy pattern, but it is not the case here. The protocol exists purely for dependency injection and testability. It is not meant to introduce multiple strategies for sending FatalErrorContext.
If at any point we want to notify FEC through different implementations, then I'll be 100% on the side of this suggestion. For now I'll keep it as it is 🙂.
What and why?
📦 With this PR, the RUM error and RUM view sent after a crash will always include up-to-date attributes set on RUMMonitor.
Because we don't update the crash context with the "last RUM view" info on every change to global attributes, the global RUM attributes held in the "last view" were not always correct. A global attribute change that happened during the last active view was never included in the crash error.
How?
The RUM.FatalErrorContextNotifier is updated with a globalAttributes property. Upon change, it sends a baggage over the message bus holding DatadogInternal.GlobalRUMAttributes (new type). It is received in DatadogCrashReporting and gets encoded into CrashContext.
Upon restarting the app after a crash, the value is decoded to RUM.CrashContext on a regular basis.
🎁 On the testing front, I separated tests for updating the FatalErrorContextNotifier (1 and 2) from sending encoded baggage over the message bus (3 and 4).
♻️ Considered Alternative Approach
In my initial attempt, I tried a simpler approach of updating crashContext.lastRUMView with the latest RUM attributes. That only required producing view updates on every global attribute change and updating the view event in FatalErrorContext inside RUMViewScope. This, however, added a linear backpressure to the RUM queue (red line), growing with the number of calls to monitor.addAttribute(forKey:). With the lack of an "add multiple attributes" API, this perf hit is not acceptable.
The solution implemented in this PR adds much less to the RUM backpressure (yellow line). The impact is related to message bus communication, which is done on a shared thread.
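The flow described under "How?" can be sketched roughly as follows. Everything here is a hypothetical stand-in: MessageBus, ContextNotifier, CrashContextStore, the GlobalAttributes payload and the "global-attributes" key are illustrative names, and the bus is synchronous for simplicity, whereas the description above says the SDK's message bus runs on a shared thread.

```swift
import Foundation

// Hypothetical payload exchanged between the two features.
struct GlobalAttributes: Codable, Equatable {
    let attributes: [String: String]
}

// Hypothetical, synchronous message bus keyed by a string.
final class MessageBus {
    private var receivers: [(String, Data) -> Void] = []

    func subscribe(_ receiver: @escaping (String, Data) -> Void) {
        receivers.append(receiver)
    }

    func send(key: String, payload: Data) {
        receivers.forEach { $0(key, payload) }
    }
}

// RUM side: tracks value changes and notifies updates on the bus.
final class ContextNotifier {
    static let attributesKey = "global-attributes" // hypothetical key

    private let bus: MessageBus

    init(bus: MessageBus) {
        self.bus = bus
    }

    var globalAttributes: [String: String] = [:] {
        didSet {
            // Notify only on actual changes; no view update event is produced.
            guard globalAttributes != oldValue else { return }
            let message = GlobalAttributes(attributes: globalAttributes)
            guard let payload = try? JSONEncoder().encode(message) else { return }
            bus.send(key: ContextNotifier.attributesKey, payload: payload)
        }
    }
}

// Crash reporting side: keeps the last received attributes for the crash context.
final class CrashContextStore {
    private(set) var lastAttributes: GlobalAttributes?
    private(set) var updatesReceived = 0

    init(bus: MessageBus) {
        bus.subscribe { [weak self] key, payload in
            guard key == ContextNotifier.attributesKey else { return }
            self?.lastAttributes = try? JSONDecoder().decode(GlobalAttributes.self, from: payload)
            self?.updatesReceived += 1
        }
    }
}
```

Because the notifier and the receiver meet only at the bus, setting the property and decoding the payload can be tested separately, which mirrors the split between tests described above.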