RUM-2925 feat: Add backtrace generation capability to DatadogCoreProtocol
#1687
Datadog Report
Branch report: ✅ 0 Failed, 2758 Passed, 0 Skipped, 10m 39.79s Wall Time
let featureDirectories = try directory.getFeatureDirectories(forFeatureNamed: T.name)
if let feature = feature as? DatadogRemoteFeature {
    let featureDirectories = try directory.getFeatureDirectories(forFeatureNamed: T.name)
This was a miss - we need to create storage directories only for `DatadogRemoteFeature`. Otherwise we will be creating them for `NetworkInstrumentationFeature` and `BacktraceReportingFeature` unnecessarily. Added extra unit tests for this change.
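For readers following along, the rule in this comment can be pictured with a minimal sketch. Every name here (`FeatureSketch`, `RemoteFeatureSketch`, `RegistrySketch`) is a hypothetical stand-in rather than the SDK's actual types, and directory creation is reduced to appending a name to an array:

```swift
// Hypothetical stand-ins for the SDK's feature protocols (names are assumptions).
protocol FeatureSketch { static var name: String { get } }
protocol RemoteFeatureSketch: FeatureSketch {}

// A feature that uploads data, so it needs storage.
struct RUMLike: RemoteFeatureSketch { static let name = "rum" }
// A feature that only exposes a capability, so it needs no storage.
struct BacktraceLike: FeatureSketch { static let name = "backtrace-reporting" }

final class RegistrySketch {
    private(set) var createdDirectories: [String] = []
    private(set) var registered: [String] = []

    func register<T: FeatureSketch>(feature: T) {
        // Only remote features get storage directories; others are just registered.
        if feature is RemoteFeatureSketch {
            createdDirectories.append(T.name)
        }
        registered.append(T.name)
    }
}

let registry = RegistrySketch()
registry.register(feature: RUMLike())
registry.register(feature: BacktraceLike())
```

Both features end up registered, but only the remote one triggers the storage step.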
if let plcr = PLCrashReporterPlugin.thirdPartyCrashReporter {
    try core.register(backtraceReporter: BacktraceReporter(reporter: plcr))
}
This might be more elegant, but would require revamping the startup sequence for `DatadogCrashReporting`. It still depends on `CrashReportingPlugin`, which was a V1 dependency inverter that is no longer required in V2. This refactor is medium-sized, hence I didn't want to do it in this PR.
LGTM, just a small question 👌
import Foundation

/// Crash Report format supported by Datadog SDK.
@objc
why do we need this to be inter-operable with objc?
Fair call, I traced this back to commit 2f892c2, where we were adding Obj-c compatibility for Crash Reporting. This was back in V1. With V2 aligning module dependencies much better, this can now be internal to the Swift domain, so it is not required. Changed here and consequently in a few other places. ✅
Looks great!! I really appreciate the extension pattern applied to the core protocol 👌
DatadogInternal/Sources/FeatureAPIs/CrashReporting/BacktraceReport.swift
as in V2 it no longer requires visibility in Obj-c API
LGTM
/// Tests integration of `DatadogCore` and `DatadogCrashReporting` for backtrace generation.
class GeneratingBacktraceTests: XCTestCase {
    private var core: DatadogCoreProxy! // swiftlint:disable:this implicitly_unwrapped_optional
probably worth having this lint rule global to all the tests
// This is quite opportunistic - we map PLCR's live report through existing `DDCrashReport` builder to
// then extract essential elements for assembling `BacktraceReport`. It works for now, but be careful
// with how this evolves. We may need a dedicated `BacktraceReport` builder that only shares some code
// with `DDCrashReport` builder.
I guess it's mostly affected by PLCR updates?
Do we expect unit tests to fail when something like this happens, or should we have some extra measures?
This is PLCR-safe 👍. Basically, both for Crash Reports and Backtrace Reports, the value returned by PLCR is `PLCrashReport`. We then map it through our `DDCrashReportBuilder`, whereas implementing a distinct `BacktraceReportBuilder` would be more correct and scalable. We don't need it today, hence I'm leaving this decision context in a comment.
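The mapping discussed in this thread (build the full crash-report model first, then keep only what a backtrace needs) can be sketched roughly as below. This is an illustration under assumptions, not the SDK's code: `CrashReportSketch`, `BacktraceReportSketch`, their fields and `makeBacktrace(from:threadName:)` are hypothetical, simplified stand-ins for `DDCrashReport`, `BacktraceReport` and the builder logic.

```swift
// Hypothetical, simplified models; field names are assumptions.
struct BinaryImageSketch: Equatable {
    let libraryName: String
    let uuid: String
}

struct ThreadSketch: Equatable {
    let name: String
    let stack: String
}

/// Stand-in for the crash-report model produced by the existing builder.
struct CrashReportSketch {
    let type: String      // crash-specific, not needed for a backtrace
    let message: String   // crash-specific, not needed for a backtrace
    let threads: [ThreadSketch]
    let binaryImages: [BinaryImageSketch]
}

/// Stand-in for the backtrace model: only what later symbolication needs.
struct BacktraceReportSketch: Equatable {
    let stack: String
    let threads: [ThreadSketch]
    let binaryImages: [BinaryImageSketch]
}

/// Extracts the essential elements from an already-built crash report.
/// Returns nil when the requested thread is not present in the snapshot.
func makeBacktrace(from report: CrashReportSketch, threadName: String) -> BacktraceReportSketch? {
    guard let thread = report.threads.first(where: { $0.name == threadName }) else {
        return nil
    }
    return BacktraceReportSketch(
        stack: thread.stack,
        threads: report.threads,
        binaryImages: report.binaryImages
    )
}

// Usage with made-up data standing in for a live report.
let liveReport = CrashReportSketch(
    type: "live",
    message: "",
    threads: [ThreadSketch(name: "main", stack: "0 App 0x1 main + 0")],
    binaryImages: [BinaryImageSketch(libraryName: "App", uuid: "0000")]
)
let mainBacktrace = makeBacktrace(from: liveReport, threadName: "main")
let missingBacktrace = makeBacktrace(from: liveReport, threadName: "worker")
```

A dedicated backtrace builder, as mentioned in the comment, would skip the intermediate crash-report model and share only the thread and binary-image mapping.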
What and why?
📦 This PR adds a new capability to the SDK core: generating backtraces. Before, stack traces were only captured for crashes; now we enable the SDK to dump all threads and binary images at any moment at runtime.
This will be used in App Hangs monitoring to associate the main thread's stack trace with a pending hang. The actual work will be done in the next PR; here we only focus on adding the "backtrace reporting" capability to the core.
How?
The entry point to this functionality is the `generateLiveReport` API in PLCR. It dumps a snapshot of all running threads in the process. Because it uses the exact format of `PLCrashReport`, we're able to map it into our `DDCrashReport` representation (that includes threads and binary images).
Because the use case here is different from crash reporting, this PR adds a `BacktraceReport` type that holds essential information for later symbolication of stack traces in the Datadog app. Internal types are shared with `DDCrashReport` for code reuse.
🏗️ Architecture change
Architecture-wise, the challenge in this PR was to expose backtrace generation (living in PLCR and abstracted by our Crash Reporting) to other feature modules (RUM in particular, but Logs and Trace can follow).
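As a rough picture of the indirection involved, here is a minimal sketch in which one module registers a backtrace-capable feature in the core and another module asks the core for a backtrace. All names (`CoreSketch`, `BacktraceFeatureSketch`, `BacktraceProviding`, `InMemoryCore`) are hypothetical and much simpler than `DatadogCoreProtocol`; only the delegation pattern is the point.

```swift
/// Hypothetical, simplified model; the real report carries threads and binary images.
struct BacktraceSketch {
    let stack: String
}

/// What a crash-reporting module would implement (name is an assumption).
protocol BacktraceProviding {
    func generateBacktrace() -> BacktraceSketch?
}

/// Minimal stand-in for a core that stores features by type.
protocol CoreSketch: AnyObject {
    func register<T>(feature: T)
    func get<T>(feature type: T.Type) -> T?
}

/// Feature wrapping the provider, registered only when crash reporting is enabled.
struct BacktraceFeatureSketch {
    let provider: BacktraceProviding
}

extension CoreSketch {
    /// Convenience for other modules; returns nil when no backtrace feature is registered.
    func generateBacktrace() -> BacktraceSketch? {
        get(feature: BacktraceFeatureSketch.self)?.provider.generateBacktrace()
    }
}

final class InMemoryCore: CoreSketch {
    private var features: [ObjectIdentifier: Any] = [:]

    func register<T>(feature: T) {
        features[ObjectIdentifier(T.self)] = feature
    }

    func get<T>(feature type: T.Type) -> T? {
        features[ObjectIdentifier(T.self)] as? T
    }
}

/// Fake provider standing in for the PLCR-backed implementation.
struct FakeProvider: BacktraceProviding {
    func generateBacktrace() -> BacktraceSketch? {
        BacktraceSketch(stack: "0: main")
    }
}

let sketchCore = InMemoryCore()
let beforeRegistration = sketchCore.generateBacktrace()
sketchCore.register(feature: BacktraceFeatureSketch(provider: FakeProvider()))
let afterRegistration = sketchCore.generateBacktrace()
```

The caller depends only on the core and the shared model, so it needs no direct dependency on the module that produces the backtrace.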
To solve this, when enabled, `DatadogCrashReporting` registers `BacktraceRecordingFeature` to the core. This is similar to how `NetworkInstrumentationFeature` operates. Later on, when requesting a backtrace, RUM will call the core and the call will be delegated to Crash Reporting.
Following on the action points from our last iOS Dev Chat, I moved `DatadogCrashReporting` APIs to `DatadogInternal` in this PR. This way both RUM and Crash Reporting can use the same model definition without requiring type erasure through the "baggage" concept.
This also moves `DDCrashReport`, which unlocks the performance optimisation that we have planned for `RUM-2971` as a follow-up action from the (internal) SDK latency investigation. By sharing the model definition between features, there is no longer a need for coding it through message-bus "baggage".
Review checklist
Custom CI job configuration (optional)
tools/