Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-2355 fix: Network Instrumentation #1588

Merged
merged 17 commits into from
Jan 8, 2024

Conversation

maxep
Copy link
Member

@maxep maxep commented Dec 13, 2023

Problem Statement

Task Interception

After first configuration, our new Network Instrumentation logic introduced in #1394 is intercepting all URLSessionTask regardless of their delegate.

Starting an interception can result in Start Resource in RUM. When the custom delegate is not in play, the interception will never finish and so the RUM Resource will never stop. This means that the resource event won't be transmitted, but it will keep the Resource scopes active and therefore the view scopes open.

Task Completion

Tasks created with completion handler on pre-iOS 15 are never completed: We apply #available(iOS 15, tvOS 15, *) condition for completing the interception when receiving metrics, but we have no condition for pre-iOS 15.

Swizzling Unbinding

Swizzlings on URLSession* are not removed when deallocating the NetworkInstrumentationFeature, this is preventing from re-enabling the instrumentation on a new core instance. It is blocking the 'stop core instance' feature on iOS.

Legacy DatadogURLSessionDelegate

Changes in the legacy delegate are preventing from initialising multiple instances with different first-party hosts: The DatadogURLSessionDelegate.init is doing a static URLSessionInstrumentation.enable which can be done only once for a given delegate type.

Solutions

Instance-based Swizzler

In #1564, we have introduced reliable unswizzling logic which allows to create multiple instances of swizzler for the same method. I'm leveraging this to associate swizzlers to an instance of NetworkInstrumentationFeature, we can then unswizzle safely at deallocation.

The instance-based swizzling also allow to fix the legacy DatadogURLSessionDelegate by keeping a swizzler per delegate instance.

Check for Task's delegate type

We now check for the delegate type before starting an interception,either when using the new URLSessionInstrumentation or the legacy DatadogURLSessionDelegate.

Task Completion

The URLSessionSwizzler now allow to swizzle the URLSession.dataTask(with:completionHandler:) methods on both URL and URLRequest and swizzling is applied for all OS versions.

Review

I suggest to review the following files first:

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests for Core, RUM, Trace, Logs, CR and WVT
  • Run unit tests for Session Replay
  • Run integration tests
  • Run smoke tests
  • Run tests for tools/

@maxep maxep self-assigned this Dec 13, 2023
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Dec 13, 2023

Datadog Report

Branch report: maxep/RUM-2355/fix-network-instrumentation
Commit report: 8174c17

❄️ dd-sdk-ios: 0 Failed, 1 New Flaky, 2744 Passed, 0 Skipped, 31m 37.13s Wall Time

New Flaky Tests (1)

  • testGivenURLSessionWithCustomDelegate_whenUsingAsyncData_itPassesAllValuesToTheInterceptor - NetworkInstrumentationFeatureTests - Last Failure

    Expand for error
     
     Assertion Failure: Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Notify intercepion did complete".
     Assertion Failure at NetworkInstrumentationFeatureTests.swift:358: XCTAssertEqual failed: ("nil") is not equal to ("Optional("some error")")
    

@maxep maxep force-pushed the maxep/RUM-2355/fix-network-instrumentation branch from 8d1c890 to 894bf48 Compare December 14, 2023 09:31
@maxep maxep changed the title RUM-2355 Fix Network Instrumentation RUM-2355 fix: Network Instrumentation Dec 14, 2023
@maxep maxep force-pushed the maxep/RUM-2355/fix-network-instrumentation branch from f99a874 to 86001c6 Compare December 14, 2023 11:08
@maxep maxep marked this pull request as ready for review December 14, 2023 11:10
@maxep maxep requested review from a team as code owners December 14, 2023 11:10
Comment on lines -42 to +52
set {
pthread_rwlock_wrlock(&rwlock)
value = newValue
pthread_rwlock_unlock(&rwlock)
}
set { mutate { $0 = newValue } }
}

/// Provides a non-escaping closure for mutation.
/// The lock will be acquired once for writing before invoking the closure.
///
/// - Parameter closure: The closure with the mutable value.
public func mutate(_ closure: (inout Value) -> Void) {
public func mutate(_ closure: (inout Value) throws -> Void) rethrows {
pthread_rwlock_wrlock(&rwlock)
closure(&value)
pthread_rwlock_unlock(&rwlock)
defer { pthread_rwlock_unlock(&rwlock) }
try closure(&value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related, but better :)

Comment on lines +17 to +23
/// Datadog delegate object.
///
/// The class implementing `DDURLSessionDelegateProviding` must ensure that following method calls are forwarded to `ddURLSessionDelegate`:
/// - `func urlSession(_:task:didFinishCollecting:)`
/// - `func urlSession(_:task:didCompleteWithError:)`
/// - `func urlSession(_:dataTask:didReceive:)`
var ddURLSessionDelegate: DatadogURLSessionDelegate { get }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to re-introduce this constrain to restore legacy behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which particular behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one:

Legacy DatadogURLSessionDelegate

Changes in the legacy delegate are preventing from initialising multiple instances with different first-party hosts:
The DatadogURLSessionDelegate.init is doing a static URLSessionInstrumentation.enable which can be done only once for a given delegate type.

The DatadogURLSessionDelegate class cannot be instrumented like a custom delegate type since it defines first-party hosts per instance. It will now define it's own NetworkInstrumentationSwizzler instance to intercept task events, therefor the deprecated __URLSessionDelegateProviding still needs to forward data to the ddURLSessionDelegate.

@@ -28,7 +35,7 @@ open class DatadogURLSessionDelegate: NSObject, URLSessionDataDelegate {
return URLSessionInterceptor.shared(in: core)
}

/* private */ public let firstPartyHosts: FirstPartyHosts
let swizzler = URLSessionSwizzler()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instances of DatadogURLSessionDelegate hold their own swizzling instance to instrument the task on resume.

@maxep maxep marked this pull request as draft December 14, 2023 17:04
@maxep maxep force-pushed the maxep/RUM-2355/fix-network-instrumentation branch from e9228cc to b2bd78a Compare December 18, 2023 09:48
@maxep maxep marked this pull request as ready for review December 18, 2023 09:51
@@ -38,6 +38,9 @@ internal final class NetworkInstrumentationFeature: DatadogFeature {
@ReadWriteLock
internal var handlers: [DatadogURLSessionHandler] = []

@ReadWriteLock
internal var swizzlers: [ObjectIdentifier: URLSessionSwizzler] = [:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, never knew such type existed.

}
}
/// Swizzles `URLSession*` methods.
internal final class URLSessionSwizzler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/Blocking

Combining all the swizzling code inside one type was never the intent but I understand the reasoning behind. However, this can be done with the existing approach by making the each swizzler type as intance type.

This makes the testing each swizzling class easy, better code navigation. You can compose those types in a NetworkInstrumentationSwizzler type.

something like

class NetworkInstrumentationSwizzler {
    let urlSessionSwizzler: URLSessionSwizzler
    let urlSessionTaskDelegateSwizzler: URLSessionTaskDelegateSwizzler
    ....
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested, I have split the swizzler class, see the last commit. But TBH, I don't see the benefit 🤔 It just adds another layer and doesn't help much with code navigation.

@maxep maxep requested a review from ganeshnj December 20, 2023 18:25
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Massive efforts 👌 - I noticed only one blocking change, other look ok. It is really hard to see it well through (the domain is complex, code looks fine), hence I count on the quality of tests. Glad we add more integration scenarios.

Last question, about:

Task Interception
After first configuration, our new Network Instrumentation logic introduced in #1394 is intercepting all URLSessionTask regardless of their delegate.

Is there a test we add to verify that not instrumented requests are ignored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocking/ The new logic introduces new global swizzlings that need to be covered in integrity checks. Here we remove old checks instead of updating them ⛔. Balancing following calls must be enforced by DatadogTestsObserver, otherwise we will run into flakiness ❄️:

urlSessionSwizzler.swizzle()
urlSessionTaskSwizzler.swizzle()
urlSessionTaskDelegateSwizzler.swizzle()

urlSessionSwizzler.unswizzle()
urlSessionTaskSwizzler.unswizzle()
urlSessionTaskDelegateSwizzler.unswizzle()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, it was actually covered twice. Swizzling integrity is already checked by this which will fail if any swizzling is left, printing the method signatures 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, can you add some simple test case?

  • swizzling once, Swizzling.methods.count == 1
  • same method again, Swizzling.methods.count == 2
  • another method, Swizzling.methods.count == 3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this to MethodSwizzlerTests 👍

@@ -38,6 +38,9 @@ internal final class NetworkInstrumentationFeature: DatadogFeature {
@ReadWriteLock
internal var handlers: [DatadogURLSessionHandler] = []

@ReadWriteLock
internal var swizzlers: [ObjectIdentifier: NetworkInstrumentationSwizzler] = [:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion/ We should either mark it private or make useful in troubleshooting. The problem with its usability is that ObjectIdentifier completely erases the identity of underlying (identified) meta type. We can preserve it with properly abstracting the ObjectIdentifier:

internal struct SwizzledDelegateClass: Hashable, Equatable {
    internal let delegateClass: URLSessionDataDelegate.Type
    private let identifier: ObjectIdentifier

    init(delegateClass: URLSessionDataDelegate.Type) {
        self.delegateClass = delegateClass
        self.identifier = ObjectIdentifier(delegateClass)
    }

    func hash(into hasher: inout Hasher) { hasher.combine(identifier) }
    static func == (lhs: SwizzledDelegateClass, rhs: SwizzledDelegateClass) -> Bool { lhs.identifier == rhs.identifier }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm making this private and adding debugDescription to MethodSwizzler that should gives us enough information to troubleshoot.

Comment on lines 81 to 84
if let currentRequest = task.currentRequest {
let request = self.intercept(request: currentRequest, additionalFirstPartyHosts: configuredFirstPartyHosts)
task.dd.override(currentRequest: request)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/ This condition is quite puzzling. In what circumstances we expect task to not have currentRequest? Let's explain it in the code comment if appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question, I don't know what could nillify the originalRequest or currentRequest 🤔 the doc doesn't say..

IntegrationTests/Runner/Environment.swift Outdated Show resolved Hide resolved
Comment on lines 58 to 63
/// Use a custom delegate.
/// and define `firstPartyHosts` in feature configuration.
case customWithFeatureFirstPartyHosts
/// Use a custom delegate.
/// and define `firstPartyHosts` with delegate's configuration.
case customWithAdditionalFirstyPartyHosts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏅

@maxep maxep force-pushed the maxep/RUM-2355/fix-network-instrumentation branch from d276853 to 3e92e18 Compare December 22, 2023 11:49
@maxep maxep force-pushed the maxep/RUM-2355/fix-network-instrumentation branch from 3e92e18 to 4bfa7c4 Compare January 4, 2024 09:21
@maxep maxep requested a review from ncreated January 4, 2024 09:22
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jan 4, 2024

Datadog Report

Branch report: maxep/RUM-2355/fix-network-instrumentation
Commit report: 9e46d77

dd-sdk-ios: 0 Failed, 0 New Flaky, 2774 Passed, 0 Skipped, 28m 17.47s Wall Time

@maxep maxep force-pushed the maxep/RUM-2355/fix-network-instrumentation branch from 4bfa7c4 to 738d599 Compare January 4, 2024 09:35
@maxep
Copy link
Member Author

maxep commented Jan 4, 2024

It is ready for another round of review :)

@ncreated, to your question:

Last question, about:

Task Interception
After first configuration, our new Network Instrumentation logic introduced in #1394 is intercepting all URLSessionTask regardless of their delegate.

Is there a test we add to verify that not instrumented requests are ignored?

Yes, the NetworkInstrumentationFeatureTests now tests not instrumented requests: here, here, and here. It covers async/await, session delegate, and task delegate.

@@ -132,34 +132,6 @@ internal class DatadogTestsObserver: NSObject, XCTestObservation {

If all above conditions are met, this failure might indicate a memory leak in the implementation.
"""
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing these checks seems like a back step - in case of a memory leak we will not able to catch any lingering swizzlings.

A check something like if there is an instance of swizzler in memory or not will be helpful.

something like

func checkInstancesOfType(_ type: AnyClass) -> Bool {
    let typeString = String(describing: type)
    
    for bundle in Bundle.allBundles {
        if let className = bundle.classNamed(typeString) {
            // Check if an instance of the class exists
            if class_getInstanceSize(className) > 0 {
                return true
            }
        }
    }
    
    return false
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already there, see my comment. These checks were redundant, we are already asserting that no swizzling is left here. Failure will print the method signature, so we will know which sizzling is at fault.

Comment on lines 77 to 79
guard let self = self, task.dd.delegate?.isKind(of: configuration.delegateClass) == true else {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the dd.delegate lifecycle is managed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a computed var and not retained. see its implementation here

Comment on lines +17 to +23
/// Datadog delegate object.
///
/// The class implementing `DDURLSessionDelegateProviding` must ensure that following method calls are forwarded to `ddURLSessionDelegate`:
/// - `func urlSession(_:task:didFinishCollecting:)`
/// - `func urlSession(_:task:didCompleteWithError:)`
/// - `func urlSession(_:dataTask:didReceive:)`
var ddURLSessionDelegate: DatadogURLSessionDelegate { get }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which particular behavior?

@maxep maxep requested a review from ganeshnj January 4, 2024 13:30
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍. Seeing a lot of changes that involve OS-version-specific swizzlings, can we verify if unit tests pass all major iOS versions from 12 to 16 🙏? Currently we can't rely on nightly tests for this.

@maxep
Copy link
Member Author

maxep commented Jan 5, 2024

I have ran the test-suites on the following versions:

OS Version Unit Tests Integration Tests
iOS 12.4 🚫 Unable to test due to SPM failure
iOS 13.7 🚫 units fails on master as well
iOS 14.4
iOS 15.5
iOS 16.0

It fails on iOS 13, from what I see it's due to the URLSessionTask.resume which doesnt seem to trigger the swizzled method. It fails the same on master so it is not due to the change in this PR nor the change in swizzling.
As this additional issue is not introduce by changes in this PR, we can tackle this seperatly.

@maxep maxep merged commit d5d9967 into develop Jan 8, 2024
11 checks passed
@maxep maxep deleted the maxep/RUM-2355/fix-network-instrumentation branch January 8, 2024 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants