-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-1490 Migrate to Intake API v2 #562
Conversation
private let ignoredNSURLErrorCodes = Set([ | ||
NSURLErrorNetworkConnectionLost, // -1005 | ||
NSURLErrorTimedOut, // -1001 | ||
NSURLErrorCannotParseResponse, // - 1017 | ||
NSURLErrorNotConnectedToInternet, // -1009 | ||
NSURLErrorCannotFindHost, // -1003 | ||
NSURLErrorSecureConnectionFailed, // -1200 | ||
NSURLErrorDataNotAllowed, // -1020 | ||
NSURLErrorCannotConnectToHost, // -1004 | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case .badRequest, .payloadTooLarge, .requestTimeout: | ||
// These codes mean that something wrong is happening either in the SDK or on the server - produce an error. | ||
return ( | ||
message: "Data upload finished with status code: \(statusCode)", | ||
error: nil, | ||
attributes: ["dd_request_id": requestID ?? "(???)"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I'm not sure if 408 Request Timeout
is worth sending error for Internal Monitoring. On one side this could indicate a slow response from Intake (monitored by other means), but on the other the SDK might be lagging on uploads. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally this case should be handled by URLSession
, right? if the connection stays idle, it produces connection timed out
error.
so i didn't fully understand what 408 means but i guess we can keep internal-monitor it anyway, if we can filter those errors out in Datadog UI.
maybe we can internal-monitor tooManyRequests
too. as this shouldn't normally happen as we batch the requests, it can indicate a configuration problem in the client's side. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can merge once this discussion is resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's distinction between client (URLSession
) timeout and server timeout - 429
means the server.
maybe we can internal-monitor tooManyRequests too.
Good idea, I added .tooManyRequests
to the list for alerting 👌.
.ddAPIKeyHeader(clientToken: configuration.clientToken), | ||
.ddEVPOriginHeader(source: configuration.common.source), | ||
.ddEVPOriginVersionHeader(), | ||
.ddRequestIDHeader(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an open question from @0xnm (in DataDog/dd-sdk-android#671) if the DD-REQUEST-ID
should be included all the time or only conditionally with a user setting.
f6491ec
to
c4cdb66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i left a few comments, not blocking but i wanted to discuss before approving
case .badRequest, .payloadTooLarge, .requestTimeout: | ||
// These codes mean that something wrong is happening either in the SDK or on the server - produce an error. | ||
return ( | ||
message: "Data upload finished with status code: \(statusCode)", | ||
error: nil, | ||
attributes: ["dd_request_id": requestID ?? "(???)"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally this case should be handled by URLSession
, right? if the connection stays idle, it produces connection timed out
error.
so i didn't fully understand what 408 means but i guess we can keep internal-monitor it anyway, if we can filter those errors out in Datadog UI.
maybe we can internal-monitor tooManyRequests
too. as this shouldn't normally happen as we batch the requests, it can indicate a configuration problem in the client's side. wdyt?
} | ||
|
||
semaphore.signal() | ||
} | ||
|
||
_ = semaphore.wait(timeout: .distantFuture) | ||
|
||
return uploadStatus ?? .unknown | ||
return uploadStatus ?? DataUploader.unreachableUploadStatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can see this case as completion not called error
or something along those lines. DataUploader.unreachableUploadStatus
isn't very intiutive at the first sight.
example, feel free to change the details:
private static let nonCalledCompletionStatus = DataUploadStatus(needsRetry: false, userDebugDescription: "An internal error occurred", userErrorMessage: nil, internalMonitoringError: "HTTPClient Completion not called")
i mean, at least the status value should try to explain the situation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ?? DataUploader.unreachableUploadStatus
here is to only satisfy the compiler - the right side of ??
operator will never be evaluated as enforced by the previous logic in this function. That's why the name and comment:
/// An unreachable upload status - only meant to satisfy the compiler.
private static let unreachableUploadStatus = DataUploadStatus(...)
Alternative to this would be:
guard let uploadStatus = uploadStatus else {
fatalError()
}
return uploadStatus
but as we avoid using fatalError()
in the SDK codebase, I adopted the solution which is already used in other places: returning known default as fallback.
Hence I don't think we should improve the error value here - this would give a false sense of reachability / usability.
// MARK: - Datadog Headers | ||
|
||
/// Datadog request authentication header. | ||
static func ddAPIKeyHeader(clientToken: String) -> HTTPHeader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be confusing for the outsiders/newcomers as we have client token
s and API key
s
what do you think of renaming this function to ddClientTokenHeader
? (of course we will keep "DD-API-KEY"
as value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I used consistent convention for 3 new headers:
static func ddAPIKeyHeader(clientToken: String) -> HTTPHeader
static func ddEVPOriginHeader(source: String) -> HTTPHeader
static func ddEVPOriginVersionHeader() -> HTTPHeader
to make it clearly visible that we follow the spec for all features:
// in LoggingFeature, TracingFeature, RUMFeature and InternalMonitoringFeature:
requestBuilder: RequestBuilder(
// ...
headers: [
// ...
.ddAPIKeyHeader(clientToken: configuration.clientToken),
.ddEVPOriginHeader(source: configuration.common.source),
.ddEVPOriginVersionHeader(),
]
)
The spec says we must send DD-API-KEY
, DD-EVP-ORIGIN
and DD-EVP-ORIGIN-VERSION
headers - and I shaped the above code to tell exactly that story.
IMO if we reverse it by naming the function ddClientTokenHeader()
, the DD-API-KEY
value used internally would be equally puzzling to the reader.
…tructs into `RequestBuilder`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What and why?
🚚 This PR updates the SDK's upload layer for Intake API v2, by:
path
from/v1/input/<token>
to/api/v2/<track-type>
wheretrack-type
isrum
,logs
orspans
;DD-API-KEY
- the client token,DD-EVP-ORIGIN
- the origin sdk tag (e.g.ios
orreact-native
),DD-EVP-ORIGIN-VERSION
: the SDK version,DD-REQUEST-ID
: a debug Intake request ID;202
,400
,401
,403
,408
,413
,429
,500
and503
.It also adds more verbosity to the console logs produced by
DataUploadWorker
, examples:Last, it adds few improvements to the internal monitoring we do around the upload stack:
NSURLError
codes are now ignored to not send false positives (e.g. when device goes offline it's not an error);413 Payload Too Large
).How?
The upload architecture remains the same, but I made few changes to make testing easier with new HTTP status codes:
DataUploadWorker
to theDataUploadStatus
type. With that change, I was able to write simple tests forDataUploadStatus
and only test two scenarios inDataUploadWorker
: needs retry or does not.DataUploadStatus
, which enables the same easy testing as for.needsRetry
.Review checklist