RUMM-655 / RUMM-689 Use NTP time for Logs, Spans and RUM events #321

ncreated
Member

What and why?

📦 This PR updates all three features to use the NTP server time as the creation time of events (logs, spans, RUM events). This ensures time compatibility with other Datadog products, e.g. it aligns the client's portion of a distributed trace. It also helps by reporting the real time of events instead of depending on the device clock, which the user may adjust manually.

How?

A dedicated component is introduced for applying the time correction:

internal protocol DateCorrectionType {
    /// Corrects given device time to server time using the last known time difference between the two.
    func toServerDate(deviceDate: Date) -> Date
}

Internally, the DateCorrection uses lyft/Kronos to perform time sync over NTP.
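As a rough illustration of what a conforming type could look like, here is a minimal sketch. It is not the implementation from this PR: `OffsetDateCorrection` and `serverTimeOffset` are hypothetical names, and the offset is passed in as a constant instead of being obtained from an NTP sync.

```swift
import Foundation

// Protocol as quoted in the PR description.
internal protocol DateCorrectionType {
    /// Corrects given device time to server time using the last known time difference between the two.
    func toServerDate(deviceDate: Date) -> Date
}

// Hypothetical conformer (assumption): holds a fixed offset in seconds.
// A real implementation would take this value from the NTP sync.
struct OffsetDateCorrection: DateCorrectionType {
    /// Seconds to add to device time to obtain server time.
    let serverTimeOffset: TimeInterval

    func toServerDate(deviceDate: Date) -> Date {
        return deviceDate.addingTimeInterval(serverTimeOffset)
    }
}

// Example: the device clock runs 2.5 seconds ahead of the server.
let deviceDate = Date(timeIntervalSince1970: 1_606_400_000)
let correction = OffsetDateCorrection(serverTimeOffset: -2.5)
let serverDate = correction.toServerDate(deviceDate: deviceDate)
```

Keeping the offset separate from the date being corrected means the same device date can be converted again later, once a better offset is known.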

NTP synchronisation starts with Datadog.initialize() and may take from a few seconds to a dozen or so before it computes the precise time difference. The result is reported by printing either an .info or a .warning message to the user console (if using the Datadog.debug flag).

All events are collected using device time. The correction to server time is applied at write time. This preserves relative times within the events: e.g. in Tracing we correct only the "start" date, as the "finish" date is given by the relative span.duration. The same applies to RUM, where all events have their date, while other metrics use relative, nanosecond durations.
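The write-time correction for a span can be sketched roughly as follows. `CollectedSpan`, `WrittenSpan`, `FixedOffsetCorrection` and `prepareForWriting` are hypothetical names used only for illustration, and durations are kept in seconds here for simplicity; the SDK's actual models and encoders are not shown in this PR description.

```swift
import Foundation

internal protocol DateCorrectionType {
    func toServerDate(deviceDate: Date) -> Date
}

// Hypothetical: a span as collected, with both dates in device time.
struct CollectedSpan {
    let startDate: Date
    let finishDate: Date
}

// Hypothetical: a span as written, with start in server time and a relative duration.
struct WrittenSpan {
    let start: Date
    let duration: TimeInterval
}

// Hypothetical conformer with a constant offset (assumption, for the example only).
struct FixedOffsetCorrection: DateCorrectionType {
    let offset: TimeInterval
    func toServerDate(deviceDate: Date) -> Date {
        return deviceDate.addingTimeInterval(offset)
    }
}

// Only the start date is corrected; the duration comes from the two device dates,
// so it is unaffected by the correction.
func prepareForWriting(_ span: CollectedSpan, using dateCorrection: DateCorrectionType) -> WrittenSpan {
    return WrittenSpan(
        start: dateCorrection.toServerDate(deviceDate: span.startDate),
        duration: span.finishDate.timeIntervalSince(span.startDate)
    )
}

let collected = CollectedSpan(
    startDate: Date(timeIntervalSince1970: 1_000),
    finishDate: Date(timeIntervalSince1970: 1_000.25)
)
let written = prepareForWriting(collected, using: FixedOffsetCorrection(offset: 3))
```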

Last but not least, to avoid double correction, the &batch_time= query item was removed from all products.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference

@ncreated ncreated self-assigned this Nov 26, 2020
Base automatically changed from ncreated/RUMM-655-sync-time-with-ntp-server to ncreated/RUMM-655-ntp-time-sync November 26, 2020 15:46
@ncreated ncreated marked this pull request as ready for review November 26, 2020 15:51
@ncreated ncreated requested a review from a team as a code owner November 26, 2020 15:51
Contributor

@buranmert buranmert left a comment

DateCorrection.toServerDate(Date) -> Date looks confusing to me: DateCorrection sounds more like a struct containing correction data and toServerDate looks more like an instance method for Date type

also, do all Logger, SpanBuilder, RUMScopes, etc. need to know DateCorrectionType? i think they just need the correct date/time, they don't need to know more than DateProvider.now(). we can correct the time within DateProvider by injecting SystemTimeProvider in it, otherwise DateCorrection instances are not useful by themselves.

wdyt?

@ncreated ncreated force-pushed the ncreated/RUMM-655-use-ntp-time-when-writing-events branch from 3bdab63 to b648a5f Compare November 30, 2020 07:54
@ncreated
Member Author

ncreated commented Nov 30, 2020

do all Logger, SpanBuilder, RUMScopes, etc. need to know DateCorrectionType? i think they just need the correct date/time, they don't need to know more than DateProvider.now(). we can correct the time within DateProvider by injecting SystemTimeProvider in it

wdyt?

As explained:

All events are collected using device time. The correction to server time is applied at the writing time. This is to preserve relative times within the events, i.e. in Tracing we correct only the "start" date, as the "finish" date is given by relative span.duration. Similar for RUM, where all events have their date and other metrics use relative, nanosecond durations.

Returning the server date from DateProvider would mean collecting events in server time. This is incorrect, as the server time becomes more precise while the NTP sync is being performed. If we capture server time in tracer.startSpan() and then capture it again after 200ns in span.finish(), those two dates may not differ by 200ns (the duration might even be negative!).

There are also other cases where returning the server date from DateProvider would be wrong, including:

  • batch files are named and ordered by timestamp (Clock.now doesn't have to return ascending timestamps during sync),
  • network requests are not signed by dateProvider.currentDate() - we take their timing from task metrics.
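The negative-duration problem described above can be reproduced with a small sketch. `MutableOffsetCorrection` is a hypothetical stand-in for a correction whose offset is refined during sync, and the numbers (0.5 s span, offset moving from 5 s to 3 s) are made up and exaggerated for clarity.

```swift
import Foundation

internal protocol DateCorrectionType {
    func toServerDate(deviceDate: Date) -> Date
}

// Hypothetical: a correction whose offset changes as the sync gets more precise.
final class MutableOffsetCorrection: DateCorrectionType {
    var offset: TimeInterval
    init(offset: TimeInterval) { self.offset = offset }
    func toServerDate(deviceDate: Date) -> Date {
        return deviceDate.addingTimeInterval(offset)
    }
}

let correction = MutableOffsetCorrection(offset: 5.0) // rough first estimate
let deviceStart = Date(timeIntervalSince1970: 1_000)
let deviceFinish = deviceStart.addingTimeInterval(0.5) // the span took 0.5 s on the device

// Variant A: capture server time at collection (what a server-time DateProvider would do).
let serverStart = correction.toServerDate(deviceDate: deviceStart)
correction.offset = 3.0 // the sync refines the estimate between the two reads
let serverFinish = correction.toServerDate(deviceDate: deviceFinish)
let durationFromServerDates = serverFinish.timeIntervalSince(serverStart)

// Variant B: collect in device time, correct only the start date at write time.
let durationFromDeviceDates = deviceFinish.timeIntervalSince(deviceStart)
let writtenStart = correction.toServerDate(deviceDate: deviceStart)
```

In variant A the computed duration is negative because the two dates were corrected with different offsets; in variant B the duration stays at 0.5 s and the start date uses the latest known offset.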

DateCorrection.toServerDate(Date) -> Date looks confusing to me: DateCorrection sounds more like a struct containing correction data and toServerDate looks more like an instance method for Date type

DateCorrection is a basic interface with a clear responsibility:

/// Adjusts device time to server time using the time difference calculated with NTP.
internal protocol DateCorrectionType {
    /// Corrects given device time to server time using the last known time difference between the two.
    func toServerDate(deviceDate: Date) -> Date
}

Abstracting it with a protocol is extremely handy for unit tests, as we can run them stably with no date correction. One integration test was added for each product to check that a known correction is applied.
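For illustration, a test double along these lines could serve both purposes. This is only a sketch: `DateCorrectionMock` and `LogDateEncoder` are hypothetical names, not types taken from this PR.

```swift
import Foundation

internal protocol DateCorrectionType {
    func toServerDate(deviceDate: Date) -> Date
}

// Hypothetical test double: applies no correction by default,
// or a known offset when one is given.
struct DateCorrectionMock: DateCorrectionType {
    var correctionOffset: TimeInterval = 0
    func toServerDate(deviceDate: Date) -> Date {
        return deviceDate.addingTimeInterval(correctionOffset)
    }
}

// Hypothetical component under test: receives the correction through its initializer.
struct LogDateEncoder {
    let dateCorrection: DateCorrectionType
    func encodedTimestamp(for deviceDate: Date) -> TimeInterval {
        return dateCorrection.toServerDate(deviceDate: deviceDate).timeIntervalSince1970
    }
}

let deviceDate = Date(timeIntervalSince1970: 1_000)

// Unit-test style: no correction, so expected values are stable.
let uncorrected = LogDateEncoder(dateCorrection: DateCorrectionMock())
    .encodedTimestamp(for: deviceDate)

// Integration-test style: check that a known correction is applied.
let corrected = LogDateEncoder(dateCorrection: DateCorrectionMock(correctionOffset: 15))
    .encodedTimestamp(for: deviceDate)
```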

Contributor

@buranmert buranmert left a comment

i see the point of using device time until the very end 👌

regarding the namings, of course it's very clear for you because you wrote it :)
i see a simpler implementation in Android: if you need to get time information you immediately know where to go.

i'm approving because it works ✅
but there seems to be some room for improvement here

@ncreated ncreated merged commit 18c6c9c into ncreated/RUMM-655-ntp-time-sync Dec 1, 2020
@ncreated ncreated deleted the ncreated/RUMM-655-use-ntp-time-when-writing-events branch December 1, 2020 15:37
@ncreated ncreated mentioned this pull request Dec 1, 2020
2 tasks