Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a request id header for all communications. #17

Merged
merged 2 commits into from
Jul 10, 2020
Merged

Conversation

a-feld
Copy link
Contributor

@a-feld a-feld commented Jul 8, 2020

Adding a request ID header makes it feasible in the future for ingest to deduplicate data being sent from the SDKs without any updates being required to the SDK clients.

@mwlang
Copy link
Contributor

mwlang commented Jul 8, 2020

Ruby's UUID version 4 (UUID4) is not particularly fast and we recently removed its use in the Ruby agent, favoring a simpler random number generator. In high-throughput environments, generating a UUID4 can lead to file handle exhaustion since Ruby's implementation makes use of /dev/urandom

so the question here: can we get away with simple random number generations, or must it truly be UUID4 standard?

communication.md Outdated Show resolved Hide resolved
@a-feld
Copy link
Contributor Author

a-feld commented Jul 8, 2020

That's a good point @mwlang , I assumed that requests would not be high throughput (we default to batching every 5 seconds), but that's no excuse for poor performance 😄

That being said, there are ~122 bits of entropy in UUID4 according to https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)

We can change the request ID to using a random hex number but we should probably figure out how much entropy is needed.

We'll have to look at the maximum window we'd like to deduplicate payloads in (I'm thinking 1-5 minutes), how many requests we expect in that window both now and in the future, and what the probability is for a collision. We certainly want to avoid cases where a collision would cause data loss. For the entropy required, I think having input from our ingest team would be helpful. I'll reach out to them for comment.

@jkwatson
Copy link
Contributor

jkwatson commented Jul 8, 2020

Ruby's UUID version 4 (UUID4) is not particularly fast and we recently removed its use in the Ruby agent, favoring a simpler random number generator. In high-throughput environments, generating a UUID4 can lead to file handle exhaustion since Ruby's implementation makes use of /dev/urandom

so the question here: can we get away with simple random number generations, or must it truly be UUID4 standard?

One or two UUIDs per harvest cycle doesn't seem too onerous. Is it really that bad, that you can't generate a couple every 5 seconds or so without impacting things? I would hope the cost of generating a UUID would be much, much less than the network latency. Am I wrong about that?

@justinfoote
Copy link
Contributor

justinfoote commented Jul 8, 2020

More than the performance impact of generating a UUID4 in ruby, the fact that it opens a new file descriptor to read from /dev/urandom is the bigger problem. In cases where an application is running out of file descriptors, sometimes the app-crashing stacktrace will be random number generation within the Telemetry SDK. This isn't a good look. See also.

I totally agree with Allan's proposal of requiring (or at least strongly suggesting) some minimum amount of entropy. My hunch is that 122 bits is far more than is strictly necessary, but also the cost of extra bits isn't too much. I wouldn't be opposed to suggesting something like tracestate -- 32 characters of hex representing 128 bits of random.

@mwlang
Copy link
Contributor

mwlang commented Jul 8, 2020

One or two UUIDs per harvest cycle doesn't seem too onerous. Is it really that bad, that you can't generate a couple every 5 seconds or so without impacting things? I would hope the cost of generating a UUID would be much, much less than the network latency. Am I wrong about that?

Nope. Not wrong. Justin goes into a little more detail about our pain point. My primary goal is to avoid bringing back that pain if there's an alternative that's "good enough"(tm).

@justinfoote going with the 32 characters of hex representing 128 bits of random seems plenty good to me and would entirely side-step our earlier issues.

@jkwatson
Copy link
Contributor

jkwatson commented Jul 8, 2020

More than the performance impact of generating a UUID4 in ruby, the fact that it opens a new file descriptor to read from /dev/urandom is the bigger problem. In cases where an application is running out of file descriptors, sometimes the app-crashing stacktrace will be random number generation within the Telemetry SDK. This isn't a good look. See also.

I totally agree with Allan's proposal of requiring (or at least strongly suggesting) some minimum amount of entropy. My hunch is that 122 bits is far more than is strictly necessary, but also the cost of extra bits isn't too much. I wouldn't be opposed to suggesting something like tracestate -- 32 characters of hex representing 128 bits of random.

Ah, I get it. The problem isn't UUIDs in general, it's the SecureRandom version in ruby that requires a file descriptor to be opened. Gotcha. Sounds like this is a ruby-specific implementation detail, rather than a general problem with UUIDs.

@a-feld
Copy link
Contributor Author

a-feld commented Jul 8, 2020

@mwlang @justinfoote I've updated the proposal to a 32 character hex string based on the feedback! 😄

@mwlang
Copy link
Contributor

mwlang commented Jul 8, 2020

Ruby agent approves this change!

Copy link
Contributor

@justinfoote justinfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@a-feld
Copy link
Contributor Author

a-feld commented Jul 10, 2020

After chatting with folks, I've changed the proposal back to uuid4. Part of this decision is that we believe the risk of collisions (which would result in data loss under the proposed usage of the request id) resulting from the use of a non-uniform random number generator is too high of a risk. Using a uuid provides greater collision resistance.

@a-feld a-feld merged commit 2840953 into master Jul 10, 2020
@a-feld a-feld deleted the request-id branch July 10, 2020 21:17
a-feld added a commit to newrelic/newrelic-telemetry-sdk-python that referenced this pull request Jul 13, 2020
The x-request-id header is a uuid4 value that may be used by ingest in
the future.

See newrelic/newrelic-telemetry-sdk-specs#17 for
details.
a-feld added a commit to newrelic/newrelic-telemetry-sdk-python that referenced this pull request Jul 13, 2020
The x-request-id header is a uuid4 value that may be used by ingest in
the future.

See newrelic/newrelic-telemetry-sdk-specs#17 for
details.
a-feld added a commit to newrelic/newrelic-telemetry-sdk-python that referenced this pull request Jul 13, 2020
The x-request-id header is a uuid4 value that may be used by ingest in
the future.

See newrelic/newrelic-telemetry-sdk-specs#17 for
details.
a-feld added a commit to newrelic/newrelic-telemetry-sdk-python that referenced this pull request Jul 13, 2020
The x-request-id header is a uuid4 value that may be used by ingest in
the future.

See newrelic/newrelic-telemetry-sdk-specs#17 for
details.
@michaelgoin
Copy link
Member

dang. wish i had seen this earlier. this will be slowwwwww in Node land. up to several milliseconds occasionally to generate (possible to be even longer right after startup).

while it is only on request, we've seen attempted uses in the wild where metrics are fired off right after generating. the whole ID situation in Node is quite unfortunate. given we've likely missed the boat here, i guess we may need to get very prescriptive in documentation on forcing batching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants