Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: why so large counter? (42bits) #13

Closed
osexpert opened this issue Aug 23, 2024 · 4 comments
Closed

Question: why so large counter? (42bits) #13

osexpert opened this issue Aug 23, 2024 · 4 comments

Comments

@osexpert
Copy link

osexpert commented Aug 23, 2024

Hi. I was making my own and was thinking about a randomly seeded counter of 12, 20 or 26 bits, with +1 or a random increment. Even 12 bits with +1 would rarely rollover for me, only when the random seed is close to the max (a problem with any size random seeded counter).
Then I saw this using surprisingly large 42bits counter with +1. My thinking is, that a large counter will have many bits that is unchanged when doing +1, and this will decrease the randomness /entropy?
So my thinking is, have a counter that is fully "utilized" (changes as many bits as possible) compared to how many Uuids can be made in the same ms.
But probably there is a reason here that I am not seeing, I am guessing it can have to do with supporting time going back?
Thanks.

@osexpert osexpert changed the title Question: why so large counter? (48bits) Question: why so large counter? (42bits) Aug 23, 2024
@LiosK
Copy link
Owner

LiosK commented Aug 23, 2024

Hi! That's a very good question. This question relates to unguessability and uniqueness properties.

As for unguessability, the 42-bit counter obviously sacrifices this property, because 96 bits of a UUID can be derived from the immediately preceding value. I didn't put priority on unguessability because it is hard by design of UUIDv7 to ensure unguessability given the limited bit space available.

As for uniqueness, let's compare a 42-bit randomly seeded counter and a 42-bit random number field. In summary, when you generate N IDs for each scenario:

  1. The expected number of duplicates is identical because you get N items in the same 42-bit space; but,
  2. The probability of having at least one duplicate is much lower with the randomly seeded counter.
  3. 1 and 2 jointly imply that if you get a duplicate with the randomly seeded counter, then you are very likely to get multiple duplicates.

With the simple random number field, the probability of having at least one duplicate is known as the birthday probability and can be calculated using the formula in the article.

With the randomly seeded counter, you will pick an initial counter value randomly for each batch of consecutive IDs, and as long as the initial values are sufficiently distant from each other, you will never get a duplicate. Such a probability can be calculated using the near-match variant also in the article.

Plug parameters in the formulae and you will find the conclusion in the above summary. You can read a similar discussion in this blog post, too.

For the general UUID use cases, I believe the probability of having at least one duplicate is very important property. That's the reason why I selected the 42-bit counter.

@osexpert
Copy link
Author

Thanks. I see that a rust implementation also use the same/similar format as you, possibly based on your design: https://kodraus.github.io/rust/2024/06/24/uuid-v7-counters.html. How do you think this design compares to eg. Ulid's that uses all 80bits as counter? Better, worse, equal?

@LiosK
Copy link
Owner

LiosK commented Aug 24, 2024

I believe I improved ULID's 80-bit counter by ensuring a certain level of unguessability with the trailing 32-bit entropy. 32 bits won't be strong enough against offline brute-force attacks but might be practically sufficient against attacks over the network.

The counter length is smaller than ULID's, but 42 bits are already overkill for realistic use cases. Plus, the smart rollover handling using the timestamp field as a spare counter has made counter overflow a negligible event.

From the implementation standpoint, a 42-bit counter is much easier to implement because it fits in the 64-bit int/float range, than a 80-bit counter which needs BigInt arithmetics..

@osexpert
Copy link
Author

I agree.
I am currently, in my own, using a 26bit counter and now I do not wonder anymore if I should reduce it:-) Thank you again👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants