Snowfall is a lightweight, 64-bit integer-based GUID generator inspired by Twitter's Snowflake algorithm. Compared to traditional 128-bit UUIDs, Snowfall generates IDs which:
- Are k-sortable by creation time.
- Have superior indexing characteristics on most DBMSes.
- Take up half the space.
A Snowfall GUID consists of:
- 41 bits for the ms since a custom epoch time
- 11 bits for a looping counter
- 12 bits for a generator id
As such, Snowfall returns unique GUIDs for as long as:
- The generator id is within [0, 4096).
- No more than 2048 GUIDs are generated within one ms per generator id.
- The lifetime of the system is no more than 2^41 ms (~70 years) from the epoch time set.
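The limits above follow directly from the bit widths. The sketch below illustrates the arithmetic; it is not Snowfall's own code. It assumes the three fields are packed in the order listed, with the timestamp in the most significant bits, and the names pack_guid and unpack_guid are hypothetical helpers introduced only for this example.

```python
# Illustrative only: assumes timestamp | counter | generator id, high to low bits.
TIMESTAMP_BITS = 41
COUNTER_BITS = 11
GENERATOR_BITS = 12

MAX_MS = 1 << TIMESTAMP_BITS          # ms representable since the epoch
MAX_COUNTER = 1 << COUNTER_BITS       # 2048 GUIDs per ms per generator
MAX_GENERATORS = 1 << GENERATOR_BITS  # generator ids in [0, 4096)

# 2^41 ms expressed in years, which is where the "~70 years" figure comes from.
LIFETIME_YEARS = MAX_MS / (1000 * 60 * 60 * 24 * 365.25)


def pack_guid(ms_since_epoch, counter, generator_id):
    """Combine the three fields into one 64-bit integer (hypothetical helper)."""
    if not 0 <= ms_since_epoch < MAX_MS:
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= counter < MAX_COUNTER:
        raise ValueError("counter outside the 11-bit range")
    if not 0 <= generator_id < MAX_GENERATORS:
        raise ValueError("generator id outside the 12-bit range")
    return (
        (ms_since_epoch << (COUNTER_BITS + GENERATOR_BITS))
        | (counter << GENERATOR_BITS)
        | generator_id
    )


def unpack_guid(guid):
    """Split a GUID back into (ms_since_epoch, counter, generator_id)."""
    generator_id = guid & (MAX_GENERATORS - 1)
    counter = (guid >> GENERATOR_BITS) & (MAX_COUNTER - 1)
    ms_since_epoch = guid >> (COUNTER_BITS + GENERATOR_BITS)
    return ms_since_epoch, counter, generator_id
```

Under this assumed layout, a later timestamp always yields a larger integer regardless of counter or generator id, which is what makes the IDs k-sortable by creation time.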
- Snowfall instance: The GUID generator. Reserves a unique generator_id.
- Syncer instance: Associated with one generator. Ensures that no other generator in the schema group is using its generator_id.
- Schema group: A grouping of generators that always produce globally unique IDs.
A minimal installation of Snowfall supports both the SimpleSyncer and, when used with SQLite, the DatabaseSyncer.
pip install snowfall
However, because the DatabaseSyncer uses SQLAlchemy to connect to the database, there are optional dependencies depending on the DBMS used. For instance, MySQL requires the MySQL-python package, while PostgreSQL requires psycopg2. More information can be found in the SQLAlchemy docs.
For convenience, we have included the following pip extras:
pip install snowfall[postgres]
pip install snowfall[mysql]
pip install snowfall[oracle]
To start generating IDs, create a schema group and start a Snowfall instance.
from snowfall import Snowfall
from snowfall.generator_syncers import SimpleSyncer
SimpleSyncer.create_schema_group()
id_generator = Snowfall()
Successively calling get_guid() will return valid GUIDs.
⚠️ Possible throttling: Snowfall throttles the issuing speed to ensure that no more than 2048 GUIDs are generated per ms.
id_generator.get_guid()
>>> 133494887688437760
id_generator.get_guid()
>>> 133494896085434368
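To make the throttling behaviour concrete, here is a minimal sketch of a generator that waits for the next millisecond once its counter is exhausted. It is an illustration under stated assumptions, not Snowfall's implementation: the class ToyGenerator, its clock_ms parameter, and the bit packing order (timestamp in the high bits, then counter, then generator id) are all hypothetical.

```python
import time


class ToyGenerator:
    """Hypothetical generator showing counter exhaustion and throttling."""

    COUNTER_LIMIT = 2048  # 11-bit looping counter

    def __init__(self, generator_id, epoch_ms, clock_ms=None):
        self.generator_id = generator_id
        self.epoch_ms = epoch_ms
        # Injectable clock (an assumption of this sketch) so the logic is testable.
        self.clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self.last_ms = -1
        self.counter = 0

    def _now(self):
        return self.clock_ms() - self.epoch_ms

    def get_guid(self):
        # Never step backwards, even if the clock does.
        now = max(self._now(), self.last_ms)
        if now == self.last_ms:
            self.counter += 1
            if self.counter == self.COUNTER_LIMIT:
                # 2048 GUIDs already issued this ms: spin until the next ms.
                while now <= self.last_ms:
                    now = self._now()
                self.counter = 0
        else:
            self.counter = 0
        self.last_ms = now
        # Assumed layout: 41-bit timestamp | 11-bit counter | 12-bit generator id.
        return (now << 23) | (self.counter << 12) | self.generator_id
```

The busy-wait keeps the sketch short; a real generator might sleep instead. Either way, the caller sees a brief delay rather than a duplicate ID.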
The global uniqueness of Snowfall's GUIDs only holds if each Snowfall instance reserves a unique generator_id. Ideally, we want to automate the reservation of generator_ids by Snowfall instances, and their release when not in use.
The generator_syncers module contains classes that enforce this constraint by updating a shared manifest. If all available generator_ids are reserved by active Snowfall instances, further attempts at instantiation raise an OverflowError.
While most usages of Snowfall apply to setups where GUIDs are produced concurrently by multiple machines and/or processes, we nevertheless support a non-networked solution for single-process use cases, e.g. test environments and local prototyping.
The SimpleSyncer records the manifest in memory; persistence to disk is not required for uniqueness. To set it up, create a new global schema group, and then bind the Snowfall instance to it.
from snowfall import Snowfall
from snowfall.generator_syncers import SimpleSyncer
SimpleSyncer.create_schema_group(
schema_group_name="example_schema_group"
)
id_generator = Snowfall(
generator_syncer_type=SimpleSyncer,
schema_group_name="example_schema_group"
)
You can also customize the liveliness probe frequency and the epoch start as follows:
from datetime import datetime
SimpleSyncer.create_schema_group(
schema_group_name="example_schema_group",
liveliness_probe_s=10,
epoch_start_date=datetime(2020, 1, 1)
)
When we have multiple Snowfall instances generating concurrently across multiple processes or machines, we need to persist the generator_id assignment and liveliness information to a database shared by all containers writing to the same schema. For this, we provide a DatabaseSyncer that supports any SQLAlchemy-compatible database.
⚠️ Permissions required: The DatabaseSyncer creates new tables snowfall_{schema_group_name}_properties and snowfall_{schema_group_name}_manifest, and performs CRUD operations on them.
First, create the schema group. Because this operation creates the relevant tables in the database of your choice, it should only be done once. You can also access this function via the terminal as create_db_schema_group.
from snowfall.generator_syncers import DatabaseSyncer
DatabaseSyncer.create_schema_group(
schema_group_name="example_schema_group",
engine_url="dbms://user:pass@host:port/db"
)
Next, start a Snowfall instance anywhere you want, and point it to the schema group you created.
from snowfall import Snowfall
from snowfall.generator_syncers import DatabaseSyncer
id_generator = Snowfall(
generator_syncer_type=DatabaseSyncer,
schema_group_name="example_schema_group",
engine_url="dbms://user:pass@host:port/db"
)
The create_schema_group method also supports other keyword arguments. Shown here are the defaults:
DatabaseSyncer.create_schema_group(
liveliness_probe_s = 5,
epoch_start_date = datetime(2020, 1, 1),
max_claim_retries = 3,
min_ms_between_claim_retries = 100,
max_ms_between_claim_retries = 500,
engine_url = "sqlite:////tmp/test.db"
)
Note that the default behaviour for the engine_url is to create a SQLite database in a temporary directory. We recommend switching this out for a client-server DBMS of your choice.
A generator_id is reserved for as long as the Snowfall instance is capable of transmitting liveliness information to the generator manifest, and released when the last liveliness update was more than a set amount of time ago. This time is set with liveliness_probe_s.
When a generator_id is released, it is not struck from the manifest. Instead, new Snowfall instances are able to reserve it. This eliminates the need for a separate client to run regular cleanup jobs on the manifest, and keeps Snowfall as lightweight as possible.
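The reserve-and-release behaviour described above can be sketched with a toy in-memory manifest. This is an illustration only, not how Snowfall's syncers are implemented: the class ToyManifest, its methods claim and heartbeat, and the liveliness_timeout_s parameter are hypothetical names, and the exact relationship between the probe interval and the release timeout is an assumption.

```python
import time


class ToyManifest:
    """Hypothetical manifest: ids are never deleted, only re-claimed when stale."""

    def __init__(self, max_generators=4096, liveliness_timeout_s=5.0,
                 clock=time.monotonic):
        self.max_generators = max_generators
        self.liveliness_timeout_s = liveliness_timeout_s
        self.clock = clock
        self.last_seen = {}  # generator_id -> time of last liveliness update

    def claim(self):
        """Reserve the first id that is unused or whose owner went silent."""
        now = self.clock()
        for generator_id in range(self.max_generators):
            last = self.last_seen.get(generator_id)
            if last is None or now - last > self.liveliness_timeout_s:
                # Overwrite the stale entry in place; no cleanup job needed.
                self.last_seen[generator_id] = now
                return generator_id
        raise OverflowError("all generator ids are reserved by live instances")

    def heartbeat(self, generator_id):
        """Record that the owner of generator_id is still alive."""
        self.last_seen[generator_id] = self.clock()
```

Because a stale entry is overwritten by the next claimant, the manifest never grows beyond the number of possible generator ids, and no separate process has to prune it.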
We are looking to:
- Add support for generators that implement the Snowfall GUID spec in other languages.
- Improve the speed of Snowfall by converting the codebase to Cython.
- Declare extras for the pip install process, to reduce unnecessary dependencies.
Please contact @lowjiajin for more details.