Merge pull request #2 from lowjiajin/jj/proof-of-concept
Snowfall GUID generator
lowjiajin authored Jul 3, 2020
2 parents c928e26 + a7bfb32 commit 10d7b07
Showing 12 changed files with 639 additions and 37 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -1,3 +1,6 @@
# IDE files
.idea

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -26,11 +29,12 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
.DS_Store

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*._manifest
*.spec

# Installer logs
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
include README.md
include requirements.txt
83 changes: 47 additions & 36 deletions README.md
@@ -7,86 +7,97 @@ Snowfall is a lightweight 64-bit integer based GUID generator inspired by the Tw
## GUID Specification
A Snowfall GUID consists of:
```
1 bit reserved
40 bits for the ms since a custom epoch time
12 bits for a looping counter
11 bits for a generator id
41 bits for the ms since a custom epoch time
11 bits for a looping counter
12 bits for a generator id
```
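
The field widths above can be illustrated with a small packing sketch. This is a minimal example, not the library's implementation: the field order (timestamp in the high bits, then counter, then generator id) and the helper names `pack_guid` and `unpack_guid` are assumptions made for illustration. Note that 41 + 11 + 12 already totals 64 bits, so the sketch does not model the reserved bit separately.

```python
# Field widths taken from the specification above.
# ASSUMPTION: fields are laid out high-to-low as timestamp, counter, generator id.
TIMESTAMP_BITS = 41
COUNTER_BITS = 11
GENERATOR_ID_BITS = 12


def pack_guid(ms_since_epoch: int, counter: int, generator_id: int) -> int:
    """Combine the three fields into a single integer (hypothetical helper)."""
    if not 0 <= ms_since_epoch < 2 ** TIMESTAMP_BITS:
        raise ValueError("ms_since_epoch does not fit in 41 bits")
    if not 0 <= counter < 2 ** COUNTER_BITS:
        raise ValueError("counter does not fit in 11 bits")
    if not 0 <= generator_id < 2 ** GENERATOR_ID_BITS:
        raise ValueError("generator_id does not fit in 12 bits")
    return (
        (ms_since_epoch << (COUNTER_BITS + GENERATOR_ID_BITS))
        | (counter << GENERATOR_ID_BITS)
        | generator_id
    )


def unpack_guid(guid: int) -> tuple:
    """Split an integer produced by pack_guid back into its three fields."""
    generator_id = guid & (2 ** GENERATOR_ID_BITS - 1)
    counter = (guid >> GENERATOR_ID_BITS) & (2 ** COUNTER_BITS - 1)
    ms_since_epoch = guid >> (COUNTER_BITS + GENERATOR_ID_BITS)
    return ms_since_epoch, counter, generator_id
```

Because the timestamp sits in the most significant bits under this assumed layout, GUIDs issued later by the same generator compare greater than earlier ones.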

As such, Snowfall returns unique GUIDs for as long as:
1. The generator id is within `[0-2048)`.
2. No more than `4096` GUIDs are generated within one ms.
1. The generator id is within `[0, 4096)`.
2. No more than `2048` GUIDs are generated within one ms.
3. The lifetime of the system is no more than `2^41ms` (~70 years) from the epoch time set.
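
The three limits follow directly from the field widths. As a quick check of the arithmetic (a sketch only; the variable names are illustrative and a year is taken as 365.25 days):

```python
# Field widths from the GUID specification.
TIMESTAMP_BITS = 41
COUNTER_BITS = 11
GENERATOR_ID_BITS = 12

# Number of distinct generator ids: valid ids are [0, 4096).
max_generators = 2 ** GENERATOR_ID_BITS

# Number of distinct counter values available within one millisecond.
max_guids_per_ms = 2 ** COUNTER_BITS

# How long a 41-bit millisecond counter lasts before it wraps around.
ms_per_year = 1000 * 60 * 60 * 24 * 365.25
lifetime_years = 2 ** TIMESTAMP_BITS / ms_per_year

print(max_generators, max_guids_per_ms, round(lifetime_years, 1))
```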

## Developer Guide
### Installation
A complete installation of Snowfall with all [`id_assigners`](#enforcing-unique-generator_ids) and their dependencies.
A complete installation of Snowfall with all [`generator_syncers`](#enforcing-unique-generator_ids) and their dependencies.
```
pip install snowfall
```

### Quickstart
To start generating IDs, simply create a `Snowfall` instance with a `generator_id`.
To start generating IDs, simply create a schema group and start a `Snowfall`.
```
from snowfall import Snowfall
from snowfall.generator_syncers import SimpleSyncer
id_generator = Snowfall(
generator_id=0
)
SimpleSyncer.create_schema_group()
id_generator = Snowfall()
```
Successively calling `get_id()` will return valid GUIDs.
Successively calling `get_guid()` will return valid GUIDs.

> :warning: **Possible throttling**: Snowfall throttles the issuing speed to ensure that no more than 4096 GUIDs are generated per ms.
> :warning: **Possible throttling**: Snowfall throttles the issuing speed to ensure that no more than 2048 GUIDs are generated per ms.
```
id_generator.get_id()
>>> 4611686027683621110
id_generator.get_id()
>>> 6385725700183638596
id_generator.get_guid()
>>> 133494887688437760
id_generator.get_guid()
>>> 133494896085434368
```
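
The throttling behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `snowfall.main`: the class `ToyGenerator`, its bit layout, and its use of the Unix epoch by default are all hypothetical. When the 2048-value counter is exhausted within one millisecond, the sketch waits for the clock to advance before issuing the next GUID.

```python
import time

COUNTER_BITS = 11
GENERATOR_ID_BITS = 12
MAX_COUNTER = 2 ** COUNTER_BITS  # at most 2048 GUIDs per millisecond


class ToyGenerator:
    """Hypothetical stand-in for illustration; not the Snowfall class."""

    def __init__(self, generator_id: int, epoch_start_ms: int = 0):
        self.generator_id = generator_id
        self.epoch_start_ms = epoch_start_ms
        self._last_ms = -1
        self._counter = 0

    def _now_ms(self) -> int:
        # Milliseconds elapsed since the configured epoch.
        return time.time_ns() // 1_000_000 - self.epoch_start_ms

    def get_guid(self) -> int:
        now = self._now_ms()
        if now <= self._last_ms and self._counter >= MAX_COUNTER:
            # This millisecond's budget is spent: throttle until the clock moves on.
            while now <= self._last_ms:
                time.sleep(0.0001)
                now = self._now_ms()
        if now > self._last_ms:
            # A new millisecond has started, so the counter loops back to zero.
            self._last_ms = now
            self._counter = 0
        guid = (
            (self._last_ms << (COUNTER_BITS + GENERATOR_ID_BITS))
            | (self._counter << GENERATOR_ID_BITS)
            | self.generator_id
        )
        self._counter += 1
        return guid
```

Requesting more than 2048 GUIDs in a tight loop therefore takes at least a few milliseconds, but every value returned by one generator stays distinct.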

### Enforcing unique `generator_ids`
The global uniqueness of Snowfall's IDs only holds if each Snowfall instance has a unique `generator_id`. Ideally, we want to throw an exception when an instance is initialized with a `generator_id` that is already in use.
The global uniqueness of Snowfall's IDs only holds if each Snowfall instance reserves a unique [`generator_id`](#guid-specification). Ideally, we want to throw an exception when an instance is initialized with a `generator_id` that is already in use.

The `id_assigners` module contains classes that enforce this constraint by automating the assignment of `generator_ids` to Snowfall instances, using a shared manifest of available and reserved `generator_ids`. If all available `generator_ids` are reserved by active Snowfall instances, further attempts at instantiation would result in an `OverflowError`.
The `generator_syncers` module contains classes that enforce this constraint by automating the reservation and release of `generator_ids` by Snowfall instances, using a shared manifest. If all available `generator_ids` are reserved by active Snowfall instances, further attempts at instantiation would result in an `OverflowError`.
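
The reservation idea can be sketched with a toy in-memory manifest. The class `ToyManifest` and its `reserve` and `release` methods are hypothetical names used only for illustration; the actual syncers also track liveliness, which is omitted here.

```python
from typing import Set

MAX_GENERATOR_ID = 2 ** 12 - 1  # highest id that fits in the 12-bit field


class ToyManifest:
    """Hypothetical in-memory manifest; not the library's implementation."""

    def __init__(self, max_generator_id: int = MAX_GENERATOR_ID):
        self.max_generator_id = max_generator_id
        self._reserved: Set[int] = set()

    def reserve(self) -> int:
        """Hand out the lowest free generator id, or fail if none is left."""
        for candidate in range(self.max_generator_id + 1):
            if candidate not in self._reserved:
                self._reserved.add(candidate)
                return candidate
        raise OverflowError("all generator ids are reserved")

    def release(self, generator_id: int) -> None:
        """Make a generator id available again."""
        self._reserved.discard(generator_id)
```

A caller that receives the `OverflowError` can only proceed once another instance releases its id.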

#### For single-process projects
For single-process projects, we provide a `SimpleIDAssigner` that records the manifest as a Python data structure. All Snowfall instances need to share the same SimpleAssigner instance.
For single-process projects, we provide a `SimpleSyncer` that records the manifest as a Python data structure. First, create a new global schema group, and then bind the Snowfall instance to it.

All `Snowfall` instances that share the same schema group will not create duplicate GUIDs.
```
from datetime import datetime
from snowfall import Snowfall
from snowfall.id_assigners import SimpleAssigner
from snowfall.generator_syncers import SimpleSyncer
id_assigner = SimpleAssigner(
liveliness_probe_ms=5000
epoch_start=datetime(2020, 1, 1)
SimpleSyncer.create_schema_group(
schema_group_name="example_schema_group"
)
id_generator = Snowfall(=
id_assigner=id_assigner
id_generator = Snowfall(
generator_syncer_type=SimpleSyncer,
schema_group_name="example_schema_group"
)
```

You can also customize the liveliness probe frequency and the epoch start as follows:

```
SimpleSyncer.create_schema_group(
    schema_group_name="example_schema_group",
    liveliness_probe_s=10,
    epoch_start_date=datetime(2020, 1, 1)
)
```

#### For multi-process or distributed projects
For multi-process, multi-container projects, we need to persist the `generator_id` assignment and liveliness information to a database shared by all containers writing to the same schema. For this, we provide a `DatabaseAssigner` that supports any SQLAlchemy-compatible database.
For multi-process, multi-container projects, we need to persist the `generator_id` assignment and liveliness information to a database shared by all containers writing to the same schema. For this, we provide a `DatabaseSyncer` that supports any SQLAlchemy-compatible database.

> :warning: **Instantiating assigners**: All database assigners wih the same `engine_url` need to share the same `epoch_start` Otherwise, a ValueError is thrown.
> :warning: **Instantiating syncers**: All database syncers with the same `engine_url` need to share the same `epoch_start`. Otherwise, a `ValueError` is thrown.
> :warning: **Permissions required**: The `DatabaseAssigner` creates new tables `snowfall_properties` and `snowfall_manifest`, and performs CRUD operations on them.
> :warning: **Permissions required**: The `DatabaseSyncer` creates new tables `snowfall_{schema_group_name}_properties` and `snowfall_{schema_group_name}_manifest`, and performs CRUD operations on them.
```
from datetime import datetime
from snowfall import Snowfall
from snowfall.id_assigners import DatabaseAssigner
from snowfall.generator_syncers import DatabaseSyncer
id_assigner = DatabaseAssigner(
engine_url="postgresql://user:pass@host:port/db"
liveliness_probe_ms=5000,
epoch_start=datetime(2020, 1, 1)
DatabaseSyncer.create_schema_group(
schema_group_name="example_schema_group",
liveliness_probe_s=10,
epoch_start_date=datetime(2020, 1, 1)
)
id_generator = Snowfall(
id_assigner=id_assigner
generator_syncer_type=DatabaseSyncer,
engine_url="postgresql://user:pass@host:port/db"
)
```

3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
APScheduler==3.6.3
SQLAlchemy==1.3.18
numpy==1.19.0
31 changes: 31 additions & 0 deletions setup.py
@@ -0,0 +1,31 @@
import pathlib
from setuptools import setup, find_packages


HERE = pathlib.Path(__file__).parent
with open('requirements.txt') as f:
    REQUIREMENTS = f.read().strip().split('\n')

setup(
    name="snowfall",
    version="1.0.0",
    description="Bigint-based distributed GUID generator",
    long_description=(HERE / "README.md").read_text(),
    long_description_content_type="text/markdown",
    url="https://github.com/lowjiajin/snowfall",
    author="Low Jia Jin",
    author_email="pixelrife@hotmail.com",
    license="MIT",
    classifiers=[
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python :: 3.7",
    ],
    packages=find_packages(),
    include_package_data=True,
    install_requires=REQUIREMENTS,
    entry_points={
        "console_scripts": [
            "create_db_schema_group=src.generator_syncers.database_syncers:create_schema_group",
        ]
    },
)
1 change: 1 addition & 0 deletions snowfall/__init__.py
@@ -0,0 +1 @@
from snowfall.main import Snowfall
3 changes: 3 additions & 0 deletions snowfall/generator_syncers/__init__.py
@@ -0,0 +1,3 @@
from snowfall.generator_syncers.abstracts import BaseSyncer
from snowfall.generator_syncers.database_syncer import DatabaseSyncer
from snowfall.generator_syncers.simple_syncer import SimpleSyncer
88 changes: 88 additions & 0 deletions snowfall/generator_syncers/abstracts.py
@@ -0,0 +1,88 @@
from abc import ABC, abstractmethod
from apscheduler.schedulers.background import BackgroundScheduler

from snowfall.utils import get_current_timestamp_ms


class BaseSyncer(ABC):

    PROBE_MISSES_TO_RELEASE = 2
    MAX_GENERATOR_ID = 2 ** 12 - 1

    def __init__(self):
        """
        All syncers have a background task which updates the liveliness of its Snowfall instance in the manifest
        at periodic intervals
        """
        self.scheduler = BackgroundScheduler()
        self.scheduler.add_job(
            func=self.update_liveliness_job,
            trigger="interval",
            seconds=self.liveliness_probe_s,
        )
        self.scheduler.start()

        self._last_alive_ms = 0
        self._generator_id = self._claim_generator_id()

    def is_alive(
            self,
            current_timestamp_ms: int
    ):
        """
        The syncer, and by extension its Snowfall instance, is alive iff its generator id is still reserved.
        """
        ms_since_last_updated = current_timestamp_ms - self._last_alive_ms
        if ms_since_last_updated <= self.ms_to_release_generator_id:
            return True
        else:
            return False

    def update_liveliness_job(self):
        self._set_liveliness(
            current_timestamp_ms=get_current_timestamp_ms(),
            generator_id=self._generator_id
        )

    @property
    @abstractmethod
    def liveliness_probe_s(self) -> int:
        raise NotImplementedError

    @property
    @abstractmethod
    def ms_to_release_generator_id(self) -> int:
        raise NotImplementedError

    @property
    @abstractmethod
    def generator_id(self) -> int:
        raise NotImplementedError

    @property
    @abstractmethod
    def last_alive_ms(self) -> int:
        raise NotImplementedError

    @property
    @abstractmethod
    def epoch_start_ms(self) -> int:
        raise NotImplementedError

    @classmethod
    @abstractmethod
    def create_schema_group(cls) -> None:
        raise NotImplementedError

    @abstractmethod
    def _claim_generator_id(self) -> int:
        raise NotImplementedError

    @abstractmethod
    def _set_liveliness(
            self,
            current_timestamp_ms: int,
            generator_id: int
    ) -> None:
        raise NotImplementedError
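
The liveliness check in `is_alive` can be exercised in isolation with a standalone sketch. The diff leaves `ms_to_release_generator_id` abstract, so the rule used here (probe interval in milliseconds multiplied by `PROBE_MISSES_TO_RELEASE`) is an assumption, and the free functions below are hypothetical rather than part of the module.

```python
PROBE_MISSES_TO_RELEASE = 2  # same value as the BaseSyncer class constant


def ms_to_release_generator_id(liveliness_probe_s: int) -> int:
    """ASSUMED rule: an id is released after this many consecutive missed probes."""
    return liveliness_probe_s * 1000 * PROBE_MISSES_TO_RELEASE


def is_alive(
        current_timestamp_ms: int,
        last_alive_ms: int,
        liveliness_probe_s: int
) -> bool:
    """Mirror of BaseSyncer.is_alive, written as a free function for illustration."""
    ms_since_last_updated = current_timestamp_ms - last_alive_ms
    return ms_since_last_updated <= ms_to_release_generator_id(liveliness_probe_s)
```

With a 10 second probe interval, under this assumed rule a generator that last reported at `t = 0` is still considered alive at 20,000 ms and is considered released one millisecond later.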