Skip to content

tetsuo/hypervault

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hypervault

pgsql connection manager for scalability freaks.

pip install hv

This is the implementation of the pattern described by Instagram in "Sharding & IDs at Instagram" article.

Besides for that, it wraps over psycopg2 with custom connection pooling support and stores dicts in hstore k/v pairs which may be indexed by PostgreSQL.

api

hv.entity.Key(id)

An instance of the Key represents a unique key (64-bits long) for an entity and has the following attributes:

created returns the UTC datetime corresponding to the first 41-bits of the numeric id.

shard_id holds a 13-bits integer and represents the logical shard.

added_id is the remaining 10-bits and represents an auto-incrementing sequence, modulus 1024. This means we can generate 1024 IDs, per shard, per millisecond.

hv.datastore.Datastore(connections, pool_max, pool_block_timeout, logger)

connections holds an array of dicts which are being passed to psycopg2 respectively. However those dicts should also contain a special shards value which adds meaning to all that fuss going around.

This example shows the bare minimum you need to create a Datastore instance:

connections = [
  dict(shards='1-9', host='192.168.2.23', port='5432', user='x', password='x', database='x'),
  dict(shards='9-17', host='192.168.2.24', port='5432', user='x', password='x', database='x')
]
db = Datastore(connections)

In this case, we assume PostgreSQL running on 192.168.2.23 contains shards (schemas) starting from 1 to 9 (9 not included) and on 192.168.2.24 we have shards from 9 to 17.

pool_max is the maximum number of psycopg2 connections that are going to be kept alive for every connection we have passed. (default: 10)

pool_block_timeout is the maximum number of seconds to wait for getting a connection from pool before the request is dropped. (default: 5)

logger should hold a Logger object if you want to use LoggingConnection. By default every connection is an instance of DictConnection.

A Datastore instance has the following methods:

get_connection(shard_id)

Returns a psycopg2 connection for the given shard_id.

Beware that this connection should be sent back into the pool when you are finished, or otherwise you know- universe will collapse and Trinity will die :(

put_connection(connection)

Sends connection back into the pool where it belonged.

cursor(shard_id)

Returns a context manager delivering a connection for the given shard_id.

This is a convenience method that saves you from forgetting to call put_connection.

Example:

with db.cursor(5) as cur:
  cur.execute('SELECT version()')
  ver = cur.fetchone()

put(shard_id, kind, **kwargs)

Writes data to the specified shard, where kind is an integer which is not stored within hstore field and used for differentiating between entity types.

Returns a hv.entity.Key.

Example:

data = dict(beep='boop')
key = db.put(12, 1, **data)

get(key)

Fetches the data with the given key.

key must be of type hv.entity.Key.

Example:

key = Key(307821103844175873)
res = db.get(key)

disconnect()