A key/value store backed by LSM Trees and SSTables.
This project is a work in progress 🚧 and is being developed primarily to suit the author's personal study goals 🎓. But if you find something in here that helps you learn, that's great too!
All writes are first written to a commit log, protecting against crashes.
Newly written values are stored in a memtable backed by :gb_trees.
We define a binary SSTable format.
We use phoenix to expose a REST API (PUT, GET, DEL) for creating, reading, updating, and deleting (for now) string resources using Content-Type: application/json
. Support for application/octet-stream is forthcoming.
To start the Phoenix server in dev mode:
- Install dependencies with
mix deps.get
- Install Node.js dependencies with
npm install
inside theassets
directory - Start Phoenix endpoint with
mix phx.server
Now you can visit localhost:4000
from your browser.
To run in prod mode with some optimizations, you first need to follow the Phoenix deployment instructions to generate a secret, etc. Then:
PORT=4001 MIX_ENV=prod mix phx.server
Using simple JSON string:
curl -X PUT -H 'Content-Type: application/json' -d '"meh meh"' http://localhost:4000/api/values/1
Using form data:
curl -X PUT -d value='meh meh' http://localhost:4000/api/values/1
Returns Content-Type: application/json
.
curl http://localhost:4000/api/values/1
curl -X DELETE http://localhost:4000/api/values/1
A Sorted String Table contains zero or more gzipped key/value chunks.
A gzip key/value chunk follows this binary specification:
- Four bytes: length of the gzipped chunk
- Variable length: gzipped chunk of key/value pairs, with tombstones.
Once unzipped, each key/value chunk contains zero or more key/value records. Each record describes its own length. Some keys may point to tombstones.
- Four bytes: Length of key
- Four bytes: Length of value
- Variable length: Raw key, not escaped
- Variable length: Raw value, not escaped
- Four bytes: Length of key in bytes
- Four bytes:
2^32 - 1
to indicate tombstone - Variable length: Raw key, not escaped
One can imagine a simple, uncompressed binary representation of keys to values:
- hey: now
- no: yes
- one: TOMBSTONE
- three: four
In practice you won't ever see this sequence on disk, because the records will be gzipped.
On app startup, all outstanding commit logs are replayed and flushed from memtable to SSTable, then deleted. During the course of normal operation, a new commit log is created every time the memtable is flushed, and new writes are routed to that file. The old commit log can be removed right away.
Read the design ticket for more information.
Kleppmann: Designing Data-Intensive Applications gives a fantastic summary of local-node operation for data stores using SSTable, followed by detail on strategies for replication and partitioning. Check it out!
I sourced https://github.com/tamas-soos/expoll/blob/master/lib/ex_poll_web/views/poll_view.ex as a preliminary example for working with Phoenix.
You should enable multi-time warp mode during development in the REPL.
ELIXIR_ERL_OPTIONS="+C multi_time_warp" iex -S mix
You can follow the CLI instructions for ExDoc:
ex_doc "AugustDB" "0.0.0" "_build/dev/lib/august_db/ebin"
-m "AugustDbWeb.ValueController" -u "https://github.com/Terkwood/AugustDB"
First implement a local key-value store that uses a memtable, SSTables, and a commit log. Then implement a replicating data store which syncs via gossip. Then implement partitioning using vnodes. See the issue tracker.