Use konserve synchronous IO. Support native-image compilation.
Showing 13 changed files with 453 additions and 75 deletions.
@@ -0,0 +1,8 @@
#!/usr/bin/env bash

function on_path {
    builtin type -P "$1" &> /dev/null
}

GRAAL_NOT_ON_PATH="PATH does not contain native-image. Make sure to add your GraalVM to it."
if on_path native-image; then clojure -M:native-image --no-fallback --report-unsupported-elements-at-runtime; else echo "$GRAAL_NOT_ON_PATH" >&2; exit 1; fi
@@ -0,0 +1,16 @@
#!/bin/bash

TMPSTORE=/tmp/dh-test-store
STATUS=1
datahike benchmark db:testconfig.edn 0 100000 10000
datahike transact db:testconfig.edn '[[:db/add -1 :name "Judea"]]'
QUERY_OUT=$(datahike query '[:find (count ?e) . :where [?e :name _]]' db:testconfig.edn)

if [ "$QUERY_OUT" -eq 100001 ]
then
    echo "Test successful."; STATUS=0
else
    echo "Query did not return correct value (got: $QUERY_OUT)."
fi

rm -rf "$TMPSTORE"; exit "$STATUS"
@@ -0,0 +1,5 @@
{:store {:backend :file
         :path "/tmp/dh-test-store"
         :config {:in-place? true}}
 :keep-history? true
 :schema-flexibility :read}
@@ -0,0 +1,128 @@
# Command line interface

*This is work in progress and subject to change.*

We provide the `datahike` native executable to access Datahike databases from
the command line.

# Example usage

First you need to download the precompiled binary, or build it yourself, and put
it on your executable path.

To access a database you need to provide the usual configuration for Datahike.
Put this into a file `myconfig.edn`:

```clojure
{:store {:backend :file
         :path "/home/USERNAME/dh-shared-db"
         :config {:in-place? true}}
 :keep-history? true
 :schema-flexibility :read}
```

Now you can invoke some of our core API functions on the database. Let us add a
fact to the database (be careful to use single quotes `'`, otherwise your shell
may substitute parts of your Datalog ;) ):

```bash
$ datahike transact db:myconfig.edn '[[:db/add -1 :name "Linus"]]'
```
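
Why single quotes matter can be checked without Datahike at all. The following plain-bash sketch puts the same, purely illustrative, query string containing the source variable `$2` into single and into double quotes. Inside double quotes the shell expands `$2` as a positional parameter (here empty), silently corrupting the query; unquoted, `?e` could additionally be glob-expanded against file names.

```bash
#!/usr/bin/env bash
# Illustration of shell quoting only; no datahike binary is needed.
# `set --` clears the positional parameters, so "$2" expands to the empty string.
set --

# Single quotes: the string is passed through literally.
single='[:find ?e :in $ $2 :where [$2 ?e :name _]]'
# Double quotes: `$2` is expanded by the shell before datahike ever sees it.
double="[:find ?e :in $ $2 :where [$2 ?e :name _]]"

echo "single: $single"   # single: [:find ?e :in $ $2 :where [$2 ?e :name _]]
echo "double: $double"   # double: [:find ?e :in $  :where [ ?e :name _]]
```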

And retrieve it:

```bash
$ datahike query '[:find ?n . :where [?e :name ?n]]' db:myconfig.edn
"Linus" # prints the name
```

By prefixing a path with `db:` you can pass multiple database configuration
files to the query engine and join over arbitrarily many databases. All other
arguments are read as `edn` and passed to the query engine as well.
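
The argument convention can be pictured with the small sketch below. `classify_args` is a hypothetical helper written for illustration only; it is not how the `datahike` executable is implemented, it merely sorts arguments the way the paragraph above describes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the argument convention described above:
# arguments prefixed with `db:` name a database config file,
# everything else would be read as an edn value. Not the datahike code.

classify_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      db:*) echo "db-config ${arg#db:}" ;;  # strip the `db:` prefix
      *)    echo "edn $arg" ;;
    esac
  done
}

classify_args '[:find ?n . :where [?e :name ?n]]' db:config1.edn db:config2.edn ':db/add'
# edn [:find ?n . :where [?e :name ?n]]
# db-config config1.edn
# db-config config2.edn
# edn :db/add
```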

Provided the file store is configured with `{:in-place? true}`, you can even
write to the same database from different shells without a dedicated daemon:

```bash
$ datahike benchmark db:myconfig.edn 0 50000 100
"Elapsed time: 116335.589411 msecs"
```

Here we use a provided benchmark helper which transacts facts of the form
`[eid :name (random-team-member)]` for `eid = 0, ..., 49999` into the store.
`100` denotes the batch size for each transaction, so here we chunk the 50000
facts into 500 transactions.
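
The batching arithmetic can be sketched as follows, assuming (as the counts in this guide suggest) that the start of the range is inclusive and the end exclusive. `batch_ranges` is a hypothetical helper and not part of the benchmark code.

```bash
#!/usr/bin/env bash
# Sketch of the batching arithmetic (not the datahike benchmark code):
# split the half-open range [start, end) into chunks of at most `batch` ids.

batch_ranges() {
  local start="$1" end="$2" batch="$3" lo hi
  for (( lo = start; lo < end; lo += batch )); do
    # the last chunk may be shorter than `batch`
    hi=$(( lo + batch < end ? lo + batch : end ))
    echo "$lo $hi"
  done
}

batch_ranges 0 50000 100 | awk 'END { print NR }'   # 500 transactions
batch_ranges 0 50000 100 | sed -n '1p'              # 0 100
batch_ranges 0 50000 100 | tail -n 1                # 49900 50000
```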

In a second shell you can now simultaneously add facts in a different range:

```bash
$ datahike benchmark db:myconfig.edn 50000 100000 100
```

To check that everything has been added and that no write operations have
overwritten each other, count the entities:

```bash
$ datahike query '[:find (count ?e) . :in $ :where [?e :name ?n]]' db:myconfig.edn
100000 # check :)
```
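
If you prefer a single script over two shells, the two writers can run as background jobs. This is a sketch under stated assumptions: `datahike` is on your `PATH`, `myconfig.edn` uses the file store with `{:in-place? true}`, and `DATAHIKE`, `CONFIG` and `parallel_fill` are hypothetical names introduced here. Only the `benchmark` and `query` subcommands shown above are used.

```bash
#!/usr/bin/env bash
# Sketch: run both benchmark writers concurrently from one script.
# DATAHIKE, CONFIG and parallel_fill are hypothetical names for this example.

DATAHIKE="${DATAHIKE:-datahike}"
CONFIG="db:myconfig.edn"

parallel_fill() {
  local total="$1" batch="$2"
  local half=$(( total / 2 ))
  local status=0
  "$DATAHIKE" benchmark "$CONFIG" 0 "$half" "$batch" &
  local first=$!
  "$DATAHIKE" benchmark "$CONFIG" "$half" "$total" "$batch" &
  local second=$!
  # wait on each job so a failure of either writer is reported
  wait "$first" || status=1
  wait "$second" || status=1
  return "$status"
}

# Usage (requires a real datahike binary):
#   parallel_fill 100000 100 &&
#     "$DATAHIKE" query '[:find (count ?e) . :where [?e :name _]]' "$CONFIG"
```

Waiting on each job individually, rather than a bare `wait`, lets the function return a failure if either writer fails, not just the last one.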

# Memory model

The persistent semantics of Datahike work more like `git` than like mutable
databases such as SQLite or Datalevin. In particular, you can always read and
retain snapshots (copies) of the database for free, no matter what else is
happening in the system. The current version is tested with memory and file
storage, but hopefully many other backends will also work with the
`native-image`.

In principle, this shared access to the store should even work while a JVM
server, e.g. datahike-server, is serving the same database. All reads can happen
in parallel; only writers contend for exclusive file locks. This access pattern
does not provide the highest throughput, but it is extremely flexible and easy
to start with.

## Forking and pulling

Forking is easy: it is enough to copy the folder of the store, even while the
database is being written to. The only thing you need to take care of is the DB
root, the file `0594e3b6-9635-5c99-8142-412accf3023b.ksv`: copy it first and
place it into the target directory last. In between, you can use e.g. `rsync`
(or `git`) to copy all other (immutable) files into your new folder. Finally,
put the saved root file in there as well. Because the root was copied before
everything else, all files it references are already present. Since all other
files are immutable, pulling again later only copies new data.
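
The fork/pull procedure can be sketched in bash as below. It assumes the root file sits at the top level of the store folder; `fork_store` and the temporary name `.root.partial` are hypothetical. Step 2 uses plain `find`/`cp` to stay dependency-free; `rsync -a --ignore-existing --exclude "$ROOT_FILE" SRC/ DST/` would do the same job.

```bash
#!/usr/bin/env bash
# Sketch of forking/pulling a file store. Assumptions (not from Datahike):
# the root file lives at the top level of the store folder, and
# fork_store / .root.partial are hypothetical names.
set -euo pipefail

ROOT_FILE="0594e3b6-9635-5c99-8142-412accf3023b.ksv"

fork_store() {
  local src="$1" dst="$2" f
  mkdir -p "$dst"
  # 1. Snapshot the root first, under a temporary name inside the target.
  cp "$src/$ROOT_FILE" "$dst/.root.partial"
  # 2. Copy every other file that is not present yet (they are immutable).
  while IFS= read -r -d '' f; do
    if [ ! -e "$dst/$f" ]; then
      mkdir -p "$(dirname "$dst/$f")"
      cp -p "$src/$f" "$dst/$f"
    fi
  done < <(cd "$src" && find . -type f ! -name "$ROOT_FILE" ! -name '.root.partial' -print0)
  # 3. Place the root last; mv within one directory is an atomic rename.
  mv "$dst/.root.partial" "$dst/$ROOT_FILE"
}
```

Running `fork_store` again against an existing fork acts as a pull: files that were already copied are skipped and only the root is replaced.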

## Merging

Now here comes the cool part: you do not need anything more for merging than
Datalog itself. You can use a query like this to extract all facts that are in
`db1` but not in `db2`:

```bash
datahike query '[:find ?e ?a ?v ?t :in $ $2 :where [$ ?e ?a ?v ?t] (not [$2 ?e ?a ?v ?t])]' db:config1.edn db:config2.edn
```
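
The `(not [$2 ...])` clause computes a set difference, i.e. an anti-join. As an illustration only, the same idea on plain text with one fact per line can be expressed with the standard `comm` tool; `facts_only_in_first` is a hypothetical helper and the facts and transaction ids below are made up.

```bash
#!/usr/bin/env bash
# Illustration only: `comm -23` prints lines that occur only in the first
# input, i.e. a set difference. Both inputs must be sorted with the same
# collation, hence LC_ALL=C for sort and comm alike.

facts_only_in_first() {
  LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Made-up facts in "e a v t" form, one per line.
db1=$(mktemp); db2=$(mktemp)
printf '%s\n' '1 :name "Linus" 1001' '2 :name "Judea" 1002' > "$db1"
printf '%s\n' '1 :name "Linus" 1001' > "$db2"

facts_only_in_first "$db1" "$db2"   # prints: 2 :name "Judea" 1002
rm -f "$db1" "$db2"
```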

Since we cannot update transaction metadata, we should filter out
`:db/txInstant`s. We can also use a trick to add `:db/add` to each element of
the result, yielding valid transactions that we can then feed into `db2`:

```bash
datahike transact db:config2.edn "$(datahike query '[:find ?db-add ?e ?a ?v ?t :in $ $2 ?db-add :where [$ ?e ?a ?v ?t] [(not= :db/txInstant ?a)] (not [$2 ?e ?a ?v ?t])]' db:config1.edn db:config2.edn ":db/add")"
```
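
To make the shape of the result concrete, here is a plain-text analogue of the filtering and the `:db/add` trick. It is only an illustration, assuming facts arrive as whitespace-separated `e a v t` lines whose first two fields contain no whitespace; `to_tx_data` is a hypothetical helper and the input is made up.

```bash
#!/usr/bin/env bash
# Illustration only, on plain text rather than Datahike: drop facts whose
# attribute (2nd field) is :db/txInstant and wrap the rest as
# [:db/add e a v t] inside one outer vector.

to_tx_data() {
  awk 'BEGIN { printf "[" }
       $2 != ":db/txInstant" { printf "%s[:db/add %s]", sep, $0; sep = " " }
       END { print "]" }'
}

printf '%s\n' '2 :name "Judea" 1002' '1002 :db/txInstant #inst "2021-01-01" 1002' | to_tx_data
# prints: [[:db/add 2 :name "Judea" 1002]]
```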

Note that this very simple strategy assumes that the entity ids that have been
added to `db1` do not overlap with potentially new ones added to `db2`. You can
encode conflict resolution strategies and id mappings with Datalog as well, and
we are exploring several such strategies at the moment. This approach is fairly
universal, as [CRDTs can be expressed in pure
Datalog](https://speakerdeck.com/ept/data-structures-as-queries-expressing-crdts-using-datalog).
While it is not the most efficient way to merge, we plan to provide fast paths
for common patterns in Datalog. Feel free to contact us if you are interested in
complex merging strategies or have related cool ideas.