feat: distributed reads and transactor for datahike-server #332
Conversation
datahike-client will not work with datahike either when compiled with JDK <11, because I am using the JDK built-in http-client in datahike-client.
Awesome! Do you intend to add the streaming within this PR?
deps.edn
@@ -8,7 +8,9 @@
     io.replikativ/superv.async {:mvn/version "0.2.11"}
     io.lambdaforge/datalog-parser {:mvn/version "0.1.8"}
     io.replikativ/zufall {:mvn/version "0.1.0"}
-    junit/junit {:mvn/version "4.13.1"}}
+    junit/junit {:mvn/version "4.13.1"}
+    io.replikativ/datahike-client {:git/url "https://github.com/replikativ/datahike-client.git"
Just made a 0.1.0-SNAPSHOT
@TimoKramer I've submitted a pull request to make the client work on JDK 8.
@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.
babashka supports http-kit, so this seems to work. I think it is a good choice because, with borkdude supporting it, it is obviously very much alive :)
@TimoKramer I think Datahike should be usable without depending on libraries for network IO. Therefore I think it would not be bad to move the code of the actual …
Yes it does. It's actually a built-in library in babashka now, too.
This PR also covers #439, btw. You can now pass in …
Throws properly in cases where the database has been deleted and operations fail.
Looks well written overall, but it lacks testing. With such a large new use case we should have a decent amount of tests covering the new feature.
;; You may control this feature using the `:keep-history?` attribute:
(create-database {:store {:backend :mem :id \"example\"} :keep-history? false})

;; Initial data after creation may be added using the `:initial-tx` attribute, which in this example adds a schema:
(create-database {:store {:backend :mem :id \"example\"} :initial-tx [{:db/ident :name :db/valueType :db.type/string :db/cardinality :db.cardinality/one}]})"}

create-database
  dc/create-database)
(fn [& args]
Why did you use an anonymous function instead of just declaring the whole symbol with defn?
Because this namespace overall is written that way. I can change that.
@@ -158,17 +165,17 @@
     :initial-tx (:datahike-intial-tx env)
     :keep-history? (bool-from-env :datahike-keep-history *default-keep-history?*)
     :attribute-refs? (bool-from-env :datahike-attribute-refs *default-attribute-refs?*)
-    :name (:datahike-name env (or *default-db-name* (z/rand-german-mammal)))
When did we decide to remove this?
Might be good to have something like a description instead.
I wrote to you about it in January or so. The problem is that I cannot de-duplicate the configuration if it is randomized. If zufall could be seeded with a random seed, then it would be possible. I did not find dependencies on the :name feature, so I decided to remove it. Note that I added store-identity instead, which fully identifies each database through the store it is based on. It is human readable and informative, but not as fun as zufall names.
(defn stored-db? [obj]
  ;; TODO use proper schema to match?
  (let [keys-to-check [:eavt-key :aevt-key :avet-key :schema :rschema
You're right. Probably we should use a proper schema. Also, should there be a check for the historical indexes?
Normally runtime "type" checks in Clojure should be avoided, but this one is motivated because we load something external and want to be sure that it was not broken in the storing/reading process. This check is fairly cheap, while a full schema check would be more expensive, and I actually think it is good enough for this reason. But I can check more properties of the loaded map if you think that would be helpful. The historical indexes are not checked because they are optional and not all databases have them.
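For comparison, the "proper schema" alternative discussed here could be sketched with clojure.spec. This is purely hypothetical: only the key names mirror the keys-to-check vector in the PR; the spec itself and its registration are assumptions, not code from this branch.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical spec; with unregistered key specs, s/keys only
;; checks for the presence of the required keys, which keeps the
;; check about as cheap as the hand-written contains? version.
(s/def ::stored-db
  (s/keys :req-un [::eavt-key ::aevt-key ::avet-key ::schema ::rschema]))

(defn stored-db? [obj]
  (s/valid? ::stored-db obj))
```

Registering value specs for the individual keys would then turn this into the more expensive full structural check mentioned above.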
   :store store)))

(defn branch-heads-as-commits [store parents]
  (set (doall (for [p parents]
Formatting would be nicer with ->.
@@ -0,0 +1,239 @@
(ns datahike.writing
Where do I find tests for this namespace?
This namespace is just factored out from the prior connection namespace. It is tested by all the existing tests; I just tried to factorize the code base a bit better, as connection was conflating too many things.
@kordano I agree, but note that the code does not change the way the connection works for existing use cases and hence is tested for them with the current code base. Accessing a remote connection should be tested with an integration test including …
Yeah, it's fine to add them later on. So go ahead and release it. My other comments were primarily superficial and about formatting.
This PR implements distributed read operations by redirecting access through the connection to the underlying konserve storage to update the DB value (closes #322). It requires one roundtrip to fetch the root under :db and is a polling mechanism that is efficient if the reader on average does not access the database more often than it is written to, and if the latency of the store access is not too high for the application. For low latency the transactor interface now exposes whether it is actively streaming, as the default local transactor that we had so far does (it basically operates directly on the connection atom).

This PR also provides an implementation of the transactor interface for datahike-server, which provides the single writer to the underlying store and is not yet able to stream. Provided datahike-server is running with the same datahike version and configured with the example configuration, the following configuration now works for an arbitrary number of Datahike peers (readers):
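As a rough illustration, a reader peer routing its writes to a datahike-server transactor might be configured along these lines. This is a sketch only: every key and value here (notably :writer, :backend :datahike-server, :url, and :token) is an assumption for illustration, not taken from the PR or from datahike-server's actual configuration.

```clojure
;; Hypothetical peer configuration; all keys below are assumptions.
{:store  {:backend :file                    ;; shared konserve store the peers poll for reads
          :path    "/shared/datahike-db"}
 :writer {:backend :datahike-server         ;; route transactions to the single writer
          :url     "http://localhost:3000"  ;; datahike-server endpoint
          :token   "secret-token"}}         ;; auth token used by datahike-client
```

The key design point from the description above is that reads go directly against the shared store, while all writes funnel through the single datahike-server writer.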
It is not clear to me yet whether we want to keep this transactor in this repository or move it into a separate one, because we now have an additional dependency on datahike-client that we could avoid. Probably it is better to pull it out. The benefit of including it in Datahike is that Datahike will then always work with datahike-server instances out of the box.