feat: distributed reads and transactor for datahike-server #332

whilo · 2021-05-21T04:25:43Z

This PR implements distributed read operations by redirecting the access to the connection to the underlying konserve storage to update the DB value (closes #322). It requires one roundtrip to fetch the root under :db and is a polling mechanism that is efficient if the reader is on average not accessing the database more often than it is being written to it and if the latency of the store access is not too high for the application. For low latency the transactor interface now exposes whether it is actively streaming, as is the default local transactor that we had so far (basically directly operating on the connection atom).

This PR also provides an implementation of the transactor interface for datahike-server, which provides the single writer to the underlying store and is not able to stream yet. Provided datahike server is running with the same datahike version and configured with the example configuration, the following configuration works now for an arbitrary number of Datahike peers (readers):

{:store    {:backend :file :path "/tmp/dh-file"}
            :keep-history? true ;; true
            :schema-flexibility :read
            :writer {:backend :datahike-server
                     :client-config  {:timeout 300
                                      :endpoint "http://localhost:3333"
                                    ;:token "securerandompassword"
                                      :db-name "users"}}
            }

It is not clear to me yet whether we want to keep this transactor in this repository or move it into a separate one because we now have an additional dependency on datahike-client that we can avoid. Probably it is better to pull it out. The benefit of including it in Datahike is that Datahike will then always work with datahike-server instances out of the box.

TimoKramer · 2021-05-23T15:36:06Z

datahike-client will not work either with datahike when compiled with JDK<11 because I am using the JDK built-in http-client in the datahike-client

TimoKramer

Awesome! Do you intend to add the streaming within this PR?

TimoKramer · 2021-05-23T16:37:17Z

deps.edn

@@ -8,7 +8,9 @@
        io.replikativ/superv.async                  {:mvn/version "0.2.11"}
        io.lambdaforge/datalog-parser               {:mvn/version "0.1.8"}
        io.replikativ/zufall                        {:mvn/version "0.1.0"}
-        junit/junit                                 {:mvn/version "4.13.1"}}
+        junit/junit                                 {:mvn/version "4.13.1"}
+        io.replikativ/datahike-client               {:git/url "https://github.com/replikativ/datahike-client.git"


Just made a 0.1.0-SNAPSHOT

alekcz · 2021-05-26T14:31:01Z

datahike-client will not work either with datahike when compiled with JDK<11 because I am using the JDK built-in http-client in the datahike-client

@TimoKramer I've submitted a pull req to make the client work on JDK8

whilo · 2021-05-27T07:25:50Z

@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.

TimoKramer · 2021-05-27T07:29:37Z

babashka supports http-kit so this seems to work. I think it is a good choice because with borkdude supporting it it is obviously very well alive:)

whilo · 2021-05-27T07:32:08Z

@TimoKramer I think Datahike should be usable without depending on libraries for network IO. Therefore I think it would not be bad to move the code of the actual DatahikeServerTransactor and with it the dependencies on a HTTP client out of this PR into a separate downstream repository that can be included into projects as needed. I would do the same for the native-image cli interface. Nonetheless it could potentially include the DatahikeServerTransactor.

alekcz · 2021-05-27T07:46:28Z

@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.

Yes it does. It's actually a built-in library in babashka. now too.

whilo · 2022-12-26T22:21:05Z

@jsmassa @kordano This PR is basically complete and ready for feedback. Tests are fixed and it works with datahike-server.

whilo · 2023-01-05T10:59:59Z

This PR also covers #439 btw. You can now pass in tx-metaa noCommit true flag to avoid writing to disk in a transaction. The benefit of this is that it is tracked as data per transaction instead of transactor state that could interleave with other transactions being dispatched or still pending.

…fixes.

Throws properly in case where database has been deleted and operations fail.

…n invalidated.

kordano

Looks well written in total but it lacks testing. With such a large new use case we should have a decent amount of tests covering the new feature.

kordano · 2023-03-13T09:43:49Z

src/datahike/api.cljc

              ;; You may control this feature using the `:keep-history?` attribute:
              (create-database {:store {:backend :mem :id \"example\"} :keep-history? false})

              ;; Initial data after creation may be added using the `:initial-tx` attribute, which in this example adds a schema:
              (create-database {:store {:backend :mem :id \"example\"} :initial-tx [{:db/ident :name :db/valueType :db.type/string :db.cardinality/one}]})"}

  create-database
-  dc/create-database)
+  (fn [& args]


Why did you use an anonymous function instead of just declaring the whole symbol with defn.

Because this namespace overall is written that way. I can change that.

kordano · 2023-03-13T09:58:22Z

src/datahike/config.cljc

@@ -158,17 +165,17 @@
                 :initial-tx (:datahike-intial-tx env)
                 :keep-history? (bool-from-env :datahike-keep-history *default-keep-history?*)
                 :attribute-refs? (bool-from-env :datahike-attribute-refs *default-attribute-refs?*)
-                 :name (:datahike-name env (or *default-db-name* (z/rand-german-mammal)))


When did we decide to remove this?

Might be good to have something like description instead.

I wrote to you about in January or so. The problem is that I cannot de-duplicate the configuration if it is randomized. If zufall could be seeded with a random seed then it would be possible. I did not find dependencies on the :name feature so I decided to remove it. Note that I added store-identity instead, which fully identifies each database through the store it is based on. It is human readable and informative, but not as fun as zufall names.

kordano · 2023-03-13T10:55:10Z

src/datahike/writing.cljc

+
+(defn stored-db? [obj]
+  ;; TODO use proper schema to match?
+  (let [keys-to-check [:eavt-key :aevt-key :avet-key :schema :rschema


You're right. Probably we should use a proper schema. Also should there be a check for the historical indexes?

Normally runtime "type" checks in Clojure should be avoided, but this is motivated because we load something external and we want to be sure that it is not broken in the storing/reading process. This test is fairly cheap while a full schema check would be more expensive and I actually think it is good enough for this reason. But I can check more properties of the loaded map if you think this is helpful. The historical indexes are not checked because they are optional and not all databases have them.

kordano · 2023-03-13T10:59:57Z

src/datahike/writing.cljc

+           :store store)))
+
+(defn branch-heads-as-commits [store parents]
+  (set (doall (for [p parents]


Formatting would be nicer with ->.

kordano · 2023-03-13T11:09:28Z

src/datahike/writing.cljc

@@ -0,0 +1,239 @@
+(ns datahike.writing


Where do I find tests for this namespace?

This namespace is just factored out from the prior connection namespace. It is tested with all the existing tests, I just tried to factorize the code base a bit better as connection was conflating too many things.

whilo · 2023-03-13T18:46:52Z

Looks well written in total but it lacks testing. With such a large new use case we should have a decent amount of tests covering the new feature.

@kordano I agree, but note that the code does not change the way the connection works for existing use cases and hence is tested for them with the current code base. Accessing a remote connection should be tested with an integration test including datahike-server and datahike-server-transactor. Since we also want to show case that, it would be good for me to get some feedback how to do it. We can then add an integration test in the same way we do it for datahike with konserve. Is that ok? I still think that this can be merged now then and the additional test can be added with the new use case.

kordano · 2023-03-13T18:53:58Z

Yeah, it's fine to add them later on. So go ahead an release it. My other comments where primarily superficial and about formatting.

kordano self-requested a review May 21, 2021 07:55

kordano added the enhancement New feature or request label May 21, 2021

whilo marked this pull request as draft May 21, 2021 19:12

whilo requested review from grischoun and TimoKramer May 21, 2021 19:25

TimoKramer reviewed May 23, 2021

View reviewed changes

alekcz mentioned this pull request May 26, 2021

switch to http-kit replikativ/datahike-client#1

Merged

whilo marked this pull request as ready for review May 31, 2021 04:55

whilo marked this pull request as draft May 31, 2021 08:00

whilo changed the base branch from development to main December 23, 2021 11:54

whilo mentioned this pull request Oct 20, 2022

GraalVM native-image build support #337

Merged

whilo force-pushed the 322-support-multiple-readers branch from e981e72 to 36fbd2b Compare December 26, 2022 21:51

whilo marked this pull request as ready for review December 26, 2022 22:17

whilo changed the base branch from main to 60-add-documentation December 26, 2022 22:30

whilo changed the base branch from 60-add-documentation to main December 26, 2022 22:30

whilo mentioned this pull request Dec 27, 2022

Separate unit tests for index types #580

Merged

whilo force-pushed the 322-support-multiple-readers branch 4 times, most recently from adce05f to 36fbd2b Compare January 5, 2023 10:52

whilo force-pushed the 322-support-multiple-readers branch from 96715ae to 54e8a9b Compare January 6, 2023 21:38

whilo and others added 18 commits March 4, 2023 19:31

Use right comparator.

9c117ea

Do not report full db.

384f4d9

Factor out connection id creation for tests. Unify exceptions. Minor …

22dc4dc

…fixes.

refine transact argument check

9f036ff

Small fixes

05fe496

Remove deprecation message.

d079517

Fix documentation.

d1aa76d

Propagate all exceptions.

997ed29

Throws properly in case where database has been deleted and operations fail.

Throw a more friendly error indicating that connection might have bee…

849800f

…n invalidated.

Polish writer namespace a bit further.

d181e46

Add documentation.

ab0b17c

Bump konserve version.

2c34f56

Fix version check.

7606fe3

Also handle stored DEVELOPMENT versions permissively.

2567660

Update copyright.

bc72703

Fix namespace imports.

ad91bbd

Decouple query-stats test store.

6937059

Fix format.

51474b8

whilo force-pushed the 322-support-multiple-readers branch from 36fbd2b to 51474b8 Compare March 5, 2023 03:41

Merge branch 'main' into 322-support-multiple-readers

4e3d570

kordano reviewed Mar 13, 2023

View reviewed changes

jsmassa approved these changes Mar 15, 2023

View reviewed changes

jsmassa changed the title ~~Distributed reads and transactor for datahike-server~~ feat: distributed reads and transactor for datahike-server Mar 15, 2023

whilo merged commit c0cc8f3 into main Mar 15, 2023

This was referenced Mar 16, 2023

[Bug]: Connection does not reflect updates to database state made by another connection #464

Closed

Connect to existing filestore created in different process fails #387

Open

Cost of d/connect and connection pooling #169

Closed

whilo deleted the 322-support-multiple-readers branch February 7, 2024 01:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: distributed reads and transactor for datahike-server #332

feat: distributed reads and transactor for datahike-server #332

whilo commented May 21, 2021 •

edited

Loading

TimoKramer commented May 23, 2021

TimoKramer left a comment

TimoKramer May 23, 2021

alekcz commented May 26, 2021

whilo commented May 27, 2021

TimoKramer commented May 27, 2021

whilo commented May 27, 2021

alekcz commented May 27, 2021

whilo commented Dec 26, 2022 •

edited

Loading

whilo commented Jan 5, 2023

kordano left a comment

kordano Mar 13, 2023

whilo Mar 13, 2023

kordano Mar 13, 2023

kordano Mar 13, 2023

whilo Mar 13, 2023 •

edited

Loading

kordano Mar 13, 2023

whilo Mar 13, 2023

kordano Mar 13, 2023

kordano Mar 13, 2023

whilo Mar 13, 2023

whilo commented Mar 13, 2023

kordano commented Mar 13, 2023

feat: distributed reads and transactor for datahike-server #332

feat: distributed reads and transactor for datahike-server #332

Conversation

whilo commented May 21, 2021 • edited Loading

TimoKramer commented May 23, 2021

TimoKramer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alekcz commented May 26, 2021

whilo commented May 27, 2021

TimoKramer commented May 27, 2021

whilo commented May 27, 2021

alekcz commented May 27, 2021

whilo commented Dec 26, 2022 • edited Loading

whilo commented Jan 5, 2023

kordano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

whilo Mar 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

whilo commented Mar 13, 2023

kordano commented Mar 13, 2023

whilo commented May 21, 2021 •

edited

Loading

whilo commented Dec 26, 2022 •

edited

Loading

whilo Mar 13, 2023 •

edited

Loading