Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: distributed reads and transactor for datahike-server #332

Merged
merged 42 commits into from
Mar 15, 2023

Conversation

whilo
Copy link
Member

@whilo whilo commented May 21, 2021

This PR implements distributed read operations by redirecting the access to the connection to the underlying konserve storage to update the DB value (closes #322). It requires one roundtrip to fetch the root under :db and is a polling mechanism that is efficient if the reader is on average not accessing the database more often than it is being written to it and if the latency of the store access is not too high for the application. For low latency the transactor interface now exposes whether it is actively streaming, as is the default local transactor that we had so far (basically directly operating on the connection atom).

This PR also provides an implementation of the transactor interface for datahike-server, which provides the single writer to the underlying store and is not able to stream yet. Provided datahike server is running with the same datahike version and configured with the example configuration, the following configuration works now for an arbitrary number of Datahike peers (readers):

{:store    {:backend :file :path "/tmp/dh-file"}
            :keep-history? true ;; true
            :schema-flexibility :read
            :writer {:backend :datahike-server
                     :client-config  {:timeout 300
                                      :endpoint "http://localhost:3333"
                                    ;:token "securerandompassword"
                                      :db-name "users"}}
            }

It is not clear to me yet whether we want to keep this transactor in this repository or move it into a separate one because we now have an additional dependency on datahike-client that we can avoid. Probably it is better to pull it out. The benefit of including it in Datahike is that Datahike will then always work with datahike-server instances out of the box.

@kordano kordano self-requested a review May 21, 2021 07:55
@kordano kordano added the enhancement New feature or request label May 21, 2021
@whilo whilo marked this pull request as draft May 21, 2021 19:12
@whilo whilo requested review from grischoun and TimoKramer May 21, 2021 19:25
@TimoKramer
Copy link
Member

datahike-client will not work either with datahike when compiled with JDK<11 because I am using the JDK built-in http-client in the datahike-client

Copy link
Member

@TimoKramer TimoKramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Do you intend to add the streaming within this PR?

deps.edn Outdated
@@ -8,7 +8,9 @@
io.replikativ/superv.async {:mvn/version "0.2.11"}
io.lambdaforge/datalog-parser {:mvn/version "0.1.8"}
io.replikativ/zufall {:mvn/version "0.1.0"}
junit/junit {:mvn/version "4.13.1"}}
junit/junit {:mvn/version "4.13.1"}
io.replikativ/datahike-client {:git/url "https://github.com/replikativ/datahike-client.git"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made a 0.1.0-SNAPSHOT

@alekcz
Copy link
Contributor

alekcz commented May 26, 2021

datahike-client will not work either with datahike when compiled with JDK<11 because I am using the JDK built-in http-client in the datahike-client

@TimoKramer I've submitted a pull req to make the client work on JDK8

@whilo
Copy link
Member Author

whilo commented May 27, 2021

@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.

@TimoKramer
Copy link
Member

babashka supports http-kit so this seems to work. I think it is a good choice because with borkdude supporting it it is obviously very well alive:)

@whilo
Copy link
Member Author

whilo commented May 27, 2021

@TimoKramer I think Datahike should be usable without depending on libraries for network IO. Therefore I think it would not be bad to move the code of the actual DatahikeServerTransactor and with it the dependencies on a HTTP client out of this PR into a separate downstream repository that can be included into projects as needed. I would do the same for the native-image cli interface. Nonetheless it could potentially include the DatahikeServerTransactor.

@alekcz
Copy link
Contributor

alekcz commented May 27, 2021

@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.

Yes it does. It's actually a built-in library in babashka. now too.

@whilo whilo marked this pull request as ready for review May 31, 2021 04:55
@whilo whilo marked this pull request as draft May 31, 2021 08:00
@whilo whilo changed the base branch from development to main December 23, 2021 11:54
@whilo whilo force-pushed the 322-support-multiple-readers branch from e981e72 to 36fbd2b Compare December 26, 2022 21:51
@whilo whilo marked this pull request as ready for review December 26, 2022 22:17
@whilo
Copy link
Member Author

whilo commented Dec 26, 2022

@jsmassa @kordano This PR is basically complete and ready for feedback. Tests are fixed and it works with datahike-server.

@whilo whilo changed the base branch from main to 60-add-documentation December 26, 2022 22:30
@whilo whilo changed the base branch from 60-add-documentation to main December 26, 2022 22:30
@whilo whilo force-pushed the 322-support-multiple-readers branch 4 times, most recently from adce05f to 36fbd2b Compare January 5, 2023 10:52
@whilo
Copy link
Member Author

whilo commented Jan 5, 2023

This PR also covers #439 btw. You can now pass in tx-metaa noCommit true flag to avoid writing to disk in a transaction. The benefit of this is that it is tracked as data per transaction instead of transactor state that could interleave with other transactions being dispatched or still pending.

@whilo whilo force-pushed the 322-support-multiple-readers branch from 96715ae to 54e8a9b Compare January 6, 2023 21:38
@whilo whilo force-pushed the 322-support-multiple-readers branch from 36fbd2b to 51474b8 Compare March 5, 2023 03:41
Copy link
Member

@kordano kordano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks well written in total but it lacks testing. With such a large new use case we should have a decent amount of tests covering the new feature.

;; You may control this feature using the `:keep-history?` attribute:
(create-database {:store {:backend :mem :id \"example\"} :keep-history? false})

;; Initial data after creation may be added using the `:initial-tx` attribute, which in this example adds a schema:
(create-database {:store {:backend :mem :id \"example\"} :initial-tx [{:db/ident :name :db/valueType :db.type/string :db.cardinality/one}]})"}

create-database
dc/create-database)
(fn [& args]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you use an anonymous function instead of just declaring the whole symbol with defn.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this namespace overall is written that way. I can change that.

@@ -158,17 +165,17 @@
:initial-tx (:datahike-intial-tx env)
:keep-history? (bool-from-env :datahike-keep-history *default-keep-history?*)
:attribute-refs? (bool-from-env :datahike-attribute-refs *default-attribute-refs?*)
:name (:datahike-name env (or *default-db-name* (z/rand-german-mammal)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When did we decide to remove this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to have something like description instead.

Copy link
Member Author

@whilo whilo Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote to you about in January or so. The problem is that I cannot de-duplicate the configuration if it is randomized. If zufall could be seeded with a random seed then it would be possible. I did not find dependencies on the :name feature so I decided to remove it. Note that I added store-identity instead, which fully identifies each database through the store it is based on. It is human readable and informative, but not as fun as zufall names.


(defn stored-db? [obj]
;; TODO use proper schema to match?
(let [keys-to-check [:eavt-key :aevt-key :avet-key :schema :rschema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Probably we should use a proper schema. Also should there be a check for the historical indexes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally runtime "type" checks in Clojure should be avoided, but this is motivated because we load something external and we want to be sure that it is not broken in the storing/reading process. This test is fairly cheap while a full schema check would be more expensive and I actually think it is good enough for this reason. But I can check more properties of the loaded map if you think this is helpful. The historical indexes are not checked because they are optional and not all databases have them.

:store store)))

(defn branch-heads-as-commits [store parents]
(set (doall (for [p parents]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting would be nicer with ->.

@@ -0,0 +1,239 @@
(ns datahike.writing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do I find tests for this namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This namespace is just factored out from the prior connection namespace. It is tested with all the existing tests, I just tried to factorize the code base a bit better as connection was conflating too many things.

@whilo
Copy link
Member Author

whilo commented Mar 13, 2023

Looks well written in total but it lacks testing. With such a large new use case we should have a decent amount of tests covering the new feature.

@kordano I agree, but note that the code does not change the way the connection works for existing use cases and hence is tested for them with the current code base. Accessing a remote connection should be tested with an integration test including datahike-server and datahike-server-transactor. Since we also want to show case that, it would be good for me to get some feedback how to do it. We can then add an integration test in the same way we do it for datahike with konserve. Is that ok? I still think that this can be merged now then and the additional test can be added with the new use case.

@kordano
Copy link
Member

kordano commented Mar 13, 2023

Yeah, it's fine to add them later on. So go ahead an release it. My other comments where primarily superficial and about formatting.

@jsmassa jsmassa changed the title Distributed reads and transactor for datahike-server feat: distributed reads and transactor for datahike-server Mar 15, 2023
@whilo whilo merged commit c0cc8f3 into main Mar 15, 2023
@whilo whilo deleted the 322-support-multiple-readers branch February 7, 2024 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 writer, n reader
5 participants