Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for key-value store #23

Merged
merged 2 commits into from
Aug 17, 2023
Merged

Conversation

piotr-karas
Copy link
Contributor

Hi!

Here is huge pull request with full support for key-value store based on JetStreams.
There is a lot of code, but core nats and jet_streams are almost untouched - beside some changes to _dict2json which now enables objects to be formatted - it was necessary in order to enable mirror and sources options in add_stream method (which were necessary for kv to enable mirroring).
Most of this code was based on behaviour and API of nats-cli client with some small changes. I also looked at ADR which are linked in source code. There is also documentation written with examples as well as a lot of tests.
Hope you find it interesting :)

Kindest regards

@Kazmirchuk
Copy link
Owner

Hi Piotr,
wow this is indeed tremendous effort from your side, thank you so much for the contribution! And it's nice that you tried to minimize changes to existing files :)

Ironically, right now at my daily job I'm also working on NATS KV support, but there I have another kind of Tcl client that is a wrapper over the C client with C++/Qt - so, very different implementation, but the same questions about how Tcl API should look like.

This time I'd like to avoid my mistake that I did back in 2021, when I accepted the JetStream contribution from ANT, left it as-is for many months, and then came back and rearranged many things for the 2.0 release, with many API changes, while you already had plenty of code running in production that would break after update. Now I have time for a proper review and work together with you for a good implementation, and hopefully you don't have much production code yet using KV :)

Generally it's better to do this kind of major development as a Github issue, so that I can help and guide you - this way you can save time and effort.

While I'm reviewing it, I'll post my thoughts in further comments.

@Kazmirchuk
Copy link
Owner

by the way, did you notice the functions to create streams and consumers from JSON files? I thought, it would be convenient for your team, since you have clients in different languages, and you can generate these JSON configs with NATS CLI and then keep them in git, instead of passing a long list of options to the Tcl client? So, if you think there's potential in this, it would be nice to have the same for creating buckets. I can add it later myself.

@Kazmirchuk
Copy link
Owner

Kazmirchuk commented Aug 11, 2023

I really appreciate that you wrote tests, docs and examples as well 👍

IMHO using NATS CLI is good for the first introduction to KV, but still - it is an interactive tool, not a client library. When I design the Tcl API, I look first of all at nats.py, and when in doubt - at nats.c and nats.go (I don't really know Go, but it's readable). So, here are the things I'd like you to change in the PR, with numbering for later reference:

  1. the $js key_value method does not take a bucket argument, and so the nats::key_value object can work on any bucket. This is different from other NATS clients. Look e.g. at nats.py:
js = nc.jetstream()
# bind to KV
kv = await js.key_value(bucket='MY_KV')
# Set and retrieve a value
await kv.put('hello', b'world')

And inside async def key_value it retrieves stream_info for this bucket, and raises an error if it doesn't exist. So, if the Tcl API follows the same approach, you can remove all cases of -check_bucket. In other words, if $js key_value $bucket succeeded, it means the bucket exists. nats.c does the same.

  1. Following the above, the nats::key_value object can't be used to create/delete buckets. nats::jet_stream should be responsible for it, e.g.:
$js create_key_value ...
$js delete_key_value $bucket

Same function names are used in nats.go

@Kazmirchuk
Copy link
Owner

  1. While the read-only access to KV is indeed mentioned in the ADR-8, I can't see it being implemented in any of the official clients - I've looked at nats.go, nats.c, nats.py and Java NATS. I've checked their Slack channel too. I suggest to remove it, unless you see some practical value in it?

@Kazmirchuk
Copy link
Owner

  1. $kv get returns an entry (which is a dict), and this is in line with nats.py, but I suppose that in production code in most cases people will care only about the actual value, not about revision numbers or creation time, leading to repetitive code like
set value [dict get [$kv get $bucket $key] value]

I suggest adding a convenience method, let's call it get_value, that returns just the value:

set value [$kv get_value $key]

@Kazmirchuk
Copy link
Owner

That's all I have to say for now. I know I'm asking for a lot :) all this JetStream stuff turned out to be much more time-consuming for me than I thought in the beginning. It's objectively hard, and information is scattered.

Please let me know if you have any deadlines for this PR to be merged.

@Kazmirchuk
Copy link
Owner

next week I'll try to sketch the Tcl KV API, because TBH I'm not 100% happy with naming by Synadia. E.g. in Go they have KeyValueManager.CreateKeyValue that creates a KV store, KeyValueManager.KeyValue to bind to an existing KV store, and then KeyValue.Create : add the key/value pair iff it does not exist. Which is not a big deal in a compiled language with IDE autocompletion, but in a scripting language like Tcl it's better to have different names. And they use "bucket" and "store" interchangeably =\ I think, method names like create_kv_store and bind_kv_store would be more self-descriptive.

@piotr-karas
Copy link
Contributor Author

Thanks for reviewing changes!

These are very good suggestions - I will try to implement them this or next week. We have no production code yet, so there is no problem in changing things now - as you said it is a lot better to do it now, than later change the API.
We also rather don't have specific deadline, but we will need KV functionality soon, so the sooner it is available, the sooner we can start working with it and developing on top of it.

@Kazmirchuk
Copy link
Owner

I've drafted the KV API here , looking at the ADR, nats.py and nats.go. I think, for you it will be more or less trivial rearrangement. By the way, you can keep the new methods for bucket management in key_value.tcl by extending jet_stream class using oo::define.

Removing -check_bucket should make things easier too.

And regarding the read_only flag: can be removed for now, and if in future we need it, it's better to implement using a separate class, created from kv_bucket with respective methods removed using TclOO deletemethod.

The entry and bucket_status dicts can stay as-is.

The biggest question for me is the watcher. For my daily job I've just implemented the Tcl watcher based on nats.c, and I really didn't like that the C API doesn't provide a callback, like I expected - after all, this is just a push subscriber. nats.go uses channels, and I don't know Go, but probably it is non-blocking API, and same for nats.py - async def updates is a coroutine, so I can use it as I like. But in C kvWatcher_Next really blocks the thread. Luckily, I could workaround it reasonably easy in Tcl using polling and 1ms timeout, but if I had to implement it in C++, I'd be in trouble with thread synchronization etc.

For this pure Tcl client it might be enough to have method updates, even without the callback option, because you can run it in a loop in a separate coroutine, so it's not really blocking.

So, while doing it I realized that the watcher is basically a local cache of the whole bucket or its portion, depending on a wildcard that you're watching. So, I've added extra methods to get values directly from the watcher - without a roundtrip to NATS, like kv_bucket does. No official clients have it. It's just a small convenience, because a user can make such array themselves, while receiving updates from the watcher.

So, it would be useful to understand your use case for watchers. Will you use it only to receive new values with -updates_only, or you'll need to dump old values too with -include_history? Will it be better for you to call updates in a loop (like in other languages), or have a callback? The fact that we had many callbacks in other parts of this library doesn't mean, that we MUST do a callback here as well.

@Kazmirchuk
Copy link
Owner

Kazmirchuk commented Aug 16, 2023

For example, JS consume would be a bit simpler without -callback, but as far as I understand, you're already using it in production, and of course I don't want to break it.

My general approach is lazy: if something doesn't HAVE to be coded - don't code it :)

@piotr-karas
Copy link
Contributor Author

I have rearranged most of the API according to proposed changes. You can check it now if it fits other clients or there are any other changes to be made.

Also some deadline occurred - it would be nice if this PR was merged until next Thursday (24.08) (I have two week vacation so it would be nice to close it till then).

Referring to creating streams and KVs from JSON (add_stream_from_json) - I think we are not using it . To be honest we don't even have 2.0 version yet (so it's not even possible I guess, AFAIK it wasn't available before 2.0) - we are working to migrate to it, but we need to make changes to our codebase and test everything. Creating KVs from JSON is also not needed for us (or at least according to my knowledge) - there are only few options available for it (and I guess only -history will be used) and making separate file would be somewhat strange. But if you think it could be useful it can be implemented later.

As it comes to watch - I haven't changed anything yet, because it seems like a lot of changes and I will try to do it in separate commit (in this thread). AFAIK watch will be used by us to quickly retrieve current values (so include_history is not needed I guess). When I was implementing current version it seemed natural to use callbacks - rest of library is working that way and most of the code (in TCL) I have seen also operates on callbacks. In my opinion it also seems easier to use than coroutines.

But you are right that if user wants to use it as local cache than it would be nice to have this helper methods that you suggested. I guess that it would look something like that:

  • kv watch creates watcher object
  • consumer is created instantly and received messages are appended to updates_queue queue or to waiting kv_watcher updates if it is waiting
  • when user calls kv_watcher updates than if updates_queue is not empty the oldest message is returned and removed from queue (FIFO), otherwise if updates_queue is empty kv_watcher updates hangs and waits for new messages. This queue is necessary IMO if e.g. user calls updates in a loop, but he enters event loop after receiving message - in that situation some messages could be lost meantime
  • There should also be separate method e.g. kv_watcher error that would work similarly to updates - so when there are some errors with the watchers user can be notified (use in another coroutine).
  • As it comes to saving/caching all received messages to be able to quickly respond to get method - I think that it should be configurable: sometimes it would be enough to save just the latest values from each key, but sometimes maybe all history would matter.
  • I would like to leave -callback method to let user decide how to use watcher (but don't know how to do it yet - watcher object will probably be created as normal (so deleting watch would work the same), but saving messages to updates_queue or local cache would probably be unnecessary).

So there are my thoughts for now - I will probably start implementing it tomorrow unless you think it is not necessary for now. If you have any suggestions let me know.

@Kazmirchuk
Copy link
Owner

Thanks! I looked very briefly, it's much better now 👍
I'll be away for vacation until 28 August, so I'll merge this now, and when I'm back, will continue the work

@Kazmirchuk
Copy link
Owner

and don't worry about the watcher, your input is enough for me to do it :)

@Kazmirchuk Kazmirchuk merged commit de63f0b into Kazmirchuk:master Aug 17, 2023
@Kazmirchuk
Copy link
Owner

P.S. you are really fast! 😮

@piotr-karas
Copy link
Contributor Author

Thanks! 😄

@Kazmirchuk
Copy link
Owner

The -mirror and -sources options of add_stream are quite complex (nested dicts), and also limited compared to nats-py. If you need them only for the KV buckets, I'd prefer to hide these options from the public API and docs. In the same time, the options for create_kv_bucket are simple and I'll leave them as is.

@Kazmirchuk
Copy link
Owner

@piotr-karas also could you tell your opinion on #24 ? It matters if you replicate streams and need good performance.

@Kazmirchuk
Copy link
Owner

Kazmirchuk commented Sep 7, 2023

oh no I've noticed this only now by accident

https://nats.io/blog/preview-release-new-jetstream-client-api/

@piotr-karas I think, this is something to discuss in your team

btw the post was written by your old colleague Tomasz :)

@piotr-karas
Copy link
Contributor Author

piotr-karas commented Sep 11, 2023

Regarding -mirror and -sources options in add_stream: I have thought, that event they are complex, I would like them to be included in the API. We are not using them anywhere else apart from KV for now, but I don't know if it would always be the case. When someone would need this feature, it would be better (IMO) to have anything that is ready to use (even when it is hard), than nothing, which requires changes to the nats-tcl library.
But if you think otherwise I can change it and hide it from user.

Regarding Preview Release of the New JetStream Client API:
That is interesting, but it is hard to say anything for now - I guess we won't make any changes soon.
I have seen Tomasz in many other places too and it is nice to come across his work :)

Oh, and sorry for such a long response time - as I mentioned I was on holiday

@Kazmirchuk
Copy link
Owner

thanks, I'll open a separate issue for -mirror and -sources, because I'd like to do them somewhat differently.

I can imagine it's not easy for you to update dependencies like nats-tcl in a restricted factory environment, so need to plan ahead.

I'm still working on KV, making good progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants