-
This has been open for a week, and we have 4 +1s. I am going to proceed with a PR introducing the design. Thanks!
-
Hi all,
I would like to suggest a new feature for Nessie: an event notification system.
This new feature would allow external actors to register themselves as subscribers for events published by Nessie. Such events could be: a commit, a merge, a transplant, a change to a named reference, a change to a table or view, etc. (A strict definition of what an event is should be part of this effort.)
Possible use cases include, but are not limited to, automatic data cleanup or optimization. For example, when a table is deleted, a cleanup job could be executed automatically.
If others agree with the below high-level proposal, I can provide a full design document, either as a PR for others to review, or as comments under this discussion thread.
Thanks!
Goals & Non-goals
I think the following should be considered primary goals:
However, we should strive to keep this system as simple as it can be:
The required changes would likely take the shape of a decorator pattern around VersionStore calls, pretty much like TracingVersionStore or MetricsVersionStore. Obviously, they should be designed in such a way that doesn't significantly impact Nessie's performance and response times, which probably involves asynchronous processing.
Moreover, message delivery guarantees are a tough topic in distributed systems, and Nessie's new notification system won't be any exception to that. With that in mind, I think the initial implementation shouldn't be too ambitious, and should aim for the following, admittedly low guarantees:
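To illustrate the decorator idea described in this section, here is a minimal sketch. It is not the proposed design: SimpleVersionStore is a hypothetical, heavily simplified stand-in for Nessie's VersionStore, and CommitEvent, EventSubscriber and EventsVersionStore are names invented for this example. The sketch assumes events are only published after the delegate call succeeds, and that delivery is handed to an Executor so that subscribers do not run on the request path.

```java
import java.util.List;
import java.util.concurrent.Executor;

/** Hypothetical, heavily simplified stand-in for Nessie's VersionStore (the real interface is much larger). */
interface SimpleVersionStore {
  /** Commits to a branch and returns the new commit hash. */
  String commit(String branch, String message);
}

/** Hypothetical event payload, invented for this sketch. */
final class CommitEvent {
  final String branch;
  final String hash;

  CommitEvent(String branch, String hash) {
    this.branch = branch;
    this.hash = hash;
  }
}

/** Hypothetical subscriber callback, invented for this sketch. */
interface EventSubscriber {
  void onCommit(CommitEvent event);
}

/** Decorator: delegates every call, then publishes an event to subscribers via the executor. */
final class EventsVersionStore implements SimpleVersionStore {
  private final SimpleVersionStore delegate;
  private final List<EventSubscriber> subscribers;
  private final Executor executor;

  EventsVersionStore(
      SimpleVersionStore delegate, List<EventSubscriber> subscribers, Executor executor) {
    this.delegate = delegate;
    this.subscribers = subscribers;
    this.executor = executor;
  }

  @Override
  public String commit(String branch, String message) {
    // If the delegate throws, the exception propagates and no event is published.
    String hash = delegate.commit(branch, message);
    CommitEvent event = new CommitEvent(branch, hash);
    for (EventSubscriber subscriber : subscribers) {
      executor.execute(
          () -> {
            try {
              subscriber.onCommit(event);
            } catch (RuntimeException e) {
              // Best effort: a failing subscriber affects neither the commit nor other subscribers.
            }
          });
    }
    return hash;
  }
}
```

With a thread-pool Executor, the commit returns as soon as the delivery tasks are queued; passing `Runnable::run` instead makes delivery synchronous, which can be convenient in tests.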
Relationship with Iceberg
Apache Iceberg is not the intended client/consumer for Nessie's event notification system. The typical consumer for this system would sit one layer above Iceberg. It may react to a Nessie event by triggering an Iceberg operation, but Iceberg itself would not be changed in such a way that it would trigger operations automatically on Nessie's behalf.
In apache/iceberg#7194, a tighter coupling between Iceberg and events from an external catalog is being discussed. Nessie's notification system will NOT require analogous changes.
Previous work
There is a previous design doc proposing a similar feature: #3387. This PR is now considered obsolete, but some of its ideas could definitely be reused. However, it proposes an access pattern that is API-based and involves the registration of a webhook. I believe this pattern should be avoided for security reasons, and that the SPI-based pattern I am suggesting is a more secure approach.
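For readers unfamiliar with the SPI-based pattern, one common way to realise it in Java is java.util.ServiceLoader. The sketch below is only an illustration under that assumption; whether Nessie would use ServiceLoader is not decided here, and CatalogEventListener and ListenerRegistry are hypothetical names. The point is that listeners are discovered from the server's own classpath at startup, rather than being registered remotely through an API call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical SPI interface; implementations are deployed alongside the server. */
interface CatalogEventListener {
  void onEvent(String eventType, String payload);
}

final class ListenerRegistry {
  private ListenerRegistry() {}

  /**
   * Discovers listener implementations declared in provider-configuration files named
   * META-INF/services/&lt;fully qualified name of CatalogEventListener&gt; on the classpath.
   */
  static List<CatalogEventListener> discover() {
    List<CatalogEventListener> found = new ArrayList<>();
    for (CatalogEventListener listener : ServiceLoader.load(CatalogEventListener.class)) {
      found.add(listener);
    }
    return found;
  }
}
```

If no provider-configuration file is present, discover() simply returns an empty list, so the server runs unchanged when no listener is installed.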
There is also an AdapterEvent API already in Nessie. This API is plugged around the DatabaseAdapter interface, used in the old storage model. Since it is tied to the old storage model, it is hard to expand on it, but again, many ideas outlined there could be inspirational.
Alternatives
It should be noted that alternatives exist for users willing to be notified of catalog changes; but these currently do not seem to provide better solutions for the identified use cases:
Change Data Capture for Iceberg. This is not fully implemented yet, and there is little documentation around it (for now).
Debezium: it should be feasible to "just" plug Debezium on Nessie's database and get a stream of events out of it, but this would only work if the backend in use is compatible, and besides, the resulting events would be too low-level, especially with the new storage model.