sql: add support for key watches with notifications of changes #6130

Closed
archiecobbs opened this issue Apr 18, 2016 · 20 comments
Labels
A-cdc Change Data Capture C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community

Comments

@archiecobbs

This is a feature request.

Some key/value stores let clients register for notification (via a returned Future<?>) when a committed transaction changes the value associated with a specific key. For example, FoundationDB provides (or provided) this capability with Transaction.watch().
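As a rough illustration of the watch semantics described above, here is a minimal in-memory sketch in Python. The `WatchableStore` class and its `watch`/`commit` methods are hypothetical names made up for this example; they are not the FoundationDB or CockroachDB API, and the one-shot behavior is an assumption.

```python
from concurrent.futures import Future
from typing import Dict, List, Optional


class WatchableStore:
    """Toy in-memory key/value store with one-shot key watches (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Future]] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def watch(self, key: str) -> Future:
        """Return a Future that completes the next time `key` changes."""
        fut: Future = Future()
        self._watchers.setdefault(key, []).append(fut)
        return fut

    def commit(self, writes: Dict[str, str]) -> None:
        """Apply a batch of writes, then fire watches on keys whose value changed."""
        changed = [k for k, v in writes.items() if self._data.get(k) != v]
        self._data.update(writes)
        for key in changed:
            # Watches are one-shot: pop them so each Future completes only once.
            for fut in self._watchers.pop(key, []):
                fut.set_result(writes[key])


store = WatchableStore()
store.commit({"x": "1"})
w = store.watch("x")
store.commit({"y": "9"})        # different key: the watch stays pending
pending_after_y = not w.done()
store.commit({"x": "2"})        # watched key changed: the watch fires
print(pending_after_y, w.result())
```

A distributed store would additionally have to deliver the completion across nodes, which this sketch ignores.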

This is especially useful with distributed databases, because it obviates the need to design and implement a separate change notification RPC mechanism between nodes.

This is fairly simple to implement at the key/value store level. A harder question is how to expose it through the SQL API. Perhaps via some (ab)use of triggers, where you could specify a specific table row to watch?

In any case, this would be a useful feature if it can be done.

@petermattis
Collaborator

@archiecobbs We've pondered being able to "watch" for changes to a table, but it sounds like you're asking for the ability to watch for changes to a particular row. Do you have a use case where watching for changes to a particular row would be useful? What if there are limits (as in the FoundationDB case) on how many watches may be active concurrently?

@archiecobbs
Author

@petermattis Yes I'm envisioning being able to watch a specific row, or even a column in a row. Here's some more background...

The use cases for this feature that I have encountered in the past generally involve a relatively small number of watched keys, and a small number of watches in effect on those keys, at any one time.

For example, imagine a GUI session displaying some information X which is a small subset of the overall database. This GUI session wants notification every time any of the information it is displaying (X) is changed so it can automatically refresh its view, as this makes the view truly real-time and eliminates the need for a "Refresh" button. So while the GUI session is in existence, we create a single watchable key/object/row in the database that represents "a change has occurred in X".

Note the notification granularity here is whatever the application deems appropriate, and often a conservative (i.e., not completely precise) approximation of whether X has changed is most efficient. That is, it's OK if the GUI refreshes more often than necessary, as long as it's not ridiculously often, if that makes the business logic that calculates whether X has changed simpler.

Then a GUI session displaying X can register a key watch on that key/object/row. The business logic would toggle/increment/whatever this key/object/row whenever a change in X occurs (or is likely to have occurred).

So there is a slight subtlety here: while it's nice to be able to watch for changes on the data directly, sometimes what you want to watch is not the actual data itself, but a sentinel or representative piece of data that represents a change in the actual data you care about. This extra level of indirection can make things more efficient when your subset X does not directly correspond to a simple set of database rows.

To take a concrete example: suppose your GUI displays users from the Users table, sorted by username. But it can only display one screenful at a time, which is only a tiny fraction of the entire Users table. Having a watch for any changes in the Users table might result in a blizzard of irrelevant change notifications. Instead, create a new table UserWatchers with columns usernameStart, usernameEnd, watchColumn. Whenever a User is changed, business logic will increment the watchColumn in any rows in the UserWatchers table whose username range contains the affected user. Now you have a highly efficient, distributed GUI auto-refresh mechanism, using only a single watched row+column per GUI session.
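The UserWatchers bookkeeping can be sketched in a few lines of Python. This is an in-memory stand-in for the table, not SQL; the `UserWatcher` class, the `notify_user_changed` function, and the choice of inclusive range bounds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserWatcher:
    """One row of the UserWatchers table described above."""
    username_start: str    # lower bound of the displayed range (assumed inclusive)
    username_end: str      # upper bound of the displayed range (assumed inclusive)
    watch_column: int = 0  # bumped whenever a user in the range changes


def notify_user_changed(watchers: List[UserWatcher], username: str) -> int:
    """Increment watch_column on every watcher whose range contains username.

    Returns the number of watcher rows touched.
    """
    touched = 0
    for w in watchers:
        if w.username_start <= username <= w.username_end:
            w.watch_column += 1
            touched += 1
    return touched


watchers = [
    UserWatcher("alice", "dave"),    # GUI session showing alice..dave
    UserWatcher("erin", "mallory"),  # GUI session showing erin..mallory
]
touched = notify_user_changed(watchers, "bob")
print(touched, [w.watch_column for w in watchers])  # prints: 1 [1, 0]
```

Each GUI session then needs only one watch, on its own row's watchColumn, regardless of how many users fall inside its range.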

In general then the number of watched keys scales with the number of types of things you may need to watch, while the number of watches in effect at any time scales with the number of users logged in to the GUI (which itself is bounded, on a per-node basis, by the number of GUI sessions that node can handle).

@petermattis petermattis changed the title Add support for key watches with notifications of changes sql: add support for key watches with notifications of changes Jun 29, 2016
@derkan

derkan commented Aug 23, 2016

+1
When building distributed systems on CockroachDB, notifications become important, for example to invalidate caches in the application.

PostgreSQL notifies clients asynchronously with the NOTIFY command, like here. A sample usage of this feature through triggers is here.

Using triggers for notification lets the user decide when and how to notify. And PostgreSQL client support would already be in place.
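To make the cache-invalidation use case concrete, here is a small Python sketch of the application side. It assumes a notification arrives as a (channel, payload) pair; the channel name `cache_invalidation`, the JSON payload shape, and the `NotifiedCache` class are all hypothetical, and how the notification reaches the application (for instance through a driver's LISTEN support) is left out.

```python
import json
from typing import Any, Dict


class NotifiedCache:
    """Application-side cache that evicts entries named in a notification."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = value

    def get(self, key: str) -> Any:
        return self._entries.get(key)

    def on_notification(self, channel: str, payload: str) -> bool:
        """Handle one notification; return True if an entry was evicted.

        Assumed (hypothetical) payload shape: {"table": "...", "id": ...}.
        """
        if channel != "cache_invalidation":
            return False  # not addressed to this cache
        msg = json.loads(payload)
        cache_key = f"{msg['table']}:{msg['id']}"
        if cache_key in self._entries:
            del self._entries[cache_key]
            return True
        return False


cache = NotifiedCache()
cache.put("users:42", {"name": "Ada"})
evicted = cache.on_notification("cache_invalidation", '{"table": "users", "id": 42}')
print(evicted, cache.get("users:42"))  # prints: True None
```

The trigger would decide when to send the payload; the application only has to map it back to a cache key.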

@karlbohlmark

+1
Change notifications would make CockroachDB a nice etcd replacement.

@petermattis petermattis added this to the Later milestone Feb 22, 2017
@HeikoOnnebrink

+1
After having implemented various REST APIs backed by etcd or Redis, I am looking at Consul to support multi-datacenter scenarios.
If CockroachDB got an "etcd watch"-like feature, it would be a hot candidate as a multi-DC persistence building block.

@mainframe

+1

@dianasaur323 dianasaur323 added O-community Originated from the community and removed community-questions labels Apr 23, 2017
@bra-fsn

bra-fsn commented Oct 18, 2017

Oracle's Continuous Query Notification/Database Change Notification can also be relevant here:
https://docs.oracle.com/cd/E11882_01/java.112/e16548/dbchgnf.htm#JJDBC28820
An easier-to-understand example:
http://www.oracle.com/technetwork/articles/dsl/python-091105.html

Also see this for different use cases:
https://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_cqn.htm#BDCGGACA
which allows for object and query result notification.
Both of these can have good use cases, so it would be nice to implement them.

@umatomba

+1

@dianasaur323 dianasaur323 self-assigned this Oct 26, 2017
@stellanhaglund

Is there any progress on this?

@dianasaur323
Contributor

@stellanhaglund this issue seems to conflate two feature requests: 1) triggers and 2) change data capture. Change data capture is scheduled to begin development in our next release cycle whereas triggers are not. You can track our progress in change data capture in issue #2656. Would this meet your use case?

@stellanhaglund

What I'm after is some way to have real-time data with all the other benefits of Cockroach.
I don't know the best practice for this, other than that polling doesn't seem like a very good option. Then I read about the notify/listen feature in Postgres and thought that could solve it.
Do you know if that would be the best way to achieve real-time data, and is this something you will implement?
In that case, would the better option for me be to start with Postgres and wait for the feature to arrive here?
Or maybe there is some other way?

@stellanhaglund

I did some reading, and it looks like #2656 would be a good way to do it, maybe with something like Kafka?
Though I guess this will take some time, since you first need to build the CDC part and then there needs to be some kind of module that can make use of it. Or is the plan to do it similarly to Postgres so that existing tools could be used?

@dianasaur323
Contributor

dianasaur323 commented Dec 8, 2017 via email

@stellanhaglund

Thanks!
The schema part seems easy to keep track of: basically just relying on things your schema can do, right?
But for the workload I'm not quite sure what you are referring to. Do you mean that I should use queries that are supported by CRDB?

@dianasaur323
Contributor

dianasaur323 commented Dec 8, 2017 via email

@stellanhaglund

I've done some additional thinking about this, and I'm having a hard time deciding how to proceed.
I've been looking at some ways to use a connector with Postgres, and there are some options, though they are not all the same.
So if I were to choose one of these and build my logic around it, it might not match how Cockroach does it, or am I wrong?
That makes me a little scared of the effort of moving from Postgres to Cockroach once this feature is out.
If I were to start my new project with Cockroach instead, I would have the benefit of being able to scale directly in my setup, and I could then add the CDC part when you release it.
The big questions then are: when would it be reasonable to expect this feature, is it 6 months or even more? And at that point, would I be able to add it to my existing setup easily?

@dianasaur323
Contributor

dianasaur323 commented Dec 12, 2017 via email

@jordanlewis jordanlewis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 27, 2018
@danhhz danhhz added the A-cdc Change Data Capture label Apr 30, 2018
@knz knz assigned danhhz and unassigned dianasaur323 May 9, 2018
@kannanlakshmi
Contributor

The v2.1.0-alpha.20180702 binary released today contains our first CDC prototype. You can find the docs for CDC and the latest binary under testing releases with today's date (Jul 2) below. As always, please let us know of any issues or questions here on GitHub, on the Forum, or by emailing me directly at lakshmi@cockroachlabs.com.

Docs: https://www.cockroachlabs.com/docs/v2.1/change-data-capture.html
Release binary: https://www.cockroachlabs.com/docs/releases/#testing-releases

@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@danhhz danhhz assigned rolandcrosby and unassigned danhhz Apr 16, 2019
@rolandcrosby

I think we can close this out. CDC in CockroachDB 19.1 is ready for use, and the precise behavior motivating this thread (monitoring only a single key) even works, as follows:

Get a table set up and enable the rangefeed option required for CDC:

create table foo (a int primary key, b string);
insert into foo values (1, 'a');
set cluster setting kv.rangefeed.enabled = true;

On a separate connection, run the following query and consume results as they are emitted.
Important: If running in the cockroach sql CLI or another system that buffers results by default, you need to select an output format that does not do any buffering (e.g. via \set display_format=csv), since a changefeed will never finish emitting results.

select * from [experimental changefeed for table foo with updated, resolved] where key = '[1]' or key is null;
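For a client that receives the unfiltered changefeed rows, the same filter can be sketched on the consumer side in Python. The row shape used here, a (table, key, value) tuple whose key is the JSON-encoded primary key and whose table and key are null on resolved-timestamp rows, is an assumption for illustration, as are the sample values and the `watch_key` helper; no database connection is involved.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

# Assumed row shape: (table, key, value), all as text; table/key may be None.
Row = Tuple[Optional[str], Optional[str], str]


def watch_key(rows: Iterable[Row], watched_key: str) -> Iterator[Tuple[str, dict]]:
    """Yield ("change", value) for the watched key and ("resolved", value) for
    resolved-timestamp rows; skip changes to every other key."""
    for _table, key, value in rows:
        doc = json.loads(value)
        if key is None:
            yield ("resolved", doc)
        elif key == watched_key:
            yield ("change", doc)


# Illustrative rows only; real changefeed output may differ in detail.
rows = [
    ("foo", "[1]", '{"after": {"a": 1, "b": "b"}, "updated": "1.0"}'),
    ("foo", "[2]", '{"after": {"a": 2, "b": "z"}, "updated": "2.0"}'),
    (None, None, '{"resolved": "3.0"}'),
]
events = list(watch_key(rows, "[1]"))
print([kind for kind, _ in events])  # prints: ['change', 'resolved']
```

Keeping the resolved rows lets the consumer know how far the feed has progressed even when the watched key is idle.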

jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 2, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 24, 2019
craig bot pushed a commit that referenced this issue Nov 7, 2019
41252: roachtest: add test that aggregates orm blacklist failures r=jordanlewis a=jordanlewis

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@socketpair

Yep, but how do I watch starting from a given revision? If the watching connection is interrupted and I restart the watch, I might lose some events, might I not? In etcd I can start watching from an exact revision (i.e., the last revision that I was able to handle before the failure). It's a key feature of etcd watchers.
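The resume-from-revision pattern asked about here can be sketched generically in Python. It assumes the event source can replay everything after a given revision, as etcd's watch does; this thread does not establish how a changefeed would be resumed. `ResumableWatcher`, `subscribe`, and the in-memory `LOG` are hypothetical names for this example.

```python
from typing import Callable, Iterable, List, Tuple

Event = Tuple[int, str]  # (revision, payload)


class ResumableWatcher:
    """Remembers the last revision handled so a restarted watch loses nothing."""

    def __init__(self, handle: Callable[[str], None]) -> None:
        self._handle = handle
        self.last_revision = 0  # a real system would persist this durably

    def run(self, subscribe: Callable[[int], Iterable[Event]]) -> None:
        """Consume events from subscribe(start_after) until the stream ends."""
        for revision, payload in subscribe(self.last_revision):
            if revision <= self.last_revision:
                continue  # duplicate delivered by a replay; already handled
            self._handle(payload)
            self.last_revision = revision


LOG: List[Event] = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]


def subscribe(start_after: int, limit: int = 2) -> Iterable[Event]:
    """Hypothetical source: replays events after start_after, then stops after
    `limit` events to simulate an interrupted connection."""
    return [e for e in LOG if e[0] > start_after][:limit]


seen: List[str] = []
watcher = ResumableWatcher(seen.append)
watcher.run(subscribe)  # handles revisions 1-2, then the connection drops
watcher.run(subscribe)  # resumes after revision 2 and handles 3-4
print(seen, watcher.last_revision)  # prints: ['a', 'b', 'c', 'd'] 4
```

The key point is that the client, not the server, owns the checkpoint, so a reconnect is just another subscription starting after the last handled revision.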
