
storage+sql: Feature request: Row-level TTLs #20239

Closed
glerchundi opened this issue Nov 23, 2017 · 28 comments
Labels
A-kv-client Relating to the KV client and the KV interface. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community T-kv KV Team

Comments

@glerchundi

glerchundi commented Nov 23, 2017

It would be interesting to have row-level TTLs so CockroachDB could be used as an auto-purging event store.

WDYT?

Jira issue: CRDB-5938

@glerchundi glerchundi changed the title Row-level TTLs Feature request: Row-level TTLs Nov 23, 2017
@dianasaur323 dianasaur323 added the O-community Originated from the community label Nov 27, 2017
@dianasaur323
Contributor

Hmm... I could have sworn that we had an open issue for this, but I can't seem to find it. Others have requested it, but it hasn't formally made it onto a roadmap. Thanks for submitting! We'll leave this issue open to track the request.

@grempe

grempe commented Feb 23, 2018

We (Tierion) would make use of this feature. We write a lot of ephemeral data into CRDB that we later have to delete, which has been a source of performance issues.

@glerchundi
Author

glerchundi commented Apr 5, 2018

@grempe thanks for sharing.

@dianasaur323 just to let you know, we're developing a new product and at the same time evaluating the possibility of using CockroachDB as our primary datastore for nearly everything:

  • Distributed locking, which requires (see the sketch after this list):
    • Creating new locks: INSERT INTO dlocks VALUES (token, expire);
    • Renewing existing locks: UPDATE dlocks SET expire=new_expire WHERE token=x;
    • Removing expired locks: DELETE FROM dlocks WHERE expire<now();
  • Distributed dynamic discovery: an internal DNS, which requires expirable DNS records based on these TTLs.
  • Automatic purging of events generated by our system and devices (IoT devices). We're going to limit our system with two different restrictions: by size and by time (with a TTL for every event).
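
A minimal sketch of that dlocks pattern, assuming a simple token/expiration schema and a 30-second lease (both are my assumptions, not requirements):

  CREATE TABLE IF NOT EXISTS dlocks (
      token  STRING PRIMARY KEY,
      expire TIMESTAMPTZ NOT NULL
  );

  -- Acquire a lock for 30 seconds.
  INSERT INTO dlocks (token, expire) VALUES ('job-42', now() + INTERVAL '30 seconds');

  -- Renew it while the holder is still alive.
  UPDATE dlocks SET expire = now() + INTERVAL '30 seconds' WHERE token = 'job-42';

  -- Without row-level TTLs, expired locks have to be reaped manually.
  DELETE FROM dlocks WHERE expire < now();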

This in conjunction with JSON, row-level partitioning & the upcoming CDC support (#2656) would make CockroachDB a really awesome solution for us!

@dianasaur323
Contributor

@grempe so sorry for not responding earlier - this must have just gotten lost in my github firehose. Thank you for using us! A nearer term option for TTL is to actually use our partitioning feature to support a fast drop of a partition. Does that sound like something that would work for you?

@glerchundi hello! I believe my co-worker has been talking to you about a couple other things as well. It's great to hear that you find a bunch of features useful to you. The same question applies to you - would a bulk delete of a partition work for you in terms of TTL? Also, we have a really simple prototype for CDC that will be merged into the master branch soon. Would you like to beta test that to see if it's in line with what you will need?

@glerchundi
Author

@glerchundi hello! I believe my co-worker has been talking to you about a couple other things as well. It's great to hear that you find a bunch of features useful to you.

Yep!

The same question applies to you - would a bulk delete of a partition work for you in terms of TTL?

In case I didn't understand correctly, let me rephrase in my own words. The idea would be to partition at the row level based on a column and a condition, like: create a partition expired-keys on the dlocks table where the expiration column value is less than now()? And then periodically bulk-delete every row in the expired-keys partition?

Also, we have a really simple prototype for CDC that will be merged into the master branch soon. Would you like to beta test that to see if it's in line with what you will need?

Sounds interesting, but we'll be very busy this month. If soon means more than a month, we'll definitely be interested. If it's less, we'll keep an eye on master's progress :)

Thanks for the update @dianasaur323, btw cool nickname :)

@dianasaur323
Contributor

In case I didn't understand correctly, let me rephrase in my own words. The idea would be to partition at the row level based on a column and a condition, like: create a partition expired-keys on the dlocks table where the expiration column value is less than now()? And then periodically bulk-delete every row in the expired-keys partition?

So the actual implementation of this is still up in the air, but yes, kind of. The idea would be that you could set an expiration column value for every month or something, so CockroachDB would co-locate entries that occurred in a given month, and then you would be able to bulk-delete that partition instead of iterating through every row in order to delete the data. That being said, this might cause a certain range to be really hot, so more thinking is needed here. I guess the question here is: how hands-off do you need TTL to be, and are you okay with an asynchronous delete job? We haven't begun development on this, although we are exploring how we can leverage existing features to meet these use cases.
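
A rough illustration of that co-location idea, using a hypothetical events table range-partitioned by a month bucket (table, columns, and dates are made up; this shows today's range partitioning, not the fast partition drop being discussed):

  CREATE TABLE events (
      bucket  DATE NOT NULL,   -- e.g. the month the event was created in
      id      UUID NOT NULL DEFAULT gen_random_uuid(),
      payload JSONB,
      PRIMARY KEY (bucket, id)
  ) PARTITION BY RANGE (bucket) (
      PARTITION p2018_03 VALUES FROM ('2018-03-01') TO ('2018-04-01'),
      PARTITION p2018_04 VALUES FROM ('2018-04-01') TO ('2018-05-01')
  );

  -- Today this still deletes row by row; the idea above is to make removing a
  -- whole expired partition cheap.
  DELETE FROM events WHERE bucket < '2018-04-01';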

Sounds interesting, but we'll be very busy this month. If soon means more than a month, we'll definitely be interested. If it's less, we'll keep an eye on master's progress :)

No worries - we are likely going to have something early to share within the next two weeks. Since you don't have time this month, we can always ping you when we have a second iteration to see if you have time then!

I'm a huge fan of dinosaurs :)

@knz knz added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 27, 2018
@knz knz changed the title Feature request: Row-level TTLs storage+sql: Feature request: Row-level TTLs Apr 27, 2018
@tbg tbg added the A-kv-client Relating to the KV client and the KV interface. label May 15, 2018
@tbg tbg added this to the Later milestone Jul 22, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@tim-o
Contributor

tim-o commented Apr 30, 2019

Zendesk ticket #3232 has been linked to this issue.

@orangecoding

hey guys, what is the status on this?

@RoachietheSupportRoach
Collaborator

Zendesk ticket #3577 has been linked to this issue.

@Fornax96

On pixeldrain.com I'm using exactly the locking system that @glerchundi described, for counting unique views on uploaded files. It works pretty well, but the removal query takes a few seconds to complete. I'm afraid this won't scale well once there are millions of rows to be deleted. A row TTL feature would be very useful for me.

@bladefist

Deletes don't scale well on massive tables, so if Cockroach implements this they are going to have to do something other than delete in my opinion. Cleaning up data on huge tables at the moment is very difficult and basically requires you to stop all traffic to the db first.

@sbward

sbward commented Jun 1, 2020

One way to handle this might be to support automatic deletion when the primary key is a timestamp, or is a multi-column primary key containing a timestamp. In that situation it might be possible to free whole ranges when they fall outside of the TTL window. The engine would also need to ignore "dead" rows within ranges that still contain "live" rows, which would create some probably negligible overhead from skipping the group of dead rows.

Disclaimer: I don't know enough about CRDB's K/V system to know if this actually could work or not. It would be awesome to hear some feedback on whether something like this approach would be feasible.

@ajwerner
Contributor

ajwerner commented Jul 1, 2020

One challenge I imagine here is foreign keys and secondary indexes. Ideally we’d push the TTL into the KV, perhaps via something like zone configs. Ideally we’d utilize MVCC timestamps to determine whether a value is dead due to TTL but that doesn’t really work because of the need to maintain referential integrity.

I don’t see an obvious way to implement this TTL efficiently for rows that are indexed in secondary indexes or are part of FK relations. I imagine that’s why you see this functionality in NoSQL databases like Cassandra and not in relational databases.

@ajwerner
Contributor

ajwerner commented Jul 1, 2020

Perhaps one could do it by maintaining a separate index over timestamps for rows but that seems like a pretty big hammer.


@bdarnell
Contributor

bdarnell commented Dec 2, 2020

I think STORING indexes (or something like them) could provide an elegant solution here. In brief, the idea is that if a table has a TTL, all of its indexes must STORE the crdb_internal_mvcc_timestamp column (probably not literally with the current STORING index code due to the magic of that pseudo-column, but something equivalent). This means that every write to the table needs to update every index (not just ones that store the changed columns). Then, because we know that the primary and all secondary indexes will have matching timestamps, we can drop all values retrieved from the KV layer that are older than the TTL just by looking at the query timestamp and table descriptor. That frees us to make the GC asynchronous and independent - we can delete the primary KVs separate from the secondary indexes, but the SQL layer code will never see one without the other.

Regarding foreign keys, I agree this seems tricky and I think we'll need to disallow certain combinations. For example, you can't have an FK where the referred table has a shorter TTL than the referring table. But I think this solution would let you have e.g. a set of tables with the same TTL and FK relationships between them.

@parminder-garcha

Just bumping this up: is there a roadmap for including this in a future release?

@jordanlewis
Member

Dear users who are hoping for this feature: while we're working this out, there's a neat workaround, which is documented here.


As an alternative to @bdarnell's suggestion, we could avoid storing crdb_internal_mvcc_timestamp on every index and instead have the optimizer transparently include a WHERE crdb_internal_mvcc_timestamp > now() - TTL on all queries against the table. This causes an index join against the primary key, so you don't have to worry about not having the most up-to-date timestamp. The downside is that it prevents users from using covering indexes.
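
To make that concrete (hypothetical table and a 30-day TTL; the rewrite would be done by the optimizer, not typed by the user):

  -- The user writes:
  SELECT * FROM events WHERE user_id = $1;

  -- and the optimizer would effectively execute:
  SELECT * FROM events
   WHERE user_id = $1
     AND crdb_internal_mvcc_timestamp / 1000000000 > now()::INT8 - 30 * 24 * 3600;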

As far as the cleanup process, check out this neat little SQL snippet:

create table b (a primary key, b) as select g, g from generate_series(1,100000);

  WITH y AS (SELECT a FROM b WHERE a > $1 LIMIT 1000),
       d AS (
            DELETE FROM b
                  WHERE a IN (SELECT * FROM y)
                    AND crdb_internal_mvcc_timestamp / 1000000000 < now()::INT8 - 999
                  LIMIT 1000
              RETURNING 1
         )
SELECT last_value(a) OVER (ORDER BY a RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  FROM y
 LIMIT 1;

this scans at most 1000 rows, deletes at most 1000 rows if they're older than a sample 999-second TTL, and returns the PK of the last row scanned. The 1000 constants can be adjusted, and this would also work with composite primary keys with some tweaking.

You could imagine having a job that runs this, persists the result, and replaces the $1 placeholder with the result of the query for the next run.
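
A minimal sketch of that bookkeeping, using a hypothetical ttl_cursor table (nothing here exists today; it's just one way to wire up the query above):

  -- Persist the scan position between runs.
  CREATE TABLE IF NOT EXISTS ttl_cursor (
      tbl     STRING PRIMARY KEY,
      last_pk INT8 NOT NULL DEFAULT 0
  );
  INSERT INTO ttl_cursor (tbl, last_pk) VALUES ('b', 0) ON CONFLICT (tbl) DO NOTHING;

  -- Each run: read the cursor, feed it to the batched delete above as $1, and
  -- then store the PK that query returns (resetting to 0 when the scan wraps).
  SELECT last_pk FROM ttl_cursor WHERE tbl = 'b';
  UPDATE ttl_cursor SET last_pk = $2 WHERE tbl = 'b';  -- $2 = result of the batch query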


There are at least three important requirements to work out here:

  1. Does setting a TTL on a table require that no data be shown to the user from outside of the TTL duration, or is it just a hint?
  2. Can users set a true row-level TTL like in Cassandra, or is a table-level TTL sufficient? This issue is called "row-level TTL", but my assumption has been that the primary use cases are actually "table-level TTL": setting the TTL to a given duration for all data in a table.
  3. Are TTLs related to row insert or row update? It's much easier and more natural to do the latter for us.

@ajwerner
Contributor

  3. Are TTLs related to row insert or row update? It's much easier and more natural to do the latter for us.

I'm not sure this is true -- however, implementing the former is easy enough without anything from us.


Other important questions:

  1. Is it okay if the cleanup has roughly the overhead of writing rows in the first place?
  2. Do you need or want CDC events when rows are removed via TTL?

These two questions point in different directions for implementation. We've heard both. If you cannot stomach cleanup that costs as much as insertion, you almost definitely can't stomach a CDC event on cleanup. How important is the efficiency of the TTL here?

@ajwerner
Contributor

Regarding foreign keys, I agree this seems tricky and I think we'll need to disallow certain combinations. For example, you can't have an FK where the referred table has a shorter TTL than the referring table. But I think this solution would let you have e.g. a set of tables with the same TTL and FK relationships between them.

@bdarnell how did you envision dealing with later writes to the referencing table? @RaduBerinde raised the point that the referenced row may expire before the referencing row. Do we need a touch-write on the referenced row? That doesn't feel very intuitive.

@bdarnell
Contributor

I think I was getting my TTL semantics mixed up and thinking of our MVCC GC TTL (which keeps all versions newer than the TTL plus one older version) instead of what's being proposed here (which doesn't try to keep the version that was active as of the TTL). (is there another term we could use for one of these instead of saying "TTL" both times?)

I don't see a better solution offhand than touching the referenced row, which seems like a terrible potential bottleneck. I'd rather not get into that situation, so at least for v1 of this feature I'd disallow FKs that refer to tables with a TTL.

@bladefist

I think most of your customers (like me) will be using this feature on tables for things like log entries, so historical data just drops off. We have 700GB tables where I wish old records would just fall off and not require complicated flows to delete them.

In our case these tables never have relationships; they're more like a heap than anything.

@glerchundi
Author

Hi again!👋

Just wanted to share that the TiDB crew is presenting this feature at KubeCon. Here are the slides, in case there is something valuable in them: https://static.sched.com/hosted_files/kccnceu2021/cb/TTL%20in%20TiKV.pdf

I suppose that removing the dependency on RocksDB has its benefits, but at the same time the drawback of not getting the features it supports? I don't know whether Pebble supports TTL-based auto-purging, though.

@petermattis
Collaborator

RocksDB's TTL support doesn't provide a foundation for implementing TTLs in CRDB. Doing compaction-time reclamation of keys is at odds with other parts of the CRDB architecture, though we're working towards making this possible (e.g. for compaction-time garbage collection of old keys). Also, the linked presentation doesn't talk about the difficulties of maintaining relational consistency between SQL tables that have TTLs. I wonder how TiDB is handling that.

@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@danthegoodman1

Would still love to see this!

@glerchundi
Author

Friendly reference: Google Cloud Spanner released TTL-based row deletion today: https://cloud.google.com/blog/products/spanner/reduce-costs-and-simplify-compliance-with-managed-ttl-in-spanner

@otan
Contributor

otan commented May 26, 2022

This now has preview support in v22.1!

https://www.cockroachlabs.com/docs/dev/row-level-ttl.html
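
From the linked docs, the setting looks roughly like this (table name and interval are just examples):

  -- Declare a TTL when creating a table...
  CREATE TABLE events (
      id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      payload JSONB
  ) WITH (ttl_expire_after = '30 days');

  -- ...or add one to an existing table.
  ALTER TABLE events SET (ttl_expire_after = '30 days');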

@otan otan closed this as completed May 26, 2022
@glerchundi
Author

Thanks! This is huge!
