rfc: temporary tables support

This RFC proposes to add support for temporary tables. Release note: None
cockroachdb · Oct 10, 2019 · 47e2301 · 47e2301
1 parent fdff826
commit 47e2301
Showing 1 changed file with 288 additions and 0 deletions.
diff --git a/docs/RFCS/20191009_temp_tables.md b/docs/RFCS/20191009_temp_tables.md
@@ -0,0 +1,288 @@
+- Feature Name: Temporary Tables
+- Status: draft
+- Start Date: 2019-10-02
+- Authors: Arul, knz
+- RFC PR: #30916
+- Cockroach Issue: #5807
+
+# Summary
+
+This RFC proposes to introduce support for session-scoped temporary tables. Such Temporary Tables 
+can only be accessed from the session they were created in and persist across transactions in the same
+session. Temporary tables are also automatically dropped at the end of the session.
+
+Eventually we want to support transaction scoped temporary tables as well, but that is out of scope for this RFC.
+
+
+# Motivation
+
+A. Compatibility with PostgreSQL -- ORMs and client apps expect this to work. 
+
+B. It exposes an explicit way for clients to write intermediate data to disk.
+
+# Guide-level explanation
+
+Temporary tables (TTs) are data tables that only exist within the session they are defined. 
+This means that two different sessions can use the same TT name without conflict, and the data 
+from a TT gets automatically deleted when the session terminates.
+
+A temporary table is defined using `CREATE TEMP TABLE` or `CREATE TEMPORARY TABLE`. 
+The remainder of the `CREATE TABLE` statement supports all the regular table features.
+
+The differences between TTs and non-temporary (persistent) tables (PTs) are:
+- A TT gets dropped automatically at the end of the session, a PT does not.
+- A PT created by one session can be used from a different session, whereas a TT is only usable from
+ the session it was created in.
+- TTs can depend on other TTs using foreign keys, and PTs can depend on other PTs, but it's not
+ possible to refer to a PT from a TT or vice-versa.
+- The name of a newly created PT can specify the `public` schema, 
+and if/when CRDB supports user-defined schemas, can specify any user-defined physical schema; 
+in comparison CREATE TEMP TABLE must always specify a temporary schema as target and TTs always get 
+created in a special session-specific temporary schema.
+
+Additionally, TTs are exposed in information_schema and pg_catalog like regular tables, with their 
+temporary schema as parent namespace. TTs can also use persistent sequences in the same ways that 
+persistent tables can.
+
+### Temporary schemas
+TTs exist in a session-scoped temporary schema that gets automatically created the first time a TT 
+is created, and also gets dropped when the session terminates. 
+
+There is just one temporary schema defined per database and per session. Its name is auto-generated 
+based on the session ID. For example, session with ID 1231231312 will have 
+"pg_temp_1231231312" as its temporary schema name.
+
+Once the temporary schema exists, it is possible to refer to it explicitly when creating or using 
+tables:
+- `CREATE TEMP TABLE t(x INT)` is equivalent to `CREATE TEMP TABLE pg_temp_1231231312.t(x INT)` 
+and also `CREATE TABLE pg_temp_1231231312.t(x INT)` and also `CREATE TABLE pg_temp.t(x INT)`
+>Note that the last two equivalences are a reminder that the TEMP keyword is merely syntactic sugar 
+>for injecting the `pg_temp_<session_id>` namespace into name resolution instead of `public` when the name is unqualified;
+> conversely, the same mechanism is always used when the CREATE statement targets a temporary schema, 
+> regardless of whether the TEMP keyword is specified or not.
+- `SELECT * FROM t is equivalent to SELECT * FROM pg_temp_1231231312.t`
+(Although see section below about search_path)
+
+The temporary schema, when needed the first time, gets auto-created in the current database as 
+defined by the `database` session variable (and the head of search_path). If a client session 
+changes its current database and creates a temporary table, a new temporary schema with the 
+same name gets created in the new database. The temporary schema is thus defined per-database 
+and it is thus possible to have identically named temporary tables in different databases 
+in the same session.
+
+Sessions that do not use temporary tables do not see a temporary schema. 
+This provides a stronger guarantee of compatibility with extant CRDB clients that do 
+not know about temporary tables yet.
+
+### Name resolution lookup order
+CockroachDB already supports the name resolution rules defined by PostgreSQL.
+Generally:
+- Qualified object names get looked up in the namespace they specify
+- Non-qualified names get looked up in the order specified by search_path, with the same special 
+cases as PostgreSQL.
+- It's possible to list the temp schema name at an arbitrary position in search_path using the special 
+string "pg_temp" (even though the temp schema actually has a longer name).
+- If "pg_temp" is not listed in search_path, it is assumed to be in first position. This is why, 
+unless search_path is overridden, a TT takes priority over a PT with the same name.
+
+More details are given below in the "Reference level" section.
+
+### Compatibility with the SQL standard and PostgreSQL
+CockroachDB supports the PostgreSQL dialect and thus the PostgreSQL notion of what a TT should be. 
+The differences between PostgreSQL and standard SQL are detailed 
+[here](https://www.postgresql.org/docs/12/sql-createtable.html#SQL-CREATETABLE-COMPATIBILITY).
+
+At this point, CockroachDB will not support PostgreSQL's ON COMMIT clause to CREATE TEMP TABLE, 
+which defines transaction-scoped temp tables.
+
+
+# Reference-level explanation
+
+Ensuring no foreign key cross referencing is allowed between temporary/persistent tables should not be
+that hard to solve -- a boolean check at the point of establishment should suffice. 
+This requires adding an additional boolean flag to TableDescriptors that is set to true if 
+the table is temporary.
+
+DistSQL should “just work” with temporary tables as we pass table descriptors down to remote nodes.
+
+The major challenges involve name resolution of temporary tables and how deletion would occur. The high level approach for these bits is as follows:
+1. There needs to be a way to distinguish temporary table descriptors from persistent table descriptors 
+during name resolution -- In Postgres, every session is assigned a unique schema that scopes 
+temporary tables. Tables under this schema can only be accessed from the session that created the schema. 
+Currently, CockroachDB only supports the `public` physical schema. As part of this task, CRDB should 
+be extended to support other physical schemas (`pg_temp_<session id>`) under which temporary tables can live.
+2. Dropping temporary tables at the end of the session -- We can not rely on a node to clean up the 
+temporary tables’ data and table descriptors when a session exits. This is because there can be 
+failures that prevent the cleanup process to complete. We must run a background process that ensures 
+that cleanup happens by periodically checking for sessions that have already exited and had created 
+temporary tables.
+
+### Workflow
+
+Every session starts with 4 schemas (`public`, `crdb_internal`, `information_schema`, `pg_catalog`). 
+Users mainly interact with the `public` schema. Users only have `SELECT` privileges on the other three
+schemas.
+
+Envision the scenario where the user is interacting with the `movr` database and is connected
+to it on a session with sessionID 1231231312. At the start, the system.namespaces table will look
+like: 
+
+| parentID | name     | Id | parentSchemaID |
+|----------|----------|----|----------------|
+| 0        | movr     | 1  | 0              |
+| 1        | public   | 0  | 0              |
+| 1        | vehicles | 51 | 0              |
+
+Note that pg_temp_1231231312 does not exist yet, as no temporary tables have been created.
+
+When the user issues a command like `CREATE TEMP TABLE rides(x INT)` or `CREATE TABLE pg_temp.rides(x INT)`
+for the first time, we generate two new unique IDs that correspond to the schemaID and tableID. If 
+the generated IDs are 52 and 53 respectively, the following two entries will be added to system.namespace:
+
+| parentID | name               | Id | parentSchemaID |
+|----------|--------------------|----|----------------|
+| 1        | pg_temp_1231231312 | 52 | 0              |
+| 1        | rides              | 53 | 52             |
+
+Additionally, (1, pg_temp_1231231312, 0) -> 52 will be cached, so that subsequent lookups for 
+interaction with temporary tables do not require hitting the KV layer during resolution.
+This mapping can never change during the course of a session because the schema can not be renamed
+or dropped. Even if all temporary tables for a session are manually dropped, the schema is not. Thus,
+this cache is always consistent for a particular session.
+
+All subsequent TT commands have the following behavior. If the user runs  
+`CREATE TEMP TABLE users(x INT)`, we generate a new unique ID that corresponds to the tableID. Say
+this generated ID is 54, the following is added to the system.namespaces table:
+
+| parentID | name               | Id | parentSchemaID |
+|----------|--------------------|----|----------------|
+| 1        | users              | 54 | 52             |
+
+When the session ends, the system.namespace table returns to its initial state and the last three
+entries are removed. The data in the `users` and `rides` table is also deleted.
+
+### Name resolution rules (reference guide)
+CockroachDB already supports name resolution like PostgreSQL, as outlined in the name resolution 
+[RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20180219_pg_virtual_namespacing.md):
+- Qualified object names get looked up in the namespace they specify
+- Non-qualified names get looked up in the order specified by search_path, with the same special 
+case as PostgreSQL:
+    - If search_path mentions pg_catalog explicitly, search_path is used as-is
+    - If search_path does not mention pg_catalog explicitly, then pg_catalog is assumed to be listed as the first entry in search_path.
+
+With temporary tables, another exception is introduced in the handling of search_path, which is detailed in depth in 
+[src/backend/catalog/namespace.c](https://github.com/postgres/postgres/blob/master/src/backend/catalog/namespace.c): 
+
+- If search_path mentions pg_temp explicitly, the search_path is used as-is.
+- If search_path does not mention pg_temp explicitly, then pg_temp is searched before pg_catalog and the explicit list.
+
+
+
+## Detailed design
+
+### Session Scoped Namespace
+
+Currently CockroachDB does name resolution by mapping (ParentID, ObjectName) -> ObjectID for all 
+objects in the database. This limits our ability to create temporary tables with the same name as 
+a persistent table or two temporary tables from different sessions that have the same name. 
+
+To remedy this:
+- System.namespace mapping is changed to (ParentID, ParentSchemaID, ObjectName) -> ObjectID
+- Name resolution for databases changes to (0, 0, Database name) -> DatabaseID.
+- Name resolution for schemas is introduced, as (Parent ID, 0, Schema Name) -> SchemaID.
+- Name resolution for tables changes to (ParentID, SchemaID, Table Name) -> TableDescriptorID
+- All temporary tables are placed under `pg_temp_<session_id>` namespace. As “pg_” prefixed names are 
+reserved in Postgres, it will be impossible for this schema name to conflict with a user defined 
+schema once CRDB has that support.
+- If a session tries to access a temporary table owned by another session, this can be caught during  
+name resolution as the schema name is constructed using the session. A session is only allowed to 
+access `pg_temp_<session_id>` and`public `physical schemas.
+
+To reduce the extra lookup for `public` and `pg_temp_<session_id>` schemaIDs, we cache the result after the first 
+lookup. As schemas can not be dropped or renamed during the session, this cache will always be 
+consistent.
+
+#### Migration:
+- For every DatabaseID that exists in the system.namespace table, a `public` schema is added by 
+adding an entry (DatabaseID, 0, public) -> 0.
+- For all existing tables, the schemaID field is prefilled with 0 to scope them under `public`.
+
+### Session Scoped Deletion
+There could be cases where a session terminates and is unable to perform clean up, for example when 
+a node goes down. We can not rely on a session to ensure that hanging data/table descriptors are 
+removed. Instead, we use a daemon process to perform cleanup.
+
+A background process finds all active sessions and filters through the system.namespace table to find 
+namespaces associated with sessions that have exited. The temporary table descriptors/table data are 
+then cleaned up by setting their TTL to 0 and going through the regular drop table process.
+
+As the namespace resolution only relies on the sessionID, we do not need to maintain any additional 
+data structure that keeps track of temporary table descriptors created by a session. For example, 
+say sessionID 123 goes offline. When the background process scans all the temporary schemas in 
+systems.namespace, it will realize that pg_temp_123 exists, but the session does not. 
+All temporary tables scoped under pg_temp_123 can then be safely deleted.
+
+## Rationale and Alternatives
+
+### Alternative A: Encode the SessionID in the metadataNameKey for Temporary Tables
+
+We can map temporary tables as (ParentID, TableName, SessionID) -> TableDescriptorID. 
+The mapping for persistent tables remains unchanged. 
+
+Temporary tables continue to live under the `public` physical schema, but to the user they appear 
+under a conceptual `pg_temp_<session_id>` schema. 
+
+When looking up tables, the physical schema accessor must try to do name resolution using both forms
+of keys (with and without SessionID), depending on the order specified in the search_path. If the 
+(conceptual) temporary schema is not present in the search_path, the first access must include the 
+ sessionID in the key. This ensures the expected name resolution semantics.
+
+The conceptual schema name must be generated on the fly for pg_catalog queries, by replacing `public`
+with `pg_temp_<session_id>` for table descriptors that describe temporary tables.
+
+As users are still allowed to reference tables using FQNs, this case needs to be specially checked 
+during name resolution -- a user should not be returned a temporary table if they specify 
+db.public.table_name. This needs special handling because the temporary schema is only conceptual 
+-- everything still lives under the `public` namespace. 
+
+#### Rationale
+
+- No need for an (easy) migration, but this approach offers a higher maintainability cost. 
+
+### Alternative B: In Memory Table Descriptors
+#### Some Key Observations:
+1. Temporary tables will never be accessed concurrently. 
+2. We do not need to pay the replication + deletion overhead for temporary table descriptors for no 
+added benefit. 
+> Note that this approach still involves persisting the actual data -- the only thing kept in memory
+> is the table descriptor.
+
+Instead of persisting table descriptors, we could simply store temporary table descriptors in the 
+TableCollection cache by adding a new field. All cached data in TableCollection is transaction scoped
+but temporary table descriptors must not be reset after transactions. As all name resolution hits 
+the cache before going to the KV layer, name resolution for temporary tables can be easily intercepted. 
+
+To provide session scoped deletion we must keep track of the tableIDs a particular session has 
+allocated. The current schema change code relies on actual Table Descriptors being passed to it to 
+do deletion, but we can bypass this and implement the bare bones required to delete the data ourselves.
+This would only require knowledge of the table IDs, which will have to be persisted.
+#### Rationale
+
+1. Temporary tables’ schemas can not be changed after creation, because schema changes 
+require physical table descriptors.
+2. Dependencies between temporary tables and a persistent sequence can not be allowed. There is no 
+way to reliably unlink these dependencies when the table is deleted without a table descriptor. 
+3. Debugging when a session dies unexpectedly will not be possible if we do not have access to the
+table descriptor. 
+
+
+
+## Unresolved questions
+
+#### Q1. Do we need to efficiently allocate temporary table IDs? 
+Currently, we do not keep track of which table IDs are in use and which ones have been deleted. 
+A table ID that has been deleted creates a “hole” in the ID range. As temporary tables are created 
+and deleted significantly more than regular tables, this problem will be exacerbated. Does this need
+to be solved? Are there any obvious downsides to having large numbers for tableIDs?
+
+This might be part of a larger discussion about ID allocation independent of temporary tables though.