sql: sanitize the "no current db" story #24735

knz · 2018-04-12T17:33:54Z

Fixes #23893.
Fixes #23145.
Updates #24598.
Informs #24056.
Informs #23958.

Prior to this patch, CockroachdB made it excessively simple for client
connections to not have any current database set. Also by design
CockroachDB allows a connection to be opened with a non-existent
database set as current.

This in turn causes various problems related to name resolution and
the pg_catalog schema.

To understand these problems we can consider separately two groups of
users, which have separate use cases:

newcomers to CockroachDB that don't have prior experience with SQL
or PostgreSQL. These will not know about "databases" in any sense,
and will use CockroachDB with default options. This includes using
cockroach sql with no current database selected. These users have
to be taught upfront that they need to "create a database" and "set
the current database", which they may or may not (more likely not)
be interested in.

When these users then transition into writing apps, they will
copy-paste the conn URL printed by cockroach start (which sets no
current db) into some ORM or framework, and then will regretfully
observe that their app fail in mysterious ways because the current
db is not set (also see the next point).
users of existing applications or SQL drivers that have been
developed to target PostgreSQL. These apps commonly contain code
that:
- connect to a database named "postgres" by default, and make this
  default hard(er) to customize. For example Ecto (Elixir ORM)
  doesn't make this configurable.
  
  (From the PostgreSQL docs: "After initialization, a database
  cluster will contain a database named postgres, which is meant as
  a default database for use by utilities, users and third party
  applications. The database server itself does not require the
  postgres database to exist, but many external utility programs
  assume it exists.")
- (regardless of which db is selected as current), will issue some
  introspection queries using pg_catalog before any additional
  initialization is made by the higher-level app code, in particular
  before the app can issue CREATE DATABASE. Currently pg_catalog
  queries (and several other things) fail if the current database is
  unset or set to a non-existing database.

To address this relatively large class of problems, this patch
modifies CockroachDB as follows:

all clusters (existing or new) now get two empty databases on
startup (created by a migration) called defaultdb and postgres.
when a client doesn't specify a current db in their conn string
(i.e. in the pgwire options), the defaultdb database is picked up
instead.

This resolves all the known problems around the situations identified
above.

The two new databases behave like any regular database and are
provided for convenience to newcomers and
compatibility. (Administrators are free to drop them afterwards to
restore the previous situation, if that is actively desired.)

In addition, to make it slightly harder for newcomers to shoot
themselves in the foot, the database session variable cannot be set
to the empty string any more if sql_safe_updates is otherwise set.

Three alternatives to this approach were considered, discussed with
@jordanlewis and @nvanbenschoten in their quality of pg compatibility
experts, and with @bdarnell in his quality of carer for newcomers, and
ultimately rejected:

A) generate the current db upon connection if it does not exist, make
connections using the empty string as curdb use system instead.

Rejected because we don't like to auto-create upon connection. Too
easy to flood the system with bogus database names by typos in the
connection string.

B) Pre-populate clusters with a special anonymous database (db name =
empty string).

Rejected because there are too many areas in CockroachDB with a
risk to fail silently or with weird error messages as a
result. Also does not solve the problem of clients that want a db
called postgres.

C) Auto-create a database named after the user upon user creation
(with an optional flag to skip db creation).

Rejected because it does not solve the problem of clients that want
a db called postgres. @bdarnell: creating a database per user also
implicitly encourages bad habits (like sharing certs for the user
whose name corresponds to the database instead of creating multiple
users).

D) Implement some magic in the handling of pg_catalog, name
resolution and all other pieces that currently don't like non-existent
database values to create the appearance of something that works.

Rejected because too complex to implement successfully (there are
many moving parts that need to be touched to make this work) and
the resulting magic would be hard to understand by CockroachDB
contributors and hard to maintain over time.

E) Auto-create only one database called defaultdb.

Rejected because it does not solve the problem of clients that want
a db called postgres.

F) Auto-create only one database called postgres and use that as
default when no current db is specified.

Rejected because not good for branding.

G) Name the defaultdb either def, default or default_database.

The word "default" was rejected because it is a SQL keyword
and would cause queries using it to fail in obscure ways.

The word "def" (similar to what MySQL uses in
information_schema) was rejected because it refers too strongly
to other uses of this word in programming languages ("define").

"default_database" was rejected because too verbose.

Release note (sql change): new and existing clusters will now contain
two empty databases called defaultdb and postgres by default. The
database defaultdb is automatically used for clients that connect
without a current database set (e.g. without a database component in
the connection URL). The database postgres is provided for
compatibility with PostgreSQL client frameworks that require it to
exist when the database server has ben freshly installed. Both new
databases behave like any other regular database.

Release note (general change): existing clusters upgraded to this
version of CockroachDB will automatically see two new empty databases
named defaultdb and postgres. These were added for compatibility
with PostgreSQL and to ease adoption of CockroachDB. They can be
manually deleted by an administrator if the client applications are
determined not to require them.

cockroach-teamcity · 2018-04-12T17:34:03Z

This change is

knz · 2018-04-12T17:34:45Z

@benesch can we sit together to fix the test failures caused by the appearance of 2 non-system descriptors upon startup of test clusters. There are failures because we now have 2 more ranges than expected, and I don't know where to look.

benesch · 2018-04-12T20:21:47Z

@benesch can we sit together to fix the test failures caused by the appearance of 2 non-system descriptors upon startup of test clusters. There are failures because we now have 2 more ranges than expected, and I don't know where to look.

We sure can! But I thought it might be easier just to send you a patch:

diff --git a/pkg/server/testserver.go b/pkg/server/testserver.go
index 33f375b63..c83540c74 100644
--- a/pkg/server/testserver.go
+++ b/pkg/server/testserver.go
@@ -374,16 +374,29 @@ func ExpectedInitialRangeCount(db *client.DB) (int, error) {
 	if err != nil {
 		return 0, err
 	}
-	maxDescriptorID := descriptorIDs[len(descriptorIDs)-1]
 
 	// System table splits occur at every possible table boundary between the end
 	// of the system config ID space (keys.MaxSystemConfigDescID) and the system
-	// table with the maximum ID (maxDescriptorID), even when an ID within the
-	// span does not have an associated descriptor.
-	systemTableSplits := int(maxDescriptorID - keys.MaxSystemConfigDescID)
+	// table with the maximum ID (maxSystemDescriptorID), even when an ID within
+	// the span does not have an associated descriptor.
+	maxSystemDescriptorID := descriptorIDs[0]
+	for _, descID := range descriptorIDs {
+		if descID > maxSystemDescriptorID && descID <= keys.MaxReservedDescID {
+			maxSystemDescriptorID = descID
+		}
+	}
+	systemTableSplits := int(maxSystemDescriptorID - keys.MaxSystemConfigDescID)
+
+	// User table splits are analogous to system table splits: they occur at every
+	// possible table boundary between the end of the system ID space
+	// (keys.MaxReservedDescID) and the user table with the maximum ID
+	// (maxUserDescriptorID), even when an ID within the span does not have an
+	// associated descriptor.
+	maxUserDescriptorID := descriptorIDs[len(descriptorIDs)-1]
+	userTableSplits := int(maxUserDescriptorID - keys.MaxReservedDescID)
 
 	// `n` splits create `n+1` ranges.
-	return len(config.StaticSplits()) + systemTableSplits + 1, nil
+	return len(config.StaticSplits()) + systemTableSplits + userTableSplits + 1, nil
 }
 
 // WaitForInitialSplits waits for the server to complete its expected initial
diff --git a/pkg/sqlmigrations/migrations.go b/pkg/sqlmigrations/migrations.go
index c8c2195b6..4b998c11d 100644
--- a/pkg/sqlmigrations/migrations.go
+++ b/pkg/sqlmigrations/migrations.go
@@ -110,7 +110,7 @@ var backwardCompatibleMigrations = []migrationDescriptor{
 		// TODO(benesch): bake this migration into v2.1.
 		name:             "create system.table_statistics table",
 		workFn:           createTableStatisticsTable,
-		newDescriptorIDs: []sqlbase.ID{keys.TableStatisticsTableID},
+		newDescriptorIDs: staticIDs(keys.TableStatisticsTableID),
 	},
 	{
 		// Introduced in v2.0.
@@ -123,7 +123,7 @@ var backwardCompatibleMigrations = []migrationDescriptor{
 		// TODO(benesch): bake this migration into v2.1.
 		name:             "create system.locations table",
 		workFn:           createLocationsTable,
-		newDescriptorIDs: []sqlbase.ID{keys.LocationsTableID},
+		newDescriptorIDs: staticIDs(keys.LocationsTableID),
 	},
 	{
 		// Introduced in v2.0.
@@ -136,7 +136,7 @@ var backwardCompatibleMigrations = []migrationDescriptor{
 		// TODO(benesch): bake this migration into v2.1.
 		name:             "create system.role_members table",
 		workFn:           createRoleMembersTable,
-		newDescriptorIDs: []sqlbase.ID{keys.RoleMembersTableID},
+		newDescriptorIDs: staticIDs(keys.RoleMembersTableID),
 	},
 	{
 		// Introduced in v2.0.
@@ -188,12 +188,32 @@ var backwardCompatibleMigrations = []migrationDescriptor{
 		workFn: ensureMaxPrivileges,
 	},
 	{
-		// Introduced in v2.0.
-		name:   "create default databases",
-		workFn: createDefaultDbs,
+		// Introduced in v2.1.
+		// TODO(knz): bake this migration into v2.2.
+		name:             "create default databases",
+		workFn:           createDefaultDbs,
+		newDescriptorIDs: databaseIDs(sessiondata.DefaultDatabaseName, sessiondata.PgDatabaseName),
 	},
 }
 
+func staticIDs(ids ...sqlbase.ID) func(ctx context.Context, db db) ([]sqlbase.ID, error) {
+	return func(ctx context.Context, db db) ([]sqlbase.ID, error) { return ids, nil }
+}
+
+func databaseIDs(names ...string) func(ctx context.Context, db db) ([]sqlbase.ID, error) {
+	return func(ctx context.Context, db db) ([]sqlbase.ID, error) {
+		var ids []sqlbase.ID
+		for _, name := range names {
+			kv, err := db.Get(ctx, sqlbase.MakeNameMetadataKey(keys.RootNamespaceID, name))
+			if err != nil {
+				return nil, err
+			}
+			ids = append(ids, sqlbase.ID(kv.ValueInt()))
+		}
+		return ids, nil
+	}
+}
+
 // migrationDescriptor describes a single migration hook that's used to modify
 // some part of the cluster state when the CockroachDB version is upgraded.
 // See docs/RFCs/cluster_upgrade_tool.md for details.
@@ -205,10 +225,11 @@ type migrationDescriptor struct {
 	workFn func(context.Context, runner) error
 	// doesBackfill should be set to true if the migration triggers a backfill.
 	doesBackfill bool
-	// newDescriptorIDs lists the IDs of any additional descriptors added by this
-	// migration. This is needed to automate certain tests, which check the number
-	// of ranges/descriptors present on server bootup.
-	newDescriptorIDs sqlbase.IDs
+	// newDescriptorIDs is a function that returns the IDs of any additional
+	// descriptors that were added by this migration. This is needed to automate
+	// certain tests, which check the number of ranges/descriptors present on
+	// server bootup.
+	newDescriptorIDs func(ctx context.Context, db db) ([]sqlbase.ID, error)
 }
 
 func init() {
@@ -249,6 +270,7 @@ type leaseManager interface {
 // package.
 type db interface {
 	Scan(ctx context.Context, begin, end interface{}, maxRows int64) ([]client.KeyValue, error)
+	Get(ctx context.Context, key interface{}) (client.KeyValue, error)
 	Put(ctx context.Context, key, value interface{}) error
 	Txn(ctx context.Context, retryable func(ctx context.Context, txn *client.Txn) error) error
 }
@@ -302,8 +324,15 @@ func ExpectedDescriptorIDs(ctx context.Context, db db) (sqlbase.IDs, error) {
 	}
 	descriptorIDs := sqlbase.MakeMetadataSchema().DescriptorIDs()
 	for _, migration := range backwardCompatibleMigrations {
+		if migration.newDescriptorIDs == nil {
+			continue
+		}
 		if _, ok := completedMigrations[string(migrationKey(migration))]; ok {
-			descriptorIDs = append(descriptorIDs, migration.newDescriptorIDs...)
+			newIDs, err := migration.newDescriptorIDs(ctx, db)
+			if err != nil {
+				return nil, err
+			}
+			descriptorIDs = append(descriptorIDs, newIDs...)
 		}
 	}
 	sort.Sort(descriptorIDs)
diff --git a/pkg/sqlmigrations/migrations_test.go b/pkg/sqlmigrations/migrations_test.go
index c4d1762cc..eead20e61 100644
--- a/pkg/sqlmigrations/migrations_test.go
+++ b/pkg/sqlmigrations/migrations_test.go
@@ -119,6 +119,10 @@ func (f *fakeDB) Scan(
 	return results, nil
 }
 
+func (f *fakeDB) Get(ctx context.Context, key interface{}) (client.KeyValue, error) {
+	return client.KeyValue{}, errors.New("unimplemented")
+}
+
 func (f *fakeDB) Put(ctx context.Context, key, value interface{}) error {
 	if f.putErr != nil {
 		return f.putErr

knz · 2018-04-12T22:29:01Z

@benesch thank you! I needed to tweak it a little bit but it seems to work :)

knz · 2018-04-12T23:07:01Z

@bdarnell are we going to queue this for 2.0.x? I'll need to know to make the comment on the migration accurrate.

bdarnell · 2018-04-13T02:09:28Z

No, I think this is too big of a behavior change for 2.0.x.

knz · 2018-04-13T08:11:14Z

All right thanks.
The patch is now ready for review.

BramGruneir · 2018-05-10T14:38:49Z

With a number of small nits and it needs a few more tests.

Reviewed 17 of 24 files at r1, 29 of 29 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

pkg/sql/drop_test.go, line 502 at r2 (raw file):

	tableDesc := sqlbase.GetTableDescriptor(kvDB, "t", "kv")
	nameKey := sqlbase.MakeNameMetadataKey(keys.MaxReservedDescID+3, "kv")

Not a fan of magic numbers. Can you add a comment or make it a named const?

pkg/sql/schema_changer_test.go, line 82 at r2 (raw file):

	var lease sqlbase.TableDescriptor_SchemaChangeLease
	var id = sqlbase.ID(keys.MaxReservedDescID + 4)

Please add a named const or at least a comment. Why was this 2 before?

pkg/sql/zone_config_test.go, line 95 at r2 (raw file):

	s := srv.(*server.TestServer)

	expectedCounter := uint32(keys.MaxReservedDescID + 2)

Again, comment or const. Maybe a shared const since this seems very common.

pkg/sql/zone_test.go, line 72 at r2 (raw file):

	}
	dbRow := sqlutils.ZoneRow{
		ID:           keys.MaxReservedDescID + 3,

ditto

pkg/sql/logictest/testdata/logic_test/drop_database, line 1 at r2 (raw file):

# LogicTest: default

Please add a test that drops both of these new dbs.
And maybe even recreate the,.

pkg/sql/logictest/testdata/logic_test/grant_table, line 69 at r2 (raw file):

a         pg_catalog          NULL              readwrite  ALL
a         pg_catalog          NULL              root       ALL
a         public              NULL              admin      ALL

Not sure how to review this. Perhaps we should make this programmatic instead of a list? But that can be for another PR.

pkg/sql/logictest/testdata/logic_test/rename_database, line 1 at r2 (raw file):

# LogicTest: default parallel-stmts distsql

Please add a test that renames these new dbs.

pkg/sql/pgwire/pgwire_test.go, line 704 at r2 (raw file):

		},
		"SET CLUSTER SETTING cluster.organization = $1": {
			baseTest.SetArgs("hello world"),

Would it make sense to add some tests that execute the exact behaviour described in the commit message?

pkg/sql/sessiondata/search_path.go, line 22 at r2 (raw file):

// PgDatabaseName is the name of the default postgres system database.
const PgDatabaseName = "postgres"

If this DB are renamed, should the serachpath be updated too?

pkg/sql/sessiondata/search_path.go, line 26 at r2 (raw file):

// DefaultDatabaseName is the name ofthe default CockroachDB database used
// for connections without a current db set.
const DefaultDatabaseName = "default"

Should this be added to the searchpath?

Comments from Reviewable

knz · 2018-05-21T12:26:44Z

@BramGruneir thanks for pushing me to test the "default" database more thoroughly. It turns out that DEFAULT being a keyword in SQL, all kinds of things break in subtle and non-subtle ways when a query attempts to use this database name unquoted. I think this would cause too much confusion for users (And too much overhead in docs) so I am thinking of a new name for that database. Perhaps "def" as is done in MySQL.

knz · 2018-05-21T12:42:05Z

Review status: 14 of 45 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.

pkg/sql/pgwire/pgwire_test.go, line 704 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Would it make sense to add some tests that execute the exact behaviour described in the commit message?

What behavior are you referring to?

pkg/sql/sessiondata/search_path.go, line 22 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

If this DB are renamed, should the serachpath be updated too?

since 2.0 search path is about schemas, not databases.

pkg/sql/sessiondata/search_path.go, line 26 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Should this be added to the searchpath?

see comment above

Comments from Reviewable

knz · 2018-05-21T13:18:37Z

Review status: 9 of 46 files reviewed at latest revision, 10 unresolved discussions.

pkg/sql/logictest/testdata/logic_test/drop_database, line 1 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Please add a test that drops both of these new dbs.
And maybe even recreate the,.

Done.

pkg/sql/logictest/testdata/logic_test/grant_table, line 69 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Not sure how to review this. Perhaps we should make this programmatic instead of a list? But that can be for another PR.

Done.

pkg/sql/logictest/testdata/logic_test/rename_database, line 1 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Please add a test that renames these new dbs.

Done.

Comments from Reviewable

BramGruneir · 2018-05-21T15:30:52Z

Reviewed 37 of 38 files at r3.
Review status: all files reviewed at latest revision, 5 unresolved discussions.

pkg/sql/vars.go, line 142 at r3 (raw file):

			if len(dbName) == 0 && evalCtx.SessionData.SafeUpdates {
				return pgerror.NewDangerousStatementErrorf("SET database to empty string")

Is there a test for this?

pkg/sql/pgwire/pgwire_test.go, line 704 at r2 (raw file):

when a client doesn't specify a current db in their conn string
(i.e. in the pgwire options), the defaultdb database is picked up
instead.

Not sure if it's easy to test.

Comments from Reviewable

knz · 2018-05-21T15:58:58Z

Review status: 38 of 61 files reviewed at latest revision, 6 unresolved discussions.

pkg/sql/drop_test.go, line 502 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Not a fan of magic numbers. Can you add a comment or make it a named const?

Done.

pkg/sql/schema_changer_test.go, line 82 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Please add a named const or at least a comment. Why was this 2 before?

Done.

pkg/sql/zone_config_test.go, line 95 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Again, comment or const. Maybe a shared const since this seems very common.

Done.

pkg/sql/zone_test.go, line 72 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

ditto

Done.

pkg/sql/pgwire/pgwire_test.go, line 704 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

when a client doesn't specify a current db in their conn string
(i.e. in the pgwire options), the defaultdb database is picked up
instead.

Not sure if it's easy to test.

That is tested by the new tcl test. (cli/interactive_tests)

Comments from Reviewable

BramGruneir · 2018-05-21T16:48:06Z

Reviewed 23 of 23 files at r4.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

Comments from Reviewable

knz · 2018-05-22T13:22:34Z

Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

pkg/sql/vars.go, line 142 at r3 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Is there a test for this?

Done. (logic test)

Comments from Reviewable

@jordanlewis

Prior to this patch, CockroachdB made it excessively simple for client connections to not have any current database set. Also by design CockroachDB allows a connection to be opened with a non-existent database set as current. This in turn causes various problems related to name resolution and the `pg_catalog` schema. To understand these problems we can consider separately two groups of users, which have separate use cases: - newcomers to CockroachDB that don't have prior experience with SQL or PostgreSQL. These will not know about "databases" in any sense, and will use CockroachDB with default options. This includes using `cockroach sql` with no current database selected. These users have to be taught upfront that they need to "create a database" and "set the current database", which they may or may not (more likely not) be interested in. When these users then transition into writing apps, they will copy-paste the conn URL printed by `cockroach start` (which sets no current db) into some ORM or framework, and then will regretfully observe that their app fail in mysterious ways because the current db is not set (also see the next point). - users of existing applications or SQL drivers that have been developed to target PostgreSQL. These apps commonly contain code that: - connect to a database named "`postgres`" by default, and make this default hard(er) to customize. For example Ecto (Elixir ORM) doesn't make this configurable. (From the PostgreSQL docs: "After initialization, a database cluster will contain a database named `postgres`, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists.") - (regardless of which db is selected as current), will issue some introspection queries using `pg_catalog` before any additional initialization is made by the higher-level app code, in particular before the app can issue `CREATE DATABASE`. Currently `pg_catalog` queries (and several other things) fail if the current database is unset or set to a non-existing database. To address this relatively large class of problems, this patch modifies CockroachDB as follows: - all clusters (existing or new) now get two empty databases on startup (created by a migration) called `defaultdb` and `postgres`. - when a client doesn't specify a current db in their conn string (i.e. in the pgwire options), the `defaultdb` database is picked up instead. This resolves all the known problems around the situations identified above. The two new databases behave like any regular database and are provided for convenience to newcomers and compatibility. (Administrators are free to drop them afterwards to restore the previous situation, if that is actively desired.) In addition, to make it slightly harder for newcomers to shoot themselves in the foot, the `database` session variable cannot be set to the empty string any more if `sql_safe_updates` is otherwise set. Three alternatives to this approach were considered, discussed with @jordanlewis and @nvanbenschoten in their quality of pg compatibility experts, and with @bdarnell in his quality of carer for newcomers, and ultimately rejected: A) generate the current db upon connection if it does not exist, make connections using the empty string as curdb use `system` instead. Rejected because we don't like to auto-create upon connection. Too easy to flood the system with bogus database names by typos in the connection string. B) Pre-populate clusters with a special anonymous database (db name = empty string). Rejected because there are too many areas in CockroachDB with a risk to fail silently or with weird error messages as a result. Also does not solve the problem of clients that want a db called `postgres`. C) Auto-create a database named after the user upon user creation (with an optional flag to skip db creation). Rejected because it does not solve the problem of clients that want a db called `postgres`. @bdarnell: creating a database per user also implicitly encourages bad habits (like sharing certs for the user whose name corresponds to the database instead of creating multiple users). D) Implement some magic in the handling of `pg_catalog`, name resolution and all other pieces that currently don't like non-existent `database` values to create the appearance of something that works. Rejected because too complex to implement successfully (there are many moving parts that need to be touched to make this work) and the resulting magic would be hard to understand by CockroachDB contributors and hard to maintain over time. E) Auto-create only one database called `defaultdb`. Rejected because it does not solve the problem of clients that want a db called `postgres`. F) Auto-create only one database called `postgres` and use that as default when no current db is specified. Rejected because not good for branding. G) Name the `defaultdb` either `def`, `default` or `default_database`. The word "`default`" was rejected because it is a SQL keyword and would cause queries using it to fail in obscure ways. The word "`def`" (similar to what MySQL uses in `information_schema`) was rejected because it refers too strongly to other uses of this word in programming languages ("define"). "`default_database`" was rejected because too verbose. Release note (sql change): new and existing clusters will now contain two empty databases called `defaultdb` and `postgres` by default. The database `defaultdb` is automatically used for clients that connect without a current database set (e.g. without a database component in the connection URL). The database `postgres` is provided for compatibility with PostgreSQL client frameworks that require it to exist when the database server has ben freshly installed. Both new databases behave like any other regular database. Release note (general change): existing clusters upgraded to this version of CockroachDB will automatically see two new empty databases named `defaultdb` and `postgres`. These were added for compatibility with PostgreSQL and to ease adoption of CockroachDB. They can be manually deleted by an administrator if the client applications are determined not to require them. Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>

BramGruneir · 2018-05-22T14:55:21Z

Reviewed 8 of 8 files at r5.
Review status: all files reviewed at latest revision, all discussions resolved.

Comments from Reviewable

knz · 2018-05-22T15:01:47Z

TFYR!

bors r+

@jordanlewis

24735: sql: sanitize the "no current db" story r=knz a=knz Fixes #23893. Fixes #23145. Updates #24598. Informs #24056. Informs #23958. Prior to this patch, CockroachdB made it excessively simple for client connections to not have any current database set. Also by design CockroachDB allows a connection to be opened with a non-existent database set as current. This in turn causes various problems related to name resolution and the `pg_catalog` schema. To understand these problems we can consider separately two groups of users, which have separate use cases: - newcomers to CockroachDB that don't have prior experience with SQL or PostgreSQL. These will not know about "databases" in any sense, and will use CockroachDB with default options. This includes using `cockroach sql` with no current database selected. These users have to be taught upfront that they need to "create a database" and "set the current database", which they may or may not (more likely not) be interested in. When these users then transition into writing apps, they will copy-paste the conn URL printed by `cockroach start` (which sets no current db) into some ORM or framework, and then will regretfully observe that their app fail in mysterious ways because the current db is not set (also see the next point). - users of existing applications or SQL drivers that have been developed to target PostgreSQL. These apps commonly contain code that: - connect to a database named "`postgres`" by default, and make this default hard(er) to customize. For example Ecto (Elixir ORM) doesn't make this configurable. (From the PostgreSQL docs: "After initialization, a database cluster will contain a database named `postgres`, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists.") - (regardless of which db is selected as current), will issue some introspection queries using `pg_catalog` before any additional initialization is made by the higher-level app code, in particular before the app can issue `CREATE DATABASE`. Currently `pg_catalog` queries (and several other things) fail if the current database is unset or set to a non-existing database. To address this relatively large class of problems, this patch modifies CockroachDB as follows: - all clusters (existing or new) now get two empty databases on startup (created by a migration) called `defaultdb` and `postgres`. - when a client doesn't specify a current db in their conn string (i.e. in the pgwire options), the `defaultdb` database is picked up instead. This resolves all the known problems around the situations identified above. The two new databases behave like any regular database and are provided for convenience to newcomers and compatibility. (Administrators are free to drop them afterwards to restore the previous situation, if that is actively desired.) In addition, to make it slightly harder for newcomers to shoot themselves in the foot, the `database` session variable cannot be set to the empty string any more if `sql_safe_updates` is otherwise set. Three alternatives to this approach were considered, discussed with @jordanlewis and @nvanbenschoten in their quality of pg compatibility experts, and with @bdarnell in his quality of carer for newcomers, and ultimately rejected: A) generate the current db upon connection if it does not exist, make connections using the empty string as curdb use `system` instead. Rejected because we don't like to auto-create upon connection. Too easy to flood the system with bogus database names by typos in the connection string. B) Pre-populate clusters with a special anonymous database (db name = empty string). Rejected because there are too many areas in CockroachDB with a risk to fail silently or with weird error messages as a result. Also does not solve the problem of clients that want a db called `postgres`. C) Auto-create a database named after the user upon user creation (with an optional flag to skip db creation). Rejected because it does not solve the problem of clients that want a db called `postgres`. @bdarnell: creating a database per user also implicitly encourages bad habits (like sharing certs for the user whose name corresponds to the database instead of creating multiple users). D) Implement some magic in the handling of `pg_catalog`, name resolution and all other pieces that currently don't like non-existent `database` values to create the appearance of something that works. Rejected because too complex to implement successfully (there are many moving parts that need to be touched to make this work) and the resulting magic would be hard to understand by CockroachDB contributors and hard to maintain over time. E) Auto-create only one database called `defaultdb`. Rejected because it does not solve the problem of clients that want a db called `postgres`. F) Auto-create only one database called `postgres` and use that as default when no current db is specified. Rejected because not good for branding. G) Name the `defaultdb` either `def`, `default` or `default_database`. The word "`default`" was rejected because it is a SQL keyword and would cause queries using it to fail in obscure ways. The word "`def`" (similar to what MySQL uses in `information_schema`) was rejected because it refers too strongly to other uses of this word in programming languages ("define"). "`default_database`" was rejected because too verbose. Release note (sql change): new and existing clusters will now contain two empty databases called `defaultdb` and `postgres` by default. The database `defaultdb` is automatically used for clients that connect without a current database set (e.g. without a database component in the connection URL). The database `postgres` is provided for compatibility with PostgreSQL client frameworks that require it to exist when the database server has ben freshly installed. Both new databases behave like any other regular database. Release note (general change): existing clusters upgraded to this version of CockroachDB will automatically see two new empty databases named `defaultdb` and `postgres`. These were added for compatibility with PostgreSQL and to ease adoption of CockroachDB. They can be manually deleted by an administrator if the client applications are determined not to require them. Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>

craig · 2018-05-22T15:28:18Z

Build succeeded

GitHub CI (Cockroach)

This skewed with cockroachdb#24735, which changed the output of information_schema when connecting to an implicit database. Release note: None

26044: misc: fix encryption stats nits. r=mberhault a=mberhault Nothing major, followups to premature merge of #25850 Release note: None 26048: tpccbench: fix existing dataset fast-path r=nvanbenschoten a=nvanbenschoten This skewed with #24735, which changed the output of `information_schema` when connecting to an implicit database. Release note: None Co-authored-by: marc <marc@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

knz requested review from jordanlewis, bdarnell and benesch April 12, 2018 17:33

knz requested a review from a team as a code owner April 12, 2018 17:33

knz requested review from a team April 12, 2018 17:33

knz force-pushed the 20180412-cur-db branch from 1474923 to 962f8f4 Compare April 12, 2018 22:03

bdarnell mentioned this pull request Apr 12, 2018

cli/sql: add an option to create a database upon starting the sql shell if it doesn't exist yet #7349

Closed

knz force-pushed the 20180412-cur-db branch from 962f8f4 to 503e128 Compare April 12, 2018 22:27

knz requested a review from a team April 12, 2018 22:27

knz force-pushed the 20180412-cur-db branch from 503e128 to 22d5836 Compare April 12, 2018 22:50

knz requested review from a team April 12, 2018 22:50

knz force-pushed the 20180412-cur-db branch from 22d5836 to 622b38f Compare April 12, 2018 23:06

knz force-pushed the 20180412-cur-db branch from 622b38f to f3074df Compare April 13, 2018 08:10

knz added the docs-todo label Apr 13, 2018

knz mentioned this pull request Apr 13, 2018

sql: errors in client due to use of pg_catalog with non-existent db as current #23893

Closed

knz force-pushed the 20180412-cur-db branch from f3074df to 0441fbd Compare April 13, 2018 08:19

Cikey mentioned this pull request May 5, 2018

ActiveRecord 5.1.5 rake db:test:prepare fails with search for postgres cockroachdb/activerecord-cockroachdb-adapter#26

Closed

rmloveland mentioned this pull request May 8, 2018

Document existence of default and postgres DBs, use by client connections cockroachdb/docs#3129

Closed

knz force-pushed the 20180412-cur-db branch from 1fe06d6 to 0c92b01 Compare May 21, 2018 12:01

knz requested a review from a team May 21, 2018 12:01

knz force-pushed the 20180412-cur-db branch 2 times, most recently from 7759b2f to e4723a9 Compare May 21, 2018 13:18

knz force-pushed the 20180412-cur-db branch from e4723a9 to 76f1c2b Compare May 21, 2018 15:58

knz force-pushed the 20180412-cur-db branch from 76f1c2b to d9bd1a0 Compare May 22, 2018 13:22

knz force-pushed the 20180412-cur-db branch from d9bd1a0 to cc7a9af Compare May 22, 2018 14:43

craig bot merged commit cc7a9af into cockroachdb:master May 22, 2018

nvanbenschoten mentioned this pull request May 24, 2018

tpccbench: fix existing dataset fast-path #26048

Merged

jseldess mentioned this pull request May 30, 2018

sql: sanitize the "no current db" story cockroachdb/docs#3206

Closed

knz deleted the 20180412-cur-db branch July 2, 2018 09:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: sanitize the "no current db" story #24735

sql: sanitize the "no current db" story #24735

knz commented Apr 12, 2018 •

edited

Loading

cockroach-teamcity commented Apr 12, 2018

knz commented Apr 12, 2018

benesch commented Apr 12, 2018

knz commented Apr 12, 2018

knz commented Apr 12, 2018

bdarnell commented Apr 13, 2018

knz commented Apr 13, 2018

BramGruneir commented May 10, 2018

knz commented May 21, 2018

knz commented May 21, 2018

knz commented May 21, 2018

BramGruneir commented May 21, 2018

knz commented May 21, 2018

BramGruneir commented May 21, 2018

knz commented May 22, 2018

BramGruneir commented May 22, 2018

knz commented May 22, 2018

craig bot commented May 22, 2018

sql: sanitize the "no current db" story #24735

sql: sanitize the "no current db" story #24735

Conversation

knz commented Apr 12, 2018 • edited Loading

cockroach-teamcity commented Apr 12, 2018

knz commented Apr 12, 2018

benesch commented Apr 12, 2018

knz commented Apr 12, 2018

knz commented Apr 12, 2018

bdarnell commented Apr 13, 2018

knz commented Apr 13, 2018

BramGruneir commented May 10, 2018

knz commented May 21, 2018

knz commented May 21, 2018

knz commented May 21, 2018

BramGruneir commented May 21, 2018

knz commented May 21, 2018

BramGruneir commented May 21, 2018

knz commented May 22, 2018

BramGruneir commented May 22, 2018

knz commented May 22, 2018

craig bot commented May 22, 2018

Build succeeded

knz commented Apr 12, 2018 •

edited

Loading