Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backport-2.1: fix the handling of SQL types #29006

Merged
merged 29 commits into from
Aug 24, 2018

Conversation

knz
Copy link
Contributor

@knz knz commented Aug 23, 2018

Backport 1/1 commits from #28937.
Backport 1/1 commits from #28942.
Backport 17/17 commits from #28943.
Backport 1/1 commits from #28944.
Backport 1/1 commits from #28945.
Backport 1/1 commits from #28949.
Backport 2/2 commits from #28950.
Backport 1/1 commits from #28951 (#28959).
Backport 1/1 commits from #28952 (#28959).
Backport 1/1 commits from #28955 (#28959).
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28690.

cc @cockroachdb/release

@knz knz requested a review from BramGruneir August 23, 2018 13:00
@knz knz requested a review from a team August 23, 2018 13:00
@knz knz requested a review from a team as a code owner August 23, 2018 13:00
@knz knz requested a review from a team August 23, 2018 13:00
@knz knz requested a review from a team as a code owner August 23, 2018 13:00
@knz knz requested review from a team August 23, 2018 13:00
@knz knz requested a review from a team as a code owner August 23, 2018 13:00
@knz knz requested review from a team August 23, 2018 13:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a ton to backport. I've reviewed the individual commits, but I have no idea how to review this.

I think the commits are good, just a question of if anyone else has any objections.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@knz
Copy link
Contributor Author

knz commented Aug 23, 2018

For reference there was zero merge conflict. So if you're good with the base PRs I'd say there's nothing special to see here.

Copy link
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@knz
Copy link
Contributor Author

knz commented Aug 23, 2018

bors r+

craig bot pushed a commit that referenced this pull request Aug 23, 2018
29006: backport-2.1: fix the handling of SQL types r=knz a=knz

Backport 1/1 commits from #28937.
Backport 1/1 commits from #28942.
Backport 17/17 commits from #28943.
Backport 1/1 commits from #28944.
Backport 1/1 commits from #28945.
Backport 1/1 commits from #28949.
Backport 2/2 commits from #28950.
Backport 1/1 commits from #28951 (#28959).
Backport 1/1 commits from #28952 (#28959).
Backport 1/1 commits from #28955 (#28959).
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28690.

cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Aug 23, 2018

Build failed

The functions to retrieve the sizes of value encodings have not been
used in a long time. Remove them.

Release note: None
The `ColumnType` struct is not trivially small. Also calling a
by-pointer method on a copy may cause it to escape on the heap. Avoid
making copies until necessary.

Release note: None
These will be used to break up table.go in smaller pieces.

Release note: None
knz added 20 commits August 24, 2018 08:55
Since it's super small and used only in one place.

Release note: None
Release note: None
to EncodeDatumKeyAscending and EncodeDatumsKeyAscending, respectively.
Also, add comments and move them to column_type_encoding.go.

Release note: None
This was not well defined; the UNION code that required it really has
particular semantics and it is thus best defined close to the
implementation of UNION.

Release note: None
It is unneeded; we already have a perfectly useful `TypeName` map in
package `oid` which does the same.

Release note: None
Prior to this patch, CockroachDB incorrectly placed the "input syntax"
of each SQL type in the column `data_type` of
`information_schema.columns`.

The input syntax is the one reported in SHOW COLUMNS, SHOW CREATE
TABLE and other places, and is suitable to reproduce the exact type of
at able.

In contrast, `information_schema.columns.data_type` is constrained by
compatibility with third party tools and PostgreSQL clients.  It must
report the name of the type like PostgreSQL does, which in turn is
constrained by the SQL standard. A text column must be reported as
"text" not "string"; a decimal column as "numeric" not "decimal", a
float8 column as "double precision" not "float8", and so on.

By reporting the wrong string in that column CockroachDB is confusing
ORMs, which subsequently decide that the current on-disk type is not
the one expected by the app and then initiate a schema change (ALTER
COLUMN SET TYPE).

This patch corrects this incompatibility by introducing logic that
produces the proper information schema names for column types. This is
expected to reduce ORM complaints about insufficient support for ALTER
COLUMN SET TYPE (but will be tested/evaluated separately).

Release note (bug fix): CockroachDB now populates the `data_type`
column of `information_schema.columns` like PostgreSQL for
compatibility with 3rd party tools and ORMs.
There are major problems in `cockroach dump` described separately in
cockroachdb#28948.

Part of this is `cockroach dump` trying to reverse-engineering a datum
type from the announced column type in `information_schema.columns`,
by parsing the type name. Prior to this patch, this parsing was
incomplete (it went out of sync with the SQL parser over time,
naturally as it was bound to do).

This entire approach is flawed, but this is no time to redesign
`cockroach dump`. Instead, this patch re-syncs the dump-specific type
parser with the general SQL parser.

Release note (bug fix): `cockroach dump` is now again better able to
operate across multiple CockroachDB versions.
The distinction between "TEXT" and "STRING" inside CockroachDB is
inconsequential. Remove it. The data type is already properly reported
in pg_catalog and information_schema as "text".

Release note: None
This patch corrects two long-standing bugs in CockroachDB which were
confusing both 3rd party tools/ORMs and users:

- the SQL standard type called "CHAR" is supposed to have a default
  maximum width of 1 character.

  Prior to this patch, CockroachDB did not support this and instead
  set no default maximum width on CHAR. This would cause CockroachDB
  to fail validating the width of values inserted in tables, fail
  to constrain the width of values during casts to CHAR, and report
  incorrect information in introspection tables.

- PostgresSQL's dialect distinguishes the table column types TEXT,
  CHAR, VARCHAR and a special thing called `"char"` (this is a legacy
  PostgreSQL type which is, unfortunately, still used by some
  pg_catalog tables).

  Prior to this patch, CockroachDB would map all of these types into
  the same column type "STRING". In turn this caused them to show
  up as "text" in introspection.

  While this was not incorrect from the perspective of value
  handling (all these types are, indeed, handled with about the same
  value semantics in both PostgreSQL and CockroachDB), ORMs
  do pay attention to this distinction and become confused if they
  see a different type in introspection (e.g. "text") than what
  the application model requires (e.g. "char"). The result of this
  confusion was to trigger a schema change to adapt the type, which
  in turn fails due to missing features in ALTER COLUMN TYPE.

This patch corrects both problems by giving the appropriate default
width to CHAR and preserving the distinction between the various string
types in column descriptors and thus introspection.

Additionally, a width constraint check on collated string columns was
missing. This is now fixed too.

Tables and columns created prior to this patch are unaffected.

Release note (bug fix): CockroachDB now distinguishes "CHAR" and
"VARCHAR" as mandated by the SQL standard and PostgreSQL
compatibility. When a width is not specified (e.g. `CHAR(3)`), the
maximum width of `VARCHAR` remains unconstrained whereas the maximum
width for `CHAR` is now 1 character.

Release note (bug fix): CockroachDB now properly checks the width of
strings inserted in a collated string column with a specified width.

Release note (sql change): CockroachDB now preserves the distinction
between different column types for string values like in PostgreSQL,
for compatibility with 3rd party tools and ORMs.
The distinction between the type names "JSON" and "JSONB" is
inconsequential. The types are already reported properly in
introspection using the right alias. This patch removes this
distinction in code.

Release note: None
Prior to this patch the canonical name for the timestamptz type in
CockroachDB was "TIMESTAMP WITH TIME ZONE" with spaces and all
glorious 24 characters.

This is unfortunate because this is a pretty common type, and is
moreover used to disambiguate the return type of `now()` and thus will
be emitted pretty often in distsql physical plans. Having such a long
type name is unnecessary and indeed adds 50%+ storage, communication
and parsing overhead for every occurrence of this type in
distsql plans.

This patch shortens the canonical CockroachDB name to "TIMESTAMPTZ".

The representation of the type in introspection tables (pg_catalog,
information_schema) is unaffected.

Release note: None
It was not needed. This simplifies.

Release note: None
Prior to this patch, CockroachDB maintained an unnecessary distinction
between "INT" and "INTEGER", between "BIGINT" and "INT8", etc.

This distinction is unnecessary but also costly, as we were paying the
price of a "name" attribute in coltypes.TInt, with a string comparison
and hash table lookup on every use of the type.

What really matters is that the type shows up properly in
introspection; this has already been ensured by various
OID-to-pgcatalog mappings and the recently introduced
`InformationSchemaTypeName()`.

Any distinction beyond that is unnecessary and can be dropped from the
implementation.

Release note: None
@knz
Copy link
Contributor Author

knz commented Aug 24, 2018

Merge skew with an opt backport, rebased.

bors r+

craig bot pushed a commit that referenced this pull request Aug 24, 2018
29006: backport-2.1: fix the handling of SQL types r=knz a=knz

Backport 1/1 commits from #28937.
Backport 1/1 commits from #28942.
Backport 17/17 commits from #28943.
Backport 1/1 commits from #28944.
Backport 1/1 commits from #28945.
Backport 1/1 commits from #28949.
Backport 2/2 commits from #28950.
Backport 1/1 commits from #28951 (#28959).
Backport 1/1 commits from #28952 (#28959).
Backport 1/1 commits from #28955 (#28959).
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28959.
Backport 1/1 commits from #28690.

cc @cockroachdb/release 

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Aug 24, 2018

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants