Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: make the CockroachDB integer types more compatible with postgres #26925

Open
knz opened this issue Jun 22, 2018 · 44 comments
Open

sql: make the CockroachDB integer types more compatible with postgres #26925

knz opened this issue Jun 22, 2018 · 44 comments
Labels
A-sql-datatypes SQL column types usable in table descriptors. A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@knz
Copy link
Contributor

knz commented Jun 22, 2018

Type CockroachDB Postgres
int primitive type, 64-bit alias for int4, never shows up in vtables
int2 alias for int (thus wrong width), shows up as "int2" in certain vtables primitive type, 16-bit
int4 alias for int (thus wrong width), shows up as "int4" in certain vtables primitive type, 32-bit
int8 alias for int, shows up as "int8" in certain vtables primitive type, 64-bit
bigint alias for int, shows up as "bigint" in certain vtables alias for int8, never shows up in vtables
smallint alias for int (thus wrong width) shows up as "smallint" in certain vtables alias for int2, never shows up in vtables
serial special form for int (thus wrong width) with default unique_rowid() special form for int4, create sequence and default nextval(seqname)
bigserial alias for serial special form for int8, create sequence and default nextval(seqname)
smallserial alias for serial (thus wrong width) special form for int2, create sequence and default nextval(seqname)
bit near-alias for int, shows up as bit in certain vtables, max 64-bit primitive type, arbitrarily length byte array with bit input/output representations

Problems:

  • CockroachDB types int, int2, int4, serial, smallserial, bit have data width that are fully incompatible with postgresql
  • CockroachDB incorrectly preserves the names "bigint" and "smallint" whereas postgres desugars them to int8 and int2 during parsing.
  • PostgresDB serial types uses sequences, CockroachDB does not do this
  • bit has incorrectly a maximum size (and its underlying implementation is currently incorrect)

Fixing this would comprehensively adress #24062 #22607 #26730 #25098 #24686 and possibly others.

Informs #26128.

Jira issue: CRDB-4979

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-datatypes SQL column types usable in table descriptors. labels Jun 22, 2018
@knz
Copy link
Contributor Author

knz commented Jun 22, 2018

@awoods187 I'd like to work to address this over the summer (next milestone or the one after that). This would solve a large class of compatibility problems.

@benesch
Copy link
Contributor

benesch commented Jul 3, 2018

/cc @andy-kimball

@knz
Copy link
Contributor Author

knz commented Aug 7, 2018

  • PostgresSQL serial types uses sequences, CockroachDB does not do this - planned for 2.1
  • CockroachDB incorrectly preserves the names "bigint" and "smallint" whereas postgres desugars them to int8 and int2 during parsing. planned for 2.1
  • bit has incorrectly a maximum size (and its underlying implementation is currently incorrect) planned in the 2.2 time frame, but may be considered as bugfix and backported in 2.1 (with doc known-limitation at the 2.1 release)
  • CockroachDB types int, int2, int4, serial, smallserial, bit have data width that are fully incompatible with postgresql -- actually that's 2 different issues
    • the visible data width, announced in the introspection tables (information_schema, pg_Catalog), is wrong - we can fix this in 2.1 by lying to clients and make them believe what they need to believe
    • the actual data width, which restricts the set of possible values for given data types, that becomes known limitation in 2.1 + planned for fix in 2.2

craig bot pushed a commit that referenced this issue Aug 23, 2018
28945: sql: fix the reporting of types in information_schema.columns r=knz a=knz

First commits from #28944 and priors.
Forked off #28690.
Fixes #27601.
Largely addresses the concerns that led to issue #26925.

Prior to this patch, CockroachDB incorrectly placed the "input syntax"
of each SQL type in the column `data_type` of
`information_schema.columns`.

The input syntax is the one reported in SHOW COLUMNS, SHOW CREATE
TABLE and other places, and is suitable to reproduce the exact type of
at able.

In contrast, `information_schema.columns.data_type` is constrained by
compatibility with third party tools and PostgreSQL clients.  It must
report the name of the type like PostgreSQL does, which in turn is
constrained by the SQL standard. A text column must be reported as
"text" not "string"; a decimal column as "numeric" not "decimal", a
float8 column as "double precision" not "float8", and so on.

By reporting the wrong string in that column CockroachDB is confusing
ORMs, which subsequently decide that the current on-disk type is not
the one expected by the app and then initiate a schema change (ALTER
COLUMN SET TYPE).

This patch corrects this incompatibility by introducing logic that
produces the proper information schema names for column types. This is
expected to reduce ORM complaints about insufficient support for ALTER
COLUMN SET TYPE (but will be tested/evaluated separately).

Release note (bug fix): CockroachDB now populates the `data_type`
column of `information_schema.columns` like PostgreSQL for
compatibility with 3rd party tools and ORMs.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Aug 23, 2018
28690: sql: fix the handling of integer types r=knz a=knz

Addresses a large chunk of #26925.
Fixes #25098.
Informs #24686.

Prior to this patch, CockroachDB maintained an unnecessary distinction
between "INT" and "INTEGER", between "BIGINT" and "INT8", etc.

This distinction is unnecessary but also costly, as we were paying the
price of a "name" attribute in coltypes.TInt, with a string comparison
and hash table lookup on every use of the type.

What really matters is that the type shows up properly in
introspection; this has already been ensured by various
OID-to-pgcatalog mappings and the recently introduced
`InformationSchemaTypeName()`.

Any distinction beyond that is unnecessary and can be dropped from the
implementation.

Release note: None


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@bobvawter bobvawter added this to the 2.2 milestone Oct 30, 2018
@bobvawter bobvawter self-assigned this Oct 30, 2018
@bobvawter
Copy link
Contributor

bobvawter commented Nov 8, 2018

Here's a micro-RFC that I'd like to get feedback on before starting on the implementation.

Updated 2018-12-05

Motivation:

We currently assume that INT really means INT8, however this is inconsistent with PostgreSQL and various (Java) ORMs that assume INT means INT4. We want a transition path such that INT means INT4 in CockroachDB.

Goals

  • Code that works correctly with INT --> INT8 on 2.1 will work correctly on 2.2 and also on 2.Next with only a flag change.
  • Existing database schemas will not change, but it's fine to tweak their introspection metadata to accurately reflect their configuration.
  • Handle the INT --> INTx mapping as close to parse time as possible to avoid needing to plumb type-mapping logic down into all of the various ColumnType support functions or schema introspection tables.
  • Present identical data and table-introspection information to all callers, regardless of wether or not they assume 4- or 8-byte INT values to minimize cognitive load and change-sprawl.

Plan

  • [2.2] PR sql: Make INT an alias for INT8 #32831 makes INT an alias for INT8
  • [2.2] PR sql: Add default_int_size to control INT alias #32848 add as cluster and session setting default_int_size whose valid values are 8 (the default) and 4.
  • [2.2] Ensure that all parsing of type INT or INTEGER reflects the default_int_size setting by substituting INT4 or INT8 as appropriate.
    • Exception: When a column's default is set to exactly unique_rowid(), it should be an INT8.
  • [2.2] Add a cluster-version ratchet which will upgrade all existing TableDescriptors to ensure that any SemanticType=INT columns with unspecified Width are reset to Width=64.
    • This won't be necessary since all user inputs where INT := INT4 will simply have INT4 in the parse tree. Other SQL not coming in from user input will continue to use INT := INT8 to preserve the historical behavior.
  • [2.2] Bugfix sql: Alias Type Information is not propagated to pgwire or pg_attribute #16769: pg_attribute should return correct atttypid values, instead of always returning 20 (INT8) for all int types.
  • [2.2] Bugfix: information_schema.columns to remove character_maximum_length for int types, which pg leaves unpopulated. The docs on ColumnType.MaxCharacterLength() are inconsistent with the implementation.
  • [2.2] Test: The existing information_schema.column.crdb_sql_type column should return pre-2.2 values until the INT --> INT8 upgrade ratchet has fired.
  • [2.2] Test: Update the information_schema and pg_class logic tests to match pgsql for e.g. create table x(a int, aa integer, b int4, c int8, d bigint, e int2, f smallint); when default_int_size=4 and document differences when in 8-byte mode.
  • [2.2] Test: Ensure that cast operations '1'::INT result in the correct data width.
  • [2.2] Telemetry: Record how often default_int_size is set.
  • [2.Next] Seeing an Int-type ColumnType with Width == 0 is an error (e.g. types.go: FromColumnType().
  • [2.Next] Switch default_int_size to 4.
  • [2.Next] Remove the unnecessary ColumnType.VisibleType values, since we can always back them out of the non-zero `ColumnType.Width.
  • [2.Future] Remove default_int_size settings.

Mixed-version notes

There shouldn't be any need to alter the ColumnType message. The only observable change for a 2.1 node is that 2.2 gateways would never create a ColumnType { SemanticType: INT, Width: 0 }; the width would always be set to some explicit value.

If a column is created on a 2.1 gateway in the mixed-mode case, the upgrade ratchet will update the TableDescriptor to act as though it had been created with INT --> INT8 semantics on a 2.2 gateway.

@bdarnell
Copy link
Contributor

bdarnell commented Nov 9, 2018

LGTM

Exception: When a column's default is set to an INT8-returning function, e.g. unique_rowid(), it should be an INT8.

When this rule is invoked we should emit a warning.

@knz
Copy link
Contributor Author

knz commented Nov 12, 2018

I recommend you file separate issues linked to this one for the various required bug fixes.

Exception: When a column's default is set to an INT8-returning function, e.g. unique_rowid(), it should be an INT8.

I would restrict that to the expression unique_rowid() specifically, not just any INT expression.

LGTM otherwise! Thanks for the analysis.

@knz
Copy link
Contributor Author

knz commented Nov 12, 2018

@bdarnell

When this rule is invoked we should emit a warning.

We don't have infrastructure for pgwire warnings yet.

Personally for UX I would prefer introducing table/column comments (#19472) and put the warning in a comment instead.

@nocduro
Copy link

nocduro commented Nov 23, 2018

Hello, thought I would chime in with my experiences with this as a new cockroachdb user.

I was trying to experiment with cockroachdb instead of postgres with a webapp I'm making, and ran into the problem of INT not being equivalent to INT4, specifically in the postgres SERIAL type.

The app is written with Rust, and using the diesel ORM so the types are pretty strictly enforced.

This seems like a very common setup in postgres/sql so would make migration easier...
Here's a search on github showing over 200k instances of this:

https://github.com/search?q=%22+SERIAL+PRIMARY+KEY%22&type=Code

I'm wondering if there will ever be a way to have a SERIAL column in cockroachdb that is an INT4? Seems weird to advertise postgres compatibility when this isn't supported.

edit: my post came off a bit too negative. Now that I think about this some more it isn't too bad. I would just have to migrate my db to use bigserial and change my app code from i32 to i64 in a couple places. The downsides of that are that the ids in my urls are a bit longer (not that big of deal), and that I 'waste' 4 extra bytes per entry. I have fewer than a million rows so that change is pretty insignificant.

edit2: ran into another problem now. My ORM assumes that the returned value is the width of the data type. So i have an INT4 stored and it gets returned to me from cockroach as an INT8... Erasing some type data I had, and taking up more space. I don't need 64 bits to represent a number between 1-100

@knz
Copy link
Contributor Author

knz commented Nov 23, 2018

@nocduro we are working on it.

In the meantime, please know that we have introduced a partial compatibility mode in CockroachDB, so you can use INT4 with a sequence for SERIAL. See for yourself:

root@127.24.137.227:42763/defaultdb> set experimental_serial_normalization = sql_sequence;
SET

root@127.24.137.227:42763/defaultdb> create table t(x serial4);
CREATE TABLE


root@127.24.137.227:42763/defaultdb> show create t;
  table_name |                     create_statement
+------------+-----------------------------------------------------------+
  t          | CREATE TABLE t (
             |     x INT4 NOT NULL DEFAULT nextval('t_x_seq1':::STRING),
             |     FAMILY "primary" (x, rowid)
             | )
(1 row)

@knz
Copy link
Contributor Author

knz commented Nov 23, 2018

The gist of this is that you need 1) to set experimental_serial_normalization and 2) use INT4, SERIAL4 etc explicitly (the word "INT" without a size will still automatically translate to INT8).

Regarding your final note:

edit2: ran into another problem now. My ORM assumes that the returned value is the width of the data type. So i have an INT4 stored and it gets returned to me from cockroach as an INT8...

Please provide more information. If this is a bug we need to fix it.

@bobvawter
Copy link
Contributor

bobvawter commented Nov 27, 2018

Status update:

Right now, I’ve explored three different ways of getting this INT change in:

  1. Plumb EvalContext into everywhere that INT-as-a-type gets used:
  • cfd53f6...bobvawter:int4
  • I'm running into what appears to be type "erasure" in opt when converting from coltypes.T -> types.T -> coltypes.T
  • This manifests in tests like SELECT a::INT FROM (SELECT '4294967296' as a) when in INT4-mode. The optimizer unwinds this to a DInt and we lose the cast to what should be INT4 and a range-check error.
  1. Tweak the parser to never kick out an INT into our AST
  • 9363293...bobvawter:int4-2
  • This approach adds a field to our Scanner type, which implements sqlLexer. The sql.y grammar is changed so that receiving an INT token delegate to the Scanner.defaultIntType field.
  • Parsing of SQL statements is performed in conn which adds the generated AST to a buffer to be drained by connExecutor. I was running into problems finding a path from conn to the SessionData owned by connExecutor.
  1. If we can't tweak the parser, then we ought to be able to modify the incoming AST in connExecutor
  • The tree.Statement and tree.Expr types are currently hit-or-miss in terms of being able to run a visitor over them.
  • It's a huge surface area to manually maintain the traversal code.
  • I'm sketching out an idea for what a code-generated visitor API could look like here: https://github.com/cockroachdb/cockroach/pull/32628/files

Based on a chat yesterday with @jordanlewis, I think option 2 is once again viable since we have identified a way to get to the SessionData from conn.

Regardless of implementation complexity, option 2 would be the obvious approach if we could make a knife-edge cutover: We want to make INT behave like FLOAT in which we assign an explicit size to the type whenever you use it.

Open question:

  • There are many places in the code where SQL fragments get re-parsed (e.g. CHECK statements, computed column expressions). Presumably, this new behavior should be applied only to statements coming in from the user. When it's time to do the cleanup migration, having a more robust visitor API could help make that much more fool-proof to implement.

@andy-kimball
Copy link
Contributor

Could we leave the cluster default at int8 if it's an existing cluster, but default to int4 for new clusters? i.e. treat it as a cluster setting rather than a session setting

@bdarnell
Copy link
Contributor

Could we leave the cluster default at int8 if it's an existing cluster, but default to int4 for new clusters? i.e. treat it as a cluster setting rather than a session setting

It's actually both a cluster and a session setting (the cluster setting is used as the default for the session setting), so that's exactly what would happen if we made this change. Existing clusters would continue to use integer=int8 while new ones would use integer=int4. So we wouldn't break any existing clusters, but cause a divergence between older prod clusters and newer dev/test clusters.

@jordanlewis
Copy link
Member

Another potentially scary thing is integer tables in a cluster that got upgraded from 19.1 to 19.2 would have a different width than integer tables that got created in the same cluster after the upgrade was finalized. I think that it could be rather confusing for applications.

I still think that making this change would be more disruptive than helpful. Note that I'm assuming that drivers work properly because of the OID change I mentioned above. I haven't seen any evidence that refutes that assumption - if there were such evidence I think I'd need to reconsider.

jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 2, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
@awoods187
Copy link
Contributor

I think we should consider defaulting to int4 as the vast majority of our users are still in front of us. We should have a bias towards compatibility and toward forward-looking as long as we have escape hatches for users upgrading.

@jordanlewis
Copy link
Member

I still haven't found a compelling, real user problem caused by the mapping of integer to int8 instead of int4 by default. The real compatibility bugs that were discussed in this thread are solved now, as far as I know, but I'd love to see evidence to the contrary.

@bdarnell
Copy link
Contributor

bdarnell commented Oct 4, 2019

Concrete example: the migration tool in knex (a javascript query builder). It creates a table with an INTEGER column to track its state. Since javascript doesn't have a 64-bit integer type, the pg driver converts INT8 values into decimal strings, but knex expects javascript number objects. It is therefore impossible to use the knex migration tool out of the box unless you set the cluster setting to make integer=int4.

This isn't a clear-cut case - the cluster setting workaround works, and even that could be avoided if the tool were changed to specify a smaller integer type (but this is tricky and DB-specific - int4 isn't available everywhere). The SQL standard doesn't specify the size of the standard types SMALLINT, INTEGER, and BIGINT, but JDBC gives us a de facto standard: SMALLINT is 16 bits, INTEGER is 32, and BIGINT is 64. Doing anything else will be an ongoing source of friction since anyone who uses java or JS will need to either change their schemas or set this cluster/session setting. And as Andy says, the majority of our usage is still in front of us.

@jordanlewis
Copy link
Member

Okay, that's compelling enough.

Note that it wouldn't make sense to change integer to int4 without also changing serial to serial4 (and therefore defaulting experimental_serial_normalization to sql_sequence). That has a big performance impact, and I'm more scared of making this change than just changing the integer type.

Why doesn't it makes sense to change just integer?

@bdarnell
Copy link
Contributor

bdarnell commented Oct 4, 2019

This example used an INTEGER, but SERIAL is also very common and raises exactly the same issues. Changing one but not the other means that we cause the backwards-compatibility headaches for existing CockroachDB users, but we also have to continue to recommend the cluster/session settings for future users. I think if we're going to make a backwards-incompatible change for the sake of better compatibility with other databases, we should go all the way and change both types.

jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 24, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
craig bot pushed a commit that referenced this issue Nov 7, 2019
41252: roachtest: add test that aggregates orm blacklist failures r=jordanlewis a=jordanlewis

The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: #5807   (sql: Add support for TEMP tables)
 151: #17511  (sql: support stored procedures)
  86: #26097  (sql: make TIMETZ more pg-compatible)
  56: #10735  (sql: support SQL savepoints)
  55: #32552  (multi-dim arrays)
  55: #26508  (sql: restricted DDL / DML inside transactions)
  52: #32565  (sql: support optional TIME precision)
  39: #243    (roadmap: Blob storage)
  33: #26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: #27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: #12123  (sql: Can't drop and replace a table within a transaction)
  24: #26443  (sql: support user-defined schemas between database and table)
  20: #21286  (sql: Add support for geometric types)
  18: #6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: #22329  (Support XA distributed transactions in CockroachDB)
  16: #24062  (sql: 32 bit SERIAL type)
  16: #30352  (roadmap:when CockroachDB  will support cursor?)
  12: #27791  (sql: support RANGE types)
   8: #40195  (pgwire: multiple active result sets (portals) not supported)
   8: #6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: #23468  (sql: support sql arrays of JSONB)
   5: #40854  (sql: set application_name from connection string)
   4: #35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: #32610  (sql: can't insert self reference)
   4: #40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: #35897  (sql: unknown function: pg_terminate_backend())
   4: #4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: #27796  (sql: support user-defined DOMAIN types)
   3: #3781   (sql: Add Data Type Formatting Functions)
   3: #40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: #35882  (sql: support other character sets)
   2: #10028  (sql: Support view queries with star expansions)
   2: #35807  (sql: INTERVAL output doesn't match PG)
   2: #35902  (sql: large object support)
   2: #40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: #18846  (sql: Support CIDR column type)
   1: #9682   (sql: implement computed indexes)
   1: #31632  (sql: FK options (deferrable, etc))
   1: #24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: #36215  (sql: enable setting standard_conforming_strings to off)
   1: #32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: #36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: #26732  (sql: support the binary operator: <int> / <float>)
   1: #23299  (sql: support coercing string literals to arrays)
   1: #36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: #26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: #21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: #36179  (sql: implicity convert date to timestamp)
   1: #36118  (sql: Cannot parse '24:00' as type time)
   1: #31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@bobvawter bobvawter removed their assignment May 27, 2020
@ajwerner
Copy link
Contributor

@rafiss @jordanlewis should this be tracked in the @cockroachdb/sql-experience board?

@ajwerner ajwerner removed this from the 19.1 milestone Aug 16, 2021
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Aug 16, 2021
@ajwerner
Copy link
Contributor

ajwerner commented Feb 2, 2022

We definitely shouldn't go and change the width of integers on already defined table data. If we retain the current default value for upgraded clusters and bootstrap new clusters with the new default, I think that could go a long way towards achieving the goals and not breaking too much backwards compat. I vote we go with sql_sequence_cached on the serial normalization. On the whole, I support the movement and does make sense for compat, but we should retain our escape hatch.

@petermattis
Copy link
Collaborator

Something to be aware of with regards to compatibility (both here and elsewhere): we don't just need to be backward compatible on existing clusters, but also for new clusters that are expected to have the same behavior as an existing cluster. Consider a common setup: a production cluster, a staging cluster, and N development clusters running locally on engineer's workstations. The production cluster is likely to have been created a long time ago and upgraded through many major versions. Similar story for the staging cluster, though it may get wiped and recreated periodically if something goes horrifically wrong. The local development clusters will get wiped frequently. And if engineer's didn't wipe their development clusters frequently, new engineers would create new local development clusters which have definitely not gone through the same upgrade path as the clusters of other engineers. This isn't a theoretical problem. The difference in the history of clusters has caused migrations on the CC control plane DB to behave differently.

@beikov
Copy link

beikov commented Sep 15, 2022

I didn't read the whole discussion but just wanted to say that through JDBC, when asking for the column metadata of a integer array column, it will report the column type _int8 whereas it should report _int4. Kind of annoying in Hibernate tests that rely on proper types being reported, but no road blocker.

@knz
Copy link
Contributor Author

knz commented Sep 16, 2022

@beikov you can influence this by either:

  • forcing crdb to treat 'integer' as int4, using the session variable default_int_size
  • or create your column as int4[] explicitly (instead of integer[]).

@beikov
Copy link

beikov commented Sep 19, 2022

Thanks for the hint. I'll adapt the CockroachDialect in Hibernate then as part of https://hibernate.atlassian.net/browse/HHH-15528

@beikov
Copy link

beikov commented Sep 22, 2022

The next issue I have run into is that serial4 produces a value that is too big to fit into a 4 byte integer. With a table:

    create table TriggerEntity (
       id serial4 not null,
        name varchar(255),
        primary key (id)
    )

extracting the generated value with ResultSet#getInt results in:

Caused by: org.postgresql.util.PSQLException: Bad value for type int : 798864890501038081
	at app//org.postgresql.jdbc.PgResultSet.toInt(PgResultSet.java:3218)
	at app//org.postgresql.jdbc.PgResultSet.getInt(PgResultSet.java:2408)
	at app//org.postgresql.jdbc.PgResultSet.getInt(PgResultSet.java:2836)

Pseudo-code:

var stmt = connection.prepareStatement( "insert into TriggerEntity(name) values ('abc')", PreparedStatement.RETURN_GENERATED_KEYS );
stmt.executeUpdate();
var rs = stmt.getGeneratedKeys();
rs.next();
var id = rs.getInt( 1 );

@knz
Copy link
Contributor Author

knz commented Sep 22, 2022

If you run the create table in an interactive session, you'd see what CockroachDB thinks about serial4:

NOTICE: upgrading the column x to INT8 to utilize the session serial_normalization setting
HINT: change the serial_normalization to sql_sequence or sql_sequence_cached if you wish to use 
a smaller sized serial column at the cost of performance. 
See https://www.cockroachlabs.com/docs/v22.1/serial.html

If you want serial4 to behave exactly like in postgres, at the expense of performance, there's some options. I encourage you to peruse the linked documentation page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-datatypes SQL column types usable in table descriptors. A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
None yet
Development

No branches or pull requests