sql: CockroachDB's extended literal formats are not compatible with postgres #26128
Comments
Sigh. Yeah, I guess we'll need to change this (over a couple of release cycles). |
No objection from me, although I'm interested in the migration plan. |
@nvanbenschoten basically add a new syntax and a big flashy warning with the old one in 2.1; in 2.2 make the old syntax opt-in via a compat flag, and in 2.3 remove. |
I think it might be nice to go through a cycle in which the
This ensures that even if users don't notice the warning they'll get a clear breakage instead of a change in behavior. We could make the new syntax available on an opt-in basis earlier than 2.3 if it's high enough priority. |
I was more thinking of this:
|
Isn't |
A warning can be ignored. Changing the thing to become opt-in is intrusive. I think we want a version in-between the current behavior and the one where the change is intrusive to warn users. Unless you think we can skip that step? |
It seems to me that opting in (through a cluster setting, I presume) is roughly equivalent to acknowledging the warning, so it really isn't much more intrusive.
I'll defer to Ben's judgment on that. |
Ok we have a way forward which I am likely to explore for 2.1:
So we can distinguish on the case of the "b" and properly fix #20991 and this one in one fell swoop. |
Basically what I am proposing is to distinguish on the case of |
I have tried it and it works pretty well! 🎉 see #28807. |
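The case-based distinction, as spelled out in the release notes of #28807, can be illustrated with a small sketch. This is not CockroachDB's lexer: `classify_literal` and its return values are hypothetical names chosen for illustration, and the only rules it encodes are the ones the release notes state (capital `B'...'` is a bit array, small `b'...'` stays a byte array).

```python
def classify_literal(lit):
    """Classify a b'...' / B'...' literal by the case of its prefix.

    Hypothetical helper for illustration only; it returns a
    (kind, payload) tuple and does no escape processing.
    """
    if len(lit) < 3 or lit[1] != "'" or lit[-1] != "'":
        raise ValueError("expected a literal of the form b'...' or B'...'")
    prefix, payload = lit[0], lit[2:-1]
    if prefix == "B":
        # Bit array literals may only contain the digits 0 and 1.
        if set(payload) - {"0", "1"}:
            raise ValueError("bit array literal may only contain 0 and 1")
        return ("bit array", payload)
    if prefix == "b":
        # The small-b form keeps denoting byte arrays, as before.
        return ("byte array", payload)
    raise ValueError("unknown literal prefix: " + prefix)
```

Under these assumptions, `B'abcde'` is rejected outright rather than silently reinterpreted, which matches the "clear breakage instead of a change in behavior" goal raised earlier in the thread.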
28807: sql: fix (really: add) the handling of bit arrays r=knz a=knz

Fixes #20991.
Informs #26128.

As described in #28814 there was a mismatch between the original implementation of BIT in CockroachDB, which was inspired by MySQL, and that required by the PostgreSQL dialect. That prior PR #28814 thus disabled the functionality pending a replacement. This patch provides this replacement.

PostgreSQL bit arrays are arbitrary long strings of bits, possibly longer (or shorter) than 64 bits, with odd number of bits. They also support concatenation, which integers do not. They are also different from strings, because they support bitwise logic operators, which strings do not. Their literal values are also different from both integer and string values, noted with `B'....'` (note the capital B; and see discussion below).

This patch provides the correct handling of bit arrays in CockroachDB. It does this as follows:

- a new literal syntax `B'....'` for bit array literals.
- a new coltype TBitArray.
- a new datum type DBitArray.
- new operator overloads specialized for bit arrays.
- a new ColumnType BIT (see discussion below).
- a new KV encoding type BitArray. Complete with separate value and key encodings.

The new bit array type is conditional on a cluster version, so that lower version nodes in a cluster do not get confused by the new key encoding.

Regarding the new ColumnType BIT: this now becomes a first class SemanticType instead of a VisibleType on INT. As discussed in #28814 any column previously created with an INT semantic type and BIT in the VisibleType is untouched.

With regards to sorting, bit arrays sort lexicographically from the leftmost bit. In memory the bits are stored in words of 64 bits, with the leftmost bit in the MSB of the word, so as to enable efficient comparisons.

To ensure the ordering is preserved in the key encoding we encode as follows:

- each word in turn using varint encoding, then
- a word list terminator, which is guaranteed to not occur in the words before that, then
- a varint that indicates the number of used bits in the last word.

To decode, we must first scan through the word list until we find the terminator, to determine the number of words; then we scan again to decode the words in memory, then we use the last varint together with the number of decoded words to compute the final bit array size.

Release note (bug fix): CockroachDB now supports the BIT and VARBIT (BIT VARYING) data types like PostgreSQL: this is a bit array. See the PostgreSQL documentation for details. Only the bit array literal notation with a capital B (e.g. `B'10001'`) is currently supported; the syntax with a small b (e.g. `b'abcd'`) continues to denote *byte* arrays as in previous versions of CockroachDB.

Release note (backward-incompatible change): CockroachDB happened to support the notation `B'abcde'` previously to express byte array literals, although this was not documented. This is not supported any more; the notation `B'100011'` will now express *bit* array literals like in PostgreSQL. The notation `b'...'` remains for byte array literals.

b273321

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
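The in-memory layout and the order-preserving key encoding described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the code from #28807: the names `pack`, `encode_key` and `decode_key` are hypothetical, each word is written as a marker byte plus 8 big-endian bytes as a stand-in for the varint encoding, and the marker and terminator byte values are assumptions picked so that the terminator sorts below any word. The decoder also makes a single pass, whereas the commit message describes two scans.

```python
WORD_BITS = 64
WORD_MARKER = b"\x01"  # assumed prefix byte for each encoded word
TERMINATOR = b"\x00"   # assumed terminator; sorts below WORD_MARKER


def pack(bits):
    """Pack a string of '0'/'1' into 64-bit words, leftmost bit in the MSB.

    Returns (words, last_bits), where last_bits is the number of bits
    actually used in the final word (0 for the empty bit array).
    """
    words = []
    for i in range(0, len(bits), WORD_BITS):
        chunk = bits[i:i + WORD_BITS]
        # Pad the last word with zeros on the right.
        words.append(int(chunk.ljust(WORD_BITS, "0"), 2))
    last_bits = len(bits) - (len(words) - 1) * WORD_BITS if words else 0
    return words, last_bits


def encode_key(bits):
    """Encode as: each word, then a terminator, then the used-bit count."""
    words, last_bits = pack(bits)
    out = bytearray()
    for w in words:
        out += WORD_MARKER + w.to_bytes(8, "big")
    out += TERMINATOR
    out.append(last_bits)
    return bytes(out)


def decode_key(key):
    """Read words up to the terminator, then trim to the real bit length."""
    words = []
    pos = 0
    while key[pos:pos + 1] == WORD_MARKER:
        words.append(int.from_bytes(key[pos + 1:pos + 9], "big"))
        pos += 9
    if key[pos:pos + 1] != TERMINATOR:
        raise ValueError("missing word list terminator")
    last_bits = key[pos + 1]
    total = (len(words) - 1) * WORD_BITS + last_bits if words else 0
    return "".join(format(w, "064b") for w in words)[:total]
```

Because the terminator compares lower than the word marker, a bit array that is a prefix of another sorts first, and when two arrays have identical words the trailing used-bit count breaks the tie. Comparing the encoded byte strings therefore gives the same order as comparing the bit arrays lexicographically from the leftmost bit.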
So here's the current status:
My opinions:
|
We have marked this issue as stale because it has been inactive for |
format
x'aaaaaaa'
format
b'aaaaaaa'
In PostgreSQL, byte array literals use a different syntax, either:
- e'ab\nc'::BYTEA (escaped string, the same is achieved currently with b'...' in crdb)
- '\xAAAA'::BYTEA (i.e. string starting with \x; the same is achieved currently with x'...' in crdb)

@nvanbenschoten @bdarnell what do you think? We need to support the BIT type properly ultimately. I am tempted to set up a plan to deprecate the current literal format in 2.1 so that it becomes available for BIT in 2.2. Thoughts?
Jira issue: CRDB-5687