-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: allow ARRAYs in forward indexes #17154
Comments
You should definitely implement it. :) |
Any chance that this will be implemented? |
It's not currently planned for a release, it's on my to-do list to look into when I get some free time - can't promise anything at the moment though. Out of curiosity, do you have any details on what you would use this for? |
Sure. Currently I have two use cases:
|
You should definitely implement it. :) |
My use case for this is storing all email addresses associated with each email I load into the DB. I want to be able to search for all messages associated with one email address. |
Arrays are now indexable via inverted indexes, which supports containment operations on the arrays. We still don't support ordinary forward indexes on arrays. |
…e types Fixes cockroachdb#17154. Fixes cockroachdb#35707. This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays. To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` = a marker byte for arrays, let `EM` be an "array element" marker, and let `AT` be a terminator byte. `enc(ARRAY[a, b]) = [AM, EM, enc(a), EM, enc(b), AT]` The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support. Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types.
Yup, I've marked that PR as closing this (#48045) |
…e types Fixes cockroachdb#17154. Fixes cockroachdb#35707. This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays. To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` = a marker byte for arrays and let `AT` be a terminator byte. `enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]` The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support. Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types.
48045: sql: enable indexing and ordering on arrays of orderable and indexable types r=rohany a=rohany Fixes #17154. Fixes #35707. This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays. To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` = a marker byte for arrays and let `AT` be a terminator byte. `enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]` The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support. Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types. 48608: schemachange: speed up slow schema changes r=spaskob a=spaskob Touches #45150. Fixes #47607. Touches #47790. Release note (performance improvement): Before this a simple schema change could take 30s+. The reason was that if the schema change is not first in line in the table mutation queue it would return a re-triable error and the jobs framework will re-adopt and run it later. The problem is that the job adoption loop is 30s. To repro run this for some time: ``` cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);' ``` Instead of returning on re-triable errors we retry with exponential backoff in the schema change code. This pattern of dealing with re-triable errors in client job code is encouraged vs relying on the registry because the latter leads to slowness and additionally to more complicated test fixtures that rely on hacking with the internals of the job registry, Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu> Co-authored-by: Spas Bojanov <pachob@gmail.com>
50662: sql: disable arrays in forward indexes r=mjibson a=mjibson There are some open questions about inverted index types being used in forward indexes. Disable those for now. Fixes #50656 See #50659 See #17154 Release note (sql change): disable arrays in non-inverted indexes. 50742: sql: granular detection of non-immutable constant casts r=RaduBerinde a=RaduBerinde #### opt: improve tests for parsing timestamps Add more `sem/tree` unit tests for parsing time-related types. In particular: - relative timestamps ('now', 'tomorrow') - TZ variants - timestamps that include timezones for DDate, DTime - cases where the result depends on the current time and/or timezone. In a future change, the parsing functions will also return a flag indicating if the result depends on the context or not; these tests will be updated to check that flag as well. Release note: None #### pgdate: add WithoutTimezone variants The way we parse DTimestamp (and other non-TZ variants) is convoluted: we initially parse it just like a DTimestampTZ, and then we strip away the timezone (adjusting it so the timestamp looks the same as before, except with UTC location). This means that we incorporate the local timezone, even though it is inconsequential. This change adds WithoutTimezone API variants to `pgdate` which avoids using the local timezone. This will allow extending the parsing APIs to indicate whether the result depends on the current time or timezone. Release note: None #### sql: use dummy ParseTimeContext during type-checking We are not supposed to use any context-dependent parsing results during type-checking. As such, use a dummy ParseTimeContext instead of having SemaContext implement the interface. Release note: None #### sql: granular detection of non-immutable constant casts Interpreting a timestamp (and similar types) can depend on the current timezone or even the current time. A recent change prevented these casts from happening during type-checking. Unfortunately, this includes a lot of common cases which aren't actually context-dependent. The most important are dates and timestamps without time zone (except rare cases like 'now' or 'tomorrow'). In these cases, the type-checked expressions won't seem constant and won't be able to be used for partitioning, index predicates, etc. This change improves the parsing functions to return a `dependsOnContext` boolean. When this flag is false, the result is immutable and is replaced during type-checking. The volatility of the version of date_trunc that returns a TIMESTAMPTZ is corrected to Stable. Release note: None Co-authored-by: Matt Jibson <matt.jibson@gmail.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
If and when we allow this we need to be careful of enum values in partitioning and how it interacts with dropping enum values. |
@justinj FWIW, my use case is for storing tree-like data using a path encoding in the primary key (which is the correct way of doing it for the kinds of queries I am going to do). I can (and will) get by using a string-like type--and I haven't yet checked what operates you support for arrays and so if you wouldn't support an indexed "prefix of array matches [0,3,1]" (which I presume you would have to support for strings) it wouldn't really matter anyway--but it feels more "natural" to do this with arrays. |
This is the cause of prisma/prisma#12282 |
Can you elaborate @ajwerner ? |
When we drop enum values we go and see if they are used anywhere. There's code to do this for arrays inside rows. There's also code which searches for enum values in partitioning descriptors. There will need to be code added to search for the values inside of enums used as partitioning values, assuming we support that. Consider a contrived transformation of this test.
The above should fail because |
Indeed...
|
Arrays support inverted indexes, but not forward indexes. If you want to index arrays, starting in CockroachDB 20.1 consider using an inverted index on your column.
This issue tracks adding forward indexes to array types.
Jira issue: CRDB-6049
The text was updated successfully, but these errors were encountered: