sql: allow ARRAYs in forward indexes #17154

justinj · 2017-07-20T21:16:28Z

Arrays support inverted indexes, but not forward indexes. If you want to index arrays, starting in CockroachDB 20.1 consider using an inverted index on your column.

This issue tracks adding forward indexes to array types.

Jira issue: CRDB-6049

bra-fsn · 2017-08-18T12:12:26Z

You should definitely implement it. :)

bra-fsn · 2018-03-13T09:26:31Z

Any chance that this will be implemented?

justinj · 2018-03-13T14:38:42Z

It's not currently planned for a release, it's on my to-do list to look into when I get some free time - can't promise anything at the moment though.

Out of curiosity, do you have any details on what you would use this for?

bra-fsn · 2018-03-13T16:29:48Z

Sure. Currently I have two use cases:

tagging messages with labels (like important, later etc) and searching for messages which have a label or labels. Labels here are mostly strings or enums (a limited set which can grow in time).
tagging objects with location IDs. There can be a limited (but variable in the range of 1 to tens) number of locations for each object. Some have only one, some more. The location IDs here are numbers (integers). I want to search for objects which live on a given location ID. There can be millions-billions of them, so I want to page on the result set (of course maintaining consistency, so between the start and the end of the search, the result set should be consistent, fixed)

ckome · 2018-05-06T18:36:32Z

You should definitely implement it. :)

mzimmerman · 2019-02-10T02:06:59Z

My use case for this is storing all email addresses associated with each email I load into the DB. I want to be able to search for all messages associated with one email address.

jordanlewis · 2020-04-01T13:32:15Z

Arrays are now indexable via inverted indexes, which supports containment operations on the arrays. We still don't support ordinary forward indexes on arrays.

…e types Fixes cockroachdb#17154. Fixes cockroachdb#35707. This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays. To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` = a marker byte for arrays, let `EM` be an "array element" marker, and let `AT` be a terminator byte. `enc(ARRAY[a, b]) = [AM, EM, enc(a), EM, enc(b), AT]` The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support. Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types.

rohany · 2020-04-28T14:42:29Z

Yup, I've marked that PR as closing this (#48045)

…e types Fixes cockroachdb#17154. Fixes cockroachdb#35707. This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays. To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` = a marker byte for arrays and let `AT` be a terminator byte. `enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]` The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support. Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types.

48045: sql: enable indexing and ordering on arrays of orderable and indexable types r=rohany a=rohany Fixes #17154. Fixes #35707. This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays. To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` = a marker byte for arrays and let `AT` be a terminator byte. `enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]` The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support. Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types. 48608: schemachange: speed up slow schema changes r=spaskob a=spaskob Touches #45150. Fixes #47607. Touches #47790. Release note (performance improvement): Before this a simple schema change could take 30s+. The reason was that if the schema change is not first in line in the table mutation queue it would return a re-triable error and the jobs framework will re-adopt and run it later. The problem is that the job adoption loop is 30s. To repro run this for some time: ``` cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);' ``` Instead of returning on re-triable errors we retry with exponential backoff in the schema change code. This pattern of dealing with re-triable errors in client job code is encouraged vs relying on the registry because the latter leads to slowness and additionally to more complicated test fixtures that rely on hacking with the internals of the job registry, Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu> Co-authored-by: Spas Bojanov <pachob@gmail.com>

50662: sql: disable arrays in forward indexes r=mjibson a=mjibson There are some open questions about inverted index types being used in forward indexes. Disable those for now. Fixes #50656 See #50659 See #17154 Release note (sql change): disable arrays in non-inverted indexes. 50742: sql: granular detection of non-immutable constant casts r=RaduBerinde a=RaduBerinde #### opt: improve tests for parsing timestamps Add more `sem/tree` unit tests for parsing time-related types. In particular: - relative timestamps ('now', 'tomorrow') - TZ variants - timestamps that include timezones for DDate, DTime - cases where the result depends on the current time and/or timezone. In a future change, the parsing functions will also return a flag indicating if the result depends on the context or not; these tests will be updated to check that flag as well. Release note: None #### pgdate: add WithoutTimezone variants The way we parse DTimestamp (and other non-TZ variants) is convoluted: we initially parse it just like a DTimestampTZ, and then we strip away the timezone (adjusting it so the timestamp looks the same as before, except with UTC location). This means that we incorporate the local timezone, even though it is inconsequential. This change adds WithoutTimezone API variants to `pgdate` which avoids using the local timezone. This will allow extending the parsing APIs to indicate whether the result depends on the current time or timezone. Release note: None #### sql: use dummy ParseTimeContext during type-checking We are not supposed to use any context-dependent parsing results during type-checking. As such, use a dummy ParseTimeContext instead of having SemaContext implement the interface. Release note: None #### sql: granular detection of non-immutable constant casts Interpreting a timestamp (and similar types) can depend on the current timezone or even the current time. A recent change prevented these casts from happening during type-checking. Unfortunately, this includes a lot of common cases which aren't actually context-dependent. The most important are dates and timestamps without time zone (except rare cases like 'now' or 'tomorrow'). In these cases, the type-checked expressions won't seem constant and won't be able to be used for partitioning, index predicates, etc. This change improves the parsing functions to return a `dependsOnContext` boolean. When this flag is false, the result is immutable and is replaced during type-checking. The volatility of the version of date_trunc that returns a TIMESTAMPTZ is corrected to Stable. Release note: None Co-authored-by: Matt Jibson <matt.jibson@gmail.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>

ajwerner · 2021-07-29T21:31:39Z

If and when we allow this we need to be careful of enum values in partitioning and how it interacts with dropping enum values.

saurik · 2022-01-05T05:37:35Z

Out of curiosity, do you have any details on what you would use this for?

@justinj FWIW, my use case is for storing tree-like data using a path encoding in the primary key (which is the correct way of doing it for the kinds of queries I am going to do). I can (and will) get by using a string-like type--and I haven't yet checked what operates you support for arrays and so if you wouldn't support an indexed "prefix of array matches [0,3,1]" (which I presume you would have to support for strings) it wouldn't really matter anyway--but it feels more "natural" to do this with arrays.

ajwerner · 2022-08-23T22:08:25Z

This is the cause of prisma/prisma#12282

jordanlewis · 2022-11-11T19:01:30Z

If and when we allow this we need to be careful of enum values in partitioning and how it interacts with dropping enum values.

Can you elaborate @ajwerner ?

ajwerner · 2022-11-11T19:07:53Z

When we drop enum values we go and see if they are used anywhere. There's code to do this for arrays inside rows. There's also code which searches for enum values in partitioning descriptors. There will need to be code added to search for the values inside of enums used as partitioning values, assuming we support that.

Consider a contrived transformation of this test.

CREATE TYPE places AS ENUM ('us', 'eu', 'ap');
CREATE TABLE partitioned_table (
    place places[], id INT8,
    PRIMARY KEY (place, id)
)
    PARTITION BY LIST (place)
        (
            PARTITION us VALUES IN (ARRAY['us']),
            PARTITION eu VALUES IN (ARRAY['eu'])
        );
ALTER TYPE places DROP VALUE 'us';

The above should fail because us is used in a partitioning array. I don't think the logic here would find it.

jordanlewis · 2022-11-11T19:09:47Z

Indeed...

F221111 19:09:22.802901 2775 sql/opt_catalog.go:1456  [-] 314  error while decoding partition tuple: tabledesc.immutable: {ID: 106, Version: 1, ModificationTime: "1668193750.882603364,0", ParentID: 100, ParentSchemaID: 101, State: PUBLIC, NextColumnID: 3, Columns: [{ID: 1, TypeID: 100105, Null: false}, {ID: 2, TypeID: 20, Null: false}], NextFamilyID: 1, Families: [{ID: 0, Columns: [1, 2]}], PrimaryIndex: 1, NextIndexID: 2, Indexes: [{ID: 1, Unique: true, KeyColumns: [{ID: 1, Dir: ASC}, {ID: 2, Dir: ASC}]}]} []                                                                                           F221111 19:09:22.802901 2775 sql/opt_catalog.go:1456  [-] 314 !goroutine 2775 [running]:

justinj added this to the Later milestone Jul 20, 2017

justinj mentioned this issue Jul 26, 2017

sql: support a limited form of arrays #16764

Merged

dianasaur323 added the O-community Originated from the community label Aug 18, 2017

knz added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 27, 2018

knz added A-bulkio-schema-changes A-sql-encoding Relating to the SQL/KV encoding. and removed A-bulkio-schema-changes labels Apr 27, 2018

awoods187 mentioned this issue Aug 6, 2018

SQL: Add arrays in indexes as an unimplemented error in SQL.y #28300

Closed

petermattis removed this from the Later milestone Oct 5, 2018

knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Mar 14, 2019

knz mentioned this issue Mar 14, 2019

sql: support more index-type combinations #35730

Open

bra-fsn mentioned this issue Mar 11, 2020

sql: add inverted indexes on arrays #45157

Merged

jordanlewis changed the title ~~sql: allow ARRAYs in indexes~~ sql: allow ARRAYs in forward indexes Apr 1, 2020

rohany mentioned this issue Apr 26, 2020

sql: enable indexing and ordering on arrays of orderable and indexable types #48045

Merged

craig bot closed this as completed in 60cb535 May 9, 2020

maddyblue mentioned this issue Jun 25, 2020

sql: disable arrays in forward indexes #50662

Merged

maddyblue reopened this Jun 25, 2020

ajwerner mentioned this issue Aug 30, 2021

sql: Allow for the creation of an index on a json element #30690

Closed

jordanlewis mentioned this issue Nov 11, 2022

sql: permit forward indexes on arrays #91762

Merged

craig bot closed this as completed in b0daa5f Nov 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: allow ARRAYs in forward indexes #17154

sql: allow ARRAYs in forward indexes #17154

justinj commented Jul 20, 2017 •

edited by cockroach-jira-scripts

Loading

bra-fsn commented Aug 18, 2017 •

edited

Loading

bra-fsn commented Mar 13, 2018

justinj commented Mar 13, 2018

bra-fsn commented Mar 13, 2018

ckome commented May 6, 2018

mzimmerman commented Feb 10, 2019

jordanlewis commented Apr 1, 2020

rohany commented Apr 28, 2020

ajwerner commented Jul 29, 2021

saurik commented Jan 5, 2022

ajwerner commented Aug 23, 2022

jordanlewis commented Nov 11, 2022

ajwerner commented Nov 11, 2022

jordanlewis commented Nov 11, 2022

sql: allow ARRAYs in forward indexes #17154

sql: allow ARRAYs in forward indexes #17154

Comments

justinj commented Jul 20, 2017 • edited by cockroach-jira-scripts Loading

bra-fsn commented Aug 18, 2017 • edited Loading

bra-fsn commented Mar 13, 2018

justinj commented Mar 13, 2018

bra-fsn commented Mar 13, 2018

ckome commented May 6, 2018

mzimmerman commented Feb 10, 2019

jordanlewis commented Apr 1, 2020

rohany commented Apr 28, 2020

ajwerner commented Jul 29, 2021

saurik commented Jan 5, 2022

ajwerner commented Aug 23, 2022

jordanlewis commented Nov 11, 2022

ajwerner commented Nov 11, 2022

jordanlewis commented Nov 11, 2022

justinj commented Jul 20, 2017 •

edited by cockroach-jira-scripts

Loading

bra-fsn commented Aug 18, 2017 •

edited

Loading