importccl: Allow users to IMPORT into an existing table #26834

Closed · 15 tasks
nstewart opened this issue Jun 19, 2018 · 0 comments
Assignees: dt
Labels: A-disaster-recovery, C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception)
Milestone: 19.2

Comments

@nstewart (Contributor) commented Jun 19, 2018

Currently IMPORT can only create a new table into which it imports data.

However, many users would like to IMPORT into an existing table. In some cases this is because they want to configure a table's replication settings before importing the data. In others, they want to do the import in stages, for example because they simply cannot fit all the data to import in a single source location.

Other users want to IMPORT new data at regular intervals (e.g. nightly dumps from some other system), though such cases raise additional questions around whether and how the import can modify existing data.

The two main differences between importing into a new table and into an existing table are that an existing table can have a) traffic interacting with it and b) existing data.

One of the biggest reasons IMPORT is so much faster than bulk INSERTs is that it skips the majority of the transactional writing infrastructure; however, this makes point a) tricky: we cannot ensure transactional safety of normal SQL traffic -- for which transactional safety is implied -- while bulk ingestion is happening to the same data. The easiest way to reduce importing into an existing table to the current import w.r.t. a) is to simply remove the traffic: schema-change the table to an offline state, then import, then bring it back online.
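
To make the offline-then-import sequence concrete, here is a minimal Go sketch of the lifecycle. The `tableState` type and the `importInto`/`bulkIngest` names are hypothetical illustrations of the sequencing only, not CockroachDB's actual descriptor or ingestion APIs.

```go
package main

import "fmt"

// tableState is a hypothetical stand-in for a table descriptor's state.
type tableState int

const (
	statePublic  tableState = iota // normal SQL traffic allowed
	stateOffline                   // table invisible to SQL while bulk ingestion runs
)

type table struct {
	name  string
	state tableState
}

// importInto sketches the take-offline, ingest, bring-online sequence.
// bulkIngest stands in for the SST-based ingestion path.
func importInto(t *table, bulkIngest func(*table) error) error {
	t.state = stateOffline // schema change: no SQL traffic can touch the table now
	if err := bulkIngest(t); err != nil {
		// On failure, the added data must be rolled back before going public again.
		t.state = statePublic
		return err
	}
	t.state = statePublic // bring the table back online with the imported data
	return nil
}

func main() {
	t := &table{name: "orders", state: statePublic}
	err := importInto(t, func(tb *table) error {
		fmt.Println("ingesting SSTs into", tb.name)
		return nil
	})
	fmt.Println("import finished, err =", err)
}
```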

Another reason bulk ingestion is faster is that it prepares entire SSTs that go directly into RocksDB, replacing any values they happen to contain. In an empty, just-created table, that replacement is not a concern, but for a table that contains data, it is. Collision detection is possible, but not cheap -- it'd require scanning the existing and imported data, either prior to or after ingestion, as well as some sort of resolution. For the primary key, one option is to just let the IMPORT replace the existing value, but detecting and handling conflicts in unique secondary indexes is a problem the implementation will need to address.
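
As a rough model of the shadowing concern, the Go sketch below merges a sorted list of existing keys with a sorted batch of keys to ingest and reports collisions. The real check happens inside the storage layer's AddSSTable path, so this function is only a conceptual illustration, not the actual implementation.

```go
package main

import "fmt"

// shadowedKeys reports which keys in the (sorted) ingest batch would
// overwrite keys already present in the (sorted) existing data. This is
// a conceptual model of the "disallow shadowing" check.
func shadowedKeys(existing, ingest []string) []string {
	var collisions []string
	i, j := 0, 0
	for i < len(existing) && j < len(ingest) {
		switch {
		case existing[i] < ingest[j]:
			i++
		case existing[i] > ingest[j]:
			j++
		default: // same key on both sides: the ingested SST would shadow it
			collisions = append(collisions, ingest[j])
			i++
			j++
		}
	}
	return collisions
}

func main() {
	existing := []string{"/t/1/1", "/t/1/3", "/t/1/7"}
	ingest := []string{"/t/1/2", "/t/1/3", "/t/1/9"}
	fmt.Println("shadowed:", shadowedKeys(existing, ingest)) // [/t/1/3]
}
```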

An initial prototype of this take-table-off-line-then-import approach was added in #37451.

Still TODO for the initial version to be considered complete:

  • Audit / Test that the online/offline transition is respected everywhere
    • SQL leasing (direct, via FKs, etc)
    • SQL direct resolution (e.g. DDLs)
    • Background queues
    • AS OF SYSTEM TIME before, at, and after each of the transition points.
  • Reading/checking/plumbing/using specified target columns à la INSERT (see the sketch after this list)
  • Testing IMPORT INTO itself
  • What do we do about uniqueness and indexes?
  • What do we do about Check/FK constraints?
    • CHECK: there is no partial replacement of rows, so we just need to be sure we evaluate the constraint on the imported row.
    • FK: Obviously we're not checking FKs. Move to a validating state and re-validate after? What do we do on failure?
      When taking the table offline, move the constraint to unvalidated if it was validated, and record the names of the constraints moved in the job.
      After reverting on failure, it can go straight back if it was validated before; otherwise we'll kick off validation.
  • What do we do on failure? (importccl: rollback IMPORT INTO on failure #38518)
  • Audit Txn use (jobs, *: refactor planning vs exec APIs #37691)
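
For the target-column item above, here is a minimal Go sketch of the INSERT-like semantics: each input field fills the corresponding named target column, and every other column of the existing table is left NULL. The `buildRow` helper and the example column names are hypothetical; the real implementation also has to honor column defaults and types.

```go
package main

import "fmt"

// buildRow places input values into the named target columns of an existing
// table, in INSERT fashion, leaving all other columns NULL.
func buildRow(tableCols, targetCols, record []string) (map[string]string, error) {
	if len(targetCols) != len(record) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(targetCols), len(record))
	}
	row := make(map[string]string, len(tableCols))
	for _, c := range tableCols {
		row[c] = "NULL"
	}
	for i, c := range targetCols {
		if _, ok := row[c]; !ok {
			return nil, fmt.Errorf("unknown target column %q", c)
		}
		row[c] = record[i]
	}
	return row, nil
}

func main() {
	tableCols := []string{"id", "name", "city", "created_at"}
	row, err := buildRow(tableCols, []string{"id", "name"}, []string{"42", "carl"})
	fmt.Println(row, err) // map[city:NULL created_at:NULL id:42 name:carl] <nil>
}
```
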
@knz knz added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 21, 2018
@knz knz changed the title IMPORT: Allow users to IMPORT into an existing table importccl: Allow users to IMPORT into an existing table Jul 23, 2018
@dt dt self-assigned this May 21, 2019
@dt dt added this to the 19.2 milestone May 21, 2019
craig bot pushed a commit that referenced this issue Jul 23, 2019
38449: libroach: Adds logic to run AddSSTable with shadowing disabled. r=adityamaru27 a=adityamaru27

This change enhances AddSSTable to be run with the option of
disallowing ingestion of keys which would shadow already existing
data.
Used to safeguard IMPORT INTO which would otherwise
break secondary indexes. (#38044)
(Part of the larger roadmap #26834)

TODO: Add IMPORT INTO test(s) which trigger the safeguard.

Co-authored-by: Aditya Maru <adityamaru@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Jul 29, 2019
39023: pkg: Add support for user specified target cols in IMPORT INTO r=adityamaru27 a=adityamaru27

This change adds the ability for users to specify a subset of the columns of an existing table to import into, from a CSV data source. 
(Part of the larger roadmap #26834)

39043: distsqlrun: lookup join efficiency improvements r=jordanlewis a=jordanlewis

- distsqlrun: don't save lookup rows twice
- distsqlrun: stream lookup join output
- distsql: remove unused field from lookup join spec

Previously, lookup join buffered its rendered output rows before
emitting any of them, because this used to be a requirement when lookup
join was also responsible for doing a second layer of lookups against an
index. This is no longer necessary.

Now, after the result of a lookup batch is accumulated into memory, rows
are rendered and emitted in a row-at-a-time streaming fashion.

Also, remove the unused extra index filter expression field from the lookup join spec.

39118: opt: add IndexOrdinal alias r=RaduBerinde a=RaduBerinde

Adding a `cat.IndexOrdinal` alias to make it more explicit when a
value is an index ordinal. It is only an alias because a separate type
would make things like loops more annoying.

Release note: None

39146: sql/tree: store From as value in SelectClause AST node r=nvanbenschoten a=nvanbenschoten

Drive by cleanup. The value is assumed to be non-nil in a number of
places, so we should avoid the unnecessary indirection.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@dt dt closed this as completed Aug 14, 2019