
cli,sql: cockroach dump cannot be used to export all data #28948

Closed
knz opened this issue Aug 22, 2018 · 6 comments · Fixed by #56964
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Comments


knz commented Aug 22, 2018

tldr: cockroach dump is a "best effort" tool that is unable to dump some types of data out of CockroachDB. Therefore, only deployments using particular combinations of SQL features can be exported faithfully via dump.

Context: cockroach dump is advertised as a migration tool, not a disaster recovery tool. Disaster recovery is offered as follows:

  • for core deployments (without license), by taking an offline backup of the data directories, with all nodes switched off during the backup
  • for enterprise deployments, using online BACKUP/RESTORE
  • or using Cockroach Cloud which handles backups as a service transparently

limitations in dump

dump is rather brittle.

Not only is there no guarantee it can migrate data out of CockroachDB; it can even break between versions without tests catching the breakage.

Details

cockroach dump has an "interesting" design in that it performs dumps
as follows:

  • it obtains the SQL type names via information_schema.columns;
  • independently, it sends a SELECT query to the table over pgwire;
  • when the data comes back, it instantiates Datums by parsing the
    data received from pgwire; which parser to use is determined
    by a switch on the type name string obtained from
    information_schema.columns;
  • it then uses a datum formatter to write the SQL representation
    of the datums into the dump.
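
The type-dispatch step in this flow can be sketched as follows (illustrative names only, not the actual dump code); the fragility is that the string switch must track every spelling the server might report:

```go
package main

import (
	"fmt"
	"strconv"
)

// parserFor picks a text parser by switching on the type-name string
// reported by information_schema.columns. Any name the switch does not
// recognize (for example, one whose spelling drifted from the SQL
// parser's over time) falls through to the error case.
func parserFor(typeName string) func(string) (interface{}, error) {
	switch typeName {
	case "INT", "INT8", "BIGINT":
		return func(s string) (interface{}, error) { return strconv.ParseInt(s, 10, 64) }
	case "FLOAT", "FLOAT8", "DOUBLE PRECISION":
		return func(s string) (interface{}, error) { return strconv.ParseFloat(s, 64) }
	case "STRING", "TEXT":
		return func(s string) (interface{}, error) { return s, nil }
	default:
		return func(string) (interface{}, error) {
			return nil, fmt.Errorf("unknown type name %q", typeName)
		}
	}
}

func main() {
	parse := parserFor("INT8") // type name from information_schema.columns
	v, err := parse("42")      // value text as received over pgwire
	fmt.Println(v, err)        // 42 <nil>
}
```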

There are irremediable problems with this approach:

  • the pgwire conversion of the data is lossy. That's
    irremediable: because of it, cockroach dump can never fully
    guarantee that it dumps a table accurately. Really, cockroach dump should
    read the K/V pairs in binary format and decode them losslessly as datums.

  • lib/pq, which is used to obtain the values over pgwire, doesn't
    necessarily know about all the types supported by the CockroachDB
    node on the other side. When it does not know a type, it passes the
    pgwire-encoded bytes as-is as a Go []byte to the cockroach dump
    reader, which has no logic to deal further with that. There are two
    compounded problems here:

    • cockroach dump should really talk over pgwire directly instead
      of using lib/pq.
    • cockroach dump should use the pgwire package (the
      encode/decode functions therein) to ensure it gets the same data
      that was sent. This may be insufficient though because
      we don't guarantee exact roundtrip, see the point above.
  • there is no guarantee whatsoever that information_schema.columns
    contains enough information to decide on an in-memory value type for
    CockroachDB.

    PRs sql: fix the reporting of types in information_schema.columns #28945 and cli,sql: add another crutch to the handling of column types in dump #28949 are attempting to make cockroach dump
    more resistant here by providing it a CockroachDB-specific extension
    to information_schema.columns, called crdb_sql_type. However,
    what cockroach dump should really do is go get the descriptor
    protobuf, extract the actual sqlbase.ColumnType from that, and
    then use (*ColumnType).ToDatumType().

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. A-cli labels Aug 22, 2018
knz added a commit to knz/cockroach that referenced this issue Aug 22, 2018
There are major problems in `cockroach dump` described separately in
cockroachdb#28948.

Part of this is `cockroach dump` trying to reverse-engineer a datum
type from the announced column type in `information_schema.columns`,
by parsing the type name. Prior to this patch, this parsing was
incomplete (it went out of sync with the SQL parser over time,
naturally as it was bound to do).

This entire approach is flawed, but this is no time to redesign
`cockroach dump`. Instead, this patch re-syncs the dump-specific type
parser with the general SQL parser. It also ensures this parser is
still able to parse pre-2.1 type name aliases.

Release note (bug fix): `cockroach dump` is now again better able to
operate across multiple CockroachDB versions.
@maddyblue

I think the approach of getting KV data over gRPC is not possible because that requires having a cert instead of just having an account. That is, dump should be able to operate even if you are a non-root SQL user. I think that whatever solution we come up with should not rely on doing gRPC requests directly.

knz added a commit to knz/cockroach that referenced this issue Aug 23, 2018
craig bot pushed a commit that referenced this issue Aug 23, 2018
28949:  cli,sql: add another crutch to the handling of column types in `dump` r=knz a=knz

All commits but the last part of #28945 and priors.
PR forked from #28690.

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
knz added a commit to knz/cockroach that referenced this issue Aug 23, 2018
knz added a commit to knz/cockroach that referenced this issue Aug 24, 2018
@rolandcrosby rolandcrosby self-assigned this Aug 28, 2018
@maddyblue

Can you list specific examples of known things that don't work? You mentioned float truncation, denormalized floats, NaNs, and arrays in chat. I agree about arrays, but would still like specific examples of all of these.

@knz

knz commented Oct 12, 2018

The main finding is that most CockroachDB arrays are not preserved properly.

@maddyblue

I'm currently dealing with a bug in our array pgwire encoding that might be this. SELECT ARRAY[e'\x01'] is broken for example. See #31315. Are there other problems?
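
For background on why arrays are fragile: the pgwire text format for arrays has its own quoting rules, and an encoder that fails to quote elements containing special bytes (such as the `\x01` above) emits a literal that cannot be parsed back. A minimal sketch of the quoting rule (illustrative only, not CockroachDB's actual encoder):

```go
package main

import (
	"fmt"
	"strings"
)

// encodeTextArray renders string elements in the Postgres array text
// format: elements containing braces, commas, quotes, backslashes,
// whitespace, or control bytes must be double-quoted, with embedded
// backslashes and double quotes escaped.
func encodeTextArray(elems []string) string {
	var b strings.Builder
	b.WriteByte('{')
	for i, e := range elems {
		if i > 0 {
			b.WriteByte(',')
		}
		if needsQuoting(e) {
			b.WriteByte('"')
			b.WriteString(strings.NewReplacer(`\`, `\\`, `"`, `\"`).Replace(e))
			b.WriteByte('"')
		} else {
			b.WriteString(e)
		}
	}
	b.WriteByte('}')
	return b.String()
}

func needsQuoting(e string) bool {
	if e == "" || strings.EqualFold(e, "null") {
		return true
	}
	for _, r := range e {
		if r <= ' ' || r == '{' || r == '}' || r == ',' || r == '"' || r == '\\' {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(encodeTextArray([]string{"ab", "a b", "\x01"}))
}
```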

craig bot pushed a commit that referenced this issue Jul 8, 2019
37199: storage: propagate errors from contentionQueue, catch stalls in roachtest r=nvanbenschoten a=nvanbenschoten

Informs #36089.

The PR is split into a series of commits. The first fixes part of a bug that was causing #36089  to fail (thanks to #36748) and the second improves the test to have a more obvious failure condition for this class of bug in the future. The third, fifth, and sixth clean up code. Finally, the fourth fixes another bug that could cause issues with #36089.

Before the first commit, requests could get stuck repeatedly attempting to push a transaction only to repeatedly find that they themselves were already aborted. The error would not propagate up to the transaction coordinator and the request would get stuck. This commit fixes this behavior by correctly propagating errors observed by the `contentionQueue`.

The second commit bumps the TxnLivenessThreshold for clusters running `kv/contention/nodes=4` to 10 minutes. This is sufficiently large such that if at any point a transaction is abandoned then all other transactions will begin waiting for it, throughput will drop to 0 for 10 straight minutes, and the test will fail to achieve its minimum QPS requirement.

The fourth commit instructs pushers in the `txnwait.Queue` to inform all other pushers that are waiting for the same transaction when it observes an ABORTED transaction. I never saw this cause issues with #36089, but it seems very possible that it could given frequent tscache rotations.

38397: sql: deflake TestLogic//crdb_internal/max_retry_counter r=knz a=knz

Fixes #38062.

Release note: None

38654: exec: Handle NULLS in TopK sorter r=rohany a=rohany

This commit fixes NULLs in the TopK sorter by avoiding use
of the vec copy method, which has a bug. Instead, we add
a set method to the vec comparator, and use the templatized
comparator to perform the sets that the TopK sorter needs.

To facilitate this, we add an UnsetNull method to the Nulls
object. However, use of this method can result in HasNull()
returning true even if the vector doesn't have nulls.
This behavior already occurs when selection vectors are used.
Based on discussions with @solongordon and @asubiotto, this behavior
is OK; future PRs will attempt to improve it and address
the bugs within the Vec Copy method.

38725: cli/dump: more clearly inform the user upon tables with no visible columns r=knz a=knz

Informs #37768.
Informs #28948.
This is coming up quite often on support, lately again on gitter and forum https://forum.cockroachlabs.com/t/error-while-dumping-core-backup/2901/3.

This PR aims to lessen the burden on support and propose a clear "next action" for the user.

Before:

```
kena@kenax ~/cockroach % ./cockroach dump --insecure defaultdb
CREATE TABLE t (,
        FAMILY "primary" (rowid)
);
Error: pq: at or near "from": syntax error
Failed running "dump"
```

After:

```
kena@kenax ~/cockroach % ./cockroach dump --insecure defaultdb
CREATE TABLE t (,
        FAMILY "primary" (rowid)
);
Error: table "defaultdb.public.t" has no visible columns
HINT: To proceed with the dump, either omit this table from the list of tables to dump, drop the table, or add some visible columns.
--
See: #37768
Failed running "dump"
```

Release note (cli change): `cockroach dump` will now more clearly
refer to issue #37768 when it encounters a table with no visible
columns, which (currently) cannot be dumped successfully.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@rolandcrosby rolandcrosby removed their assignment Jan 16, 2020
@mwang1026

We're investigating functions for roundtrippable string formats; when we pick up that work, we should apply those same functions here as well. cc @dt

@knz

knz commented Jan 22, 2020

I strongly recommend adopting the same type of infrastructure as postgres for this purpose.

FYI, postgres defines built-in functions to convert between byte arrays and strings. These built-in functions are listed in pg_catalog and are actually used by some apps. Exposing that inventory in the same way will make more third-party apps and drivers happy.

@knz knz changed the title cli,sql: cockroach dump is broken and cannot be used to export all data cli,sql: cockroach dump cannot be used to export all data Apr 17, 2020
@craig craig bot closed this as completed in 22da098 Nov 23, 2020
6 participants