Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: COPY CSV doesn't support hex encoding for bytea #69640

Closed
steven-sheehy opened this issue Aug 31, 2021 · 2 comments · Fixed by #81120
Closed

sql: COPY CSV doesn't support hex encoding for bytea #69640

steven-sheehy opened this issue Aug 31, 2021 · 2 comments · Fixed by #81120
Labels
A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-blathers-triaged blathers was able to find an owner

Comments

@steven-sheehy
Copy link

steven-sheehy commented Aug 31, 2021

Describe the problem

Following up on #68804, CockroachDB does not support hex encoded byte array data when attempting to COPY FROM STDIN WITH CSV. As part of #68804, octal/escape encoded byte arrays were fixed which provides a valid workaround, but escape encoding is incredibly inefficient in the amount of data sent over the wire. This can greatly reduce ingest performance if the frequency or size of the bytea columns are large.

There was some discussion about how hex encoding isn't supported due to the SQL layer not supporting the PostgreSQL byte array literals syntax due to #26128. I would like to point out that IMPORT INTO CSV DATA already does support the \x hex encoded format for CSVs. Also, since COPY doesn't support the x'abc' or the \xabc syntax it doesn't have the difficult problem having to support both or not breaking compatibility like the SQL layer.

The fact that other areas of the system already support \x syntax suggests that COPY could be also be enhanced to do the necessary translation from \x to the format that the SQL layer supports.

To Reproduce

cat t.csv
5,"\x6869"
6,\x6869
> create table t(a int primary key, b bytea);
> set bytea_output = 'escape';
> COPY t FROM STDIN WITH CSV;
> 1,X'6869'
> 2,x'6869'
> 3,"\x6869"
> 4,\x6869
> \.
> IMPORT INTO t (a, b) CSV DATA ('http://localhost:3000/t.csv');
> select * from t where b = decode(encode('hi', 'hex'), 'hex');
  a | b
----+-----
  5 | hi
  6 | hi
(2 rows)
> select * from t;
 a |     b
---+------------
 1 | X'6869'
 2 | x'6869'
 3 | \\x6869
 4 | \\x6869
 5 | hi
 6 | hi
(6 rows)

Only the rows that came from IMPORT are inserted with the correct bytea data.

Expected behavior

Hex encoded bytea data supported via COPY CSV.

Additional data / screenshots

If applicable, add screenshots to help explain your problem.

Environment:

  • CockroachDB version 21.1.8
  • Server OS: MacOS
  • Client app: psql

Additional context

Unable to use existing dumps that have hex encoding. Reduced ingest performance due to less efficient octal encoding.

Jira issue: CRDB-9700

@steven-sheehy steven-sheehy added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 31, 2021
@blathers-crl
Copy link

blathers-crl bot commented Aug 31, 2021

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added A-disaster-recovery O-community Originated from the community X-blathers-triaged blathers was able to find an owner T-disaster-recovery labels Aug 31, 2021
@rafiss
Copy link
Collaborator

rafiss commented Aug 31, 2021

Thanks for typing up this issue @steven-sheehy! I'll post back here once we decide how to handle this

@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Aug 31, 2021
@rafiss rafiss added A-sql-pgwire pgwire protocol issues. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-disaster-recovery T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) A-disaster-recovery labels Aug 31, 2021
@ajwerner ajwerner changed the title COPY CSV doesn't support hex encoding for bytea sql: COPY CSV doesn't support hex encoding for bytea Aug 31, 2021
@craig craig bot closed this as completed in 89dd217 May 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants