Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres : backslash seems removed in json during COPY resulting in "invalid input syntax for type json" #202

Closed
1 task done
anayrat opened this issue Mar 27, 2024 · 9 comments · Fixed by #215
Closed
1 task done

Comments

@anayrat
Copy link

anayrat commented Mar 27, 2024

What happens?

There is a problem how DuckDB handle \.
Here is the table in Postgres :

select * from t1;
 c1 |            c2
----+---------------------------
  5 | {"key": "value \"test\""}

Here is the corresponding csv with header :

c1,c2
5,"{""key"": ""value \""test\""""}"

If I try to COPY from this CSV to Postgres, I get this error :

D COPY pg16.t1 from 't1.csv' (FORMAT csv,header );
Run Time (s): real 0.001 user 0.001272 sys 0.000000
Invalid Error: Failed to copy data: ERROR:  invalid input syntax for type json
DETAIL:  Token "test" is invalid.
CONTEXT:  JSON data, line 1: {"key": "value "test...
COPY t1, line 1, column c2: "{"key": "value "test""}"

As you can notice, the \ has been removed and the JSON is invalid.

Any idea how to solve this issue ?
Thanks

To Reproduce

Create this table in Postgres :

create table t1 (c1 int, c2 jsonb);

Try to copy this csv:

c1,c2
5,"{""key"": ""value \""test\""""}"
 COPY pg16.t1 from 't1.csv' (FORMAT csv,header );
Run Time (s): real 0.001 user 0.001272 sys 0.000000
Invalid Error: Failed to copy data: ERROR:  invalid input syntax for type json
DETAIL:  Token "test" is invalid.
CONTEXT:  JSON data, line 1: {"key": "value "test...
COPY t1, line 1, column c2: "{"key": "value "test""}"

OS:

Linux

DuckDB Version:

v0.10.1

DuckDB Client:

CLI

Full Name:

Adrien Nayrat

Affiliation:

Self Employed

Have you tried this on the latest nightly build?

I have tested with a release build (and could not test with a nightly build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@anayrat anayrat changed the title Issue with escape in json - "invalid input syntax for type json" Issue with backslash in json - "invalid input syntax for type json" Mar 27, 2024
@anayrat
Copy link
Author

anayrat commented Mar 27, 2024

It is correctly handled by read_csv:

 select * from read_csv('t1.csv', header = true,auto_detect = true) ;
┌───────┬───────────────────────────┐
│  c1   │            c2             │
│ int64 │          varchar          │
├───────┼───────────────────────────┤
│     5 │ {"key": "value \"test\""} │
└───────┴───────────────────────────┘

@anayrat
Copy link
Author

anayrat commented Mar 27, 2024

Same issue if we try to insert in Postgres:

insert into pg16.t1 select * from read_csv('t1.csv', header = true,auto_detect = true) ;
Run Time (s): real 0.002 user 0.002002 sys 0.000000
Invalid Error: Failed to copy data: ERROR:  invalid input syntax for type json
DETAIL:  Token "test" is invalid.
CONTEXT:  JSON data, line 1: {"key": "value "test...
COPY t1, line 1, column c2: "{"key": "value "test""}"

@anayrat anayrat changed the title Issue with backslash in json - "invalid input syntax for type json" Postgres : backslash seems removed in json during COPY resulting in "invalid input syntax for type json" Mar 27, 2024
@szarnyasg
Copy link
Contributor

Hi @anayrat, did you encounter this error while using the postgres extension?

@anayrat
Copy link
Author

anayrat commented Mar 27, 2024

Yes, I forgot to mention that. I created and loaded the extension. Then attached postgres database.
In my example, "pg16" is a PostgreSQL 16 server.

@anayrat
Copy link
Author

anayrat commented Mar 27, 2024

https://www.postgresql.org/docs/current/sql-copy.html

Backslash characters () can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character.

@szarnyasg szarnyasg transferred this issue from duckdb/duckdb Mar 27, 2024
@szarnyasg
Copy link
Contributor

Thanks! This issue may be better suited to the Postgres repository, so I transferred it there for the time being.

Mytherin added a commit to Mytherin/duckdb-postgres that referenced this issue Apr 17, 2024
Mytherin added a commit that referenced this issue Apr 17, 2024
Fix #202 - correctly escape backslashes in Postgres text copy
@LMnet
Copy link

LMnet commented May 30, 2024

I see that this bug is marked as closed, but what version of DuckDB contains the fix? With 0.10.2 I still have this issue.

@szarnyasg
Copy link
Contributor

@LMnet try upgrading the postgres client by issuing FORCE INSTALL postgres. If this doesn't fix the problem, upgrade to DuckDB v0.10.3 (released last week).

@LMnet
Copy link

LMnet commented May 30, 2024

@szarnyasg thank you, these steps helped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants