
Incorrect result from DuckDB execution #556

Open · 2 tasks done
dpxcc opened this issue Jan 25, 2025 · 8 comments · May be fixed by #616
Labels
bug Something isn't working incorrect result Bugs that return incorrect data
Milestone

Comments

@dpxcc
Contributor

dpxcc commented Jan 25, 2025

What happens?

Turning on duckdb.force_execution produces an incorrect result.
Please see the repro below.

To Reproduce

postgres=# CREATE TABLE s (a text[]);
CREATE TABLE
postgres=# INSERT INTO s VALUES (ARRAY['abc', 'def', 'ghi']);
INSERT 0 1
postgres=# CREATE TABLE t AS TABLE s;
SELECT 1
postgres=# SELECT * FROM t;
       a       
---------------
 {abc,def,ghi}
(1 row)

postgres=# SET duckdb.force_execution TO true;
SET
postgres=# SELECT * FROM t;
  a   
------
 \x01
(1 row)

OS:

Linux

pg_duckdb Version (if built from source use commit hash):

0.2.0

Postgres Version (if built from source use commit hash):

17.2

Hardware:

No response

Full Name:

Cheng Chen

Affiliation:

Mooncake Labs

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • Yes, I have
@JelteF
Collaborator

JelteF commented Jan 28, 2025

Hmm. Strange... Somehow that CREATE TABLE AS is significant, since this incorrect output does not occur when querying the s table.

Copy pastable reproduction script:

CREATE TABLE s (a text[]);
INSERT INTO s VALUES (ARRAY['abc', 'def', 'ghi']);
CREATE TABLE t AS TABLE s;
SELECT * FROM s;
SELECT * FROM t;
SET duckdb.force_execution TO true;
SELECT * FROM s;
SELECT * FROM t;

@JelteF JelteF added bug Something isn't working incorrect result Bugs that return incorrect data labels Jan 28, 2025
@YuweiXiao

YuweiXiao commented Feb 8, 2025

Not sure if it is related to TOAST. For varchar, ConvertPostgresToDuckValue uses the Datum directly instead of going through a wrapper such as DatumGetTextP.

By the way, detoasting will palloc memory if needed, so there is a memory leak for decimal in ConvertPostgresToDuckValue, since DatumGetNumeric does the detoast internally.

@kysshsy

kysshsy commented Feb 9, 2025

Hi. While investigating the problem, I observed two inconsistent behaviors:

  1. Column Type Discrepancy:
    Column t.a has a different type from column s.a:
postgres=# \d+ s
                                            Table "public.s"
 Column |  Type  | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+--------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text[] |           |          |         | extended |             |              |
Access method: heap

postgres=# \d+ t
                                                  Table "public.t"
 Column |        Type         | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+---------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | character varying[] |           |          |         | extended |             |              |
Access method: heap
  2. Type Conversion Anomaly:
    During the execution of CREATE TABLE t AS TABLE s;, the function pgduckdb::GetPostgresDuckDBType(column) returns an OID of 1015 (which corresponds to VARCHAR_ARRAY), so Postgres interprets the column type in table t as character varying[]. However, when executing SELECT * FROM t;, the system returns an OID of 1043 (VARCHAR), causing the query results to be incorrect.

So I suspect the issue lies in the definition of the corresponding table in DuckDB. (I can't figure out how the corresponding table is created.)

I haven't quite understood how the Postgres-only table mechanism works. Is there a corresponding table in DuckDB?
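For reference, the two OIDs mentioned above can be looked up in pg_type on any stock Postgres (a quick sanity check, not pg_duckdb-specific):

```sql
-- 1043 is "varchar" (the scalar type), 1015 is "_varchar" (the varchar[] array type).
SELECT oid, typname, typcategory, typelem
FROM pg_type
WHERE oid IN (1015, 1043);
--  oid  | typname  | typcategory | typelem
-- ------+----------+-------------+---------
--  1043 | varchar  | S           |       0
--  1015 | _varchar | A           |    1043
```

So GetPostgresDuckDBType returning 1015 means "array of varchar", while 1043 is the bare scalar, which matches the wrong output shown in the repro.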

JelteF added a commit that referenced this issue Feb 11, 2025
In DuckDB the canonical name for an unlimited-size text column is
`VARCHAR`[1]; in Postgres it is `TEXT`[2]. In DuckDB `TEXT` is simply
an alias for the `VARCHAR` type, and there's no way to know which one
was provided by the user. In Postgres these types are actually distinct,
although they behave exactly the same for unlimited length. Basically
everyone uses `TEXT` instead of `VARCHAR`.

Currently we convert the DuckDB type to a Postgres `VARCHAR`. In many
cases this doesn't really matter, because pretty much all clients handle
VARCHAR and TEXT the same too. There's one place where this leaks
through though: DDL coming from a query. For example if you do a CTAS
with a DuckDB query the resulting table columns will be of type
`character varying` instead of `text`[3].

[1]: https://duckdb.org/docs/sql/data_types/text.html
[2]: https://www.postgresql.org/docs/current/datatype-character.html
[3]: #556 (comment)
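The alias behaviour described in the commit message can be observed directly in a DuckDB shell (plain DuckDB, no pg_duckdb involved):

```sql
-- DuckDB resolves TEXT to VARCHAR; the original spelling is not preserved.
SELECT typeof(CAST('abc' AS TEXT));
-- VARCHAR
```

This is why, going from DuckDB back to Postgres, there is nothing in the type itself telling us whether to emit `text` or `character varying`.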
@JelteF
Collaborator

JelteF commented Feb 11, 2025

@kysshsy The issue you're describing is a separate issue from the wrong result one. It happens when you run the CREATE TABLE AS while duckdb.force_execution is true. If you don't, like I did in the steps in my previous comment, both will be the text[] type. I do think it's confusing behaviour so I created a draft PR to fix this issue (which needs some more work): #583

@dpxcc I tested if this was a regression, but this is also happening on 0.2.0. So while it's a serious issue, I don't think it needs to block the 0.3.0 release.

@JelteF JelteF added this to the 0.4.0 milestone Feb 11, 2025
@YuweiXiao

The attribute's attndims is 0 for a table created by CTAS, even though its type is VARCHARARRAYOID. This is odd, but it appears to be standard PG behavior. During the scan, pg_duckdb constructs a varchar instead of a list<varchar>, leading to the incorrect result.

[two screenshot attachments]

@JelteF
Collaborator

JelteF commented Feb 19, 2025

@YuweiXiao thanks a lot for that investigation!

The attribute's attndims is 0 for table created by CTAS, though it is VARCHARARRAYOID.

As explained in my comment above, the fact that it is VARCHARARRAYOID is tracked in #583 (and is not relevant to the incorrect result).

The attndims=0 situation is new to me though. And indeed, pg_duckdb's understanding of attndims=0 seems to be the cause of the incorrect result bug.

To be clear, the fact that attndims=0 occurs for this CTAS seems to be fully expected: a Postgres without pg_duckdb also records attndims=0 for that column. So the problem here is that pg_duckdb incorrectly interprets attndims=0 to mean the column is of the base type. Without looking closer at what attndims=0 actually means, I'm not 100% sure how we should interpret it instead. But it's clear that pg_duckdb's current interpretation is plain wrong.
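The attndims observation is easy to confirm on a stock Postgres using the s/t tables from the reproduction at the top of this issue (no pg_duckdb needed):

```sql
-- s.a was declared as text[]; t.a was created by CTAS.
-- attndims is only recorded for the explicitly declared column.
SELECT c.relname, a.attname, a.atttypid::regtype AS coltype, a.attndims
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relname IN ('s', 't') AND a.attname = 'a';
--  relname | attname |       coltype       | attndims
-- ---------+---------+---------------------+----------
--  s       | a       | text[]              |        1
--  t       | a       | character varying[] |        0
```

So attndims=0 cannot be taken to mean "not an array"; the type OID itself still identifies an array type.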

@YuweiXiao

The ndims property is simply ignored during the CTAS: https://github.com/postgres/postgres/blob/80d7f990496b1c7be61d9a00a2635b7d96b96197/src/backend/commands/createas.c#L495

I suggest using pg_type's typcategory to check whether the type is an array. The ndims will be handled when we parse the tuple, since the array header stores the correct ndims value.
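In SQL terms, the suggested check looks like this: the reliable array signals in pg_type are typcategory = 'A' and a nonzero typelem, independent of attndims (a sketch of the idea, not the actual pg_duckdb code):

```sql
-- Detect an array type without relying on attndims.
-- For 1015 (_varchar), both signals correctly identify it as an array.
SELECT typname,
       typcategory = 'A' AS is_array,
       typelem::regtype  AS element_type
FROM pg_type
WHERE oid = 1015;
--  typname  | is_array |   element_type
-- ----------+----------+-------------------
--  _varchar | t        | character varying
```

The equivalent C-side lookup would go through the syscache on the column's atttypid rather than through attndims.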

@JelteF
Collaborator

JelteF commented Feb 19, 2025

Thanks again. I'm working on a fix/workaround for this problem now.
