Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many problems with formal text affinity values in foreign tables #66

Closed
mkgrgis opened this issue Mar 2, 2023 · 42 comments
Closed

Many problems with formal text affinity values in foreign tables #66

mkgrgis opened this issue Mar 2, 2023 · 42 comments
Labels
low priority Not important feature or issue for us. It may need a discussion but not done.

Comments

@mkgrgis
Copy link
Contributor

mkgrgis commented Mar 2, 2023

Only non - STRICT tables in legacy SQLite databases is affected.

  1. There is foreign table with bytea column. In SQLite values usually belongs to blob affinity. In SQLite BLOB column also there is some blob images of simple UTF-8 or ASCII text files, for example with such content as テスト or Hello world. This values stores with text affinity and is normal for converting to bytea. Why in this case there is error
    if (sqlite_type != col_type && col_type == SQLITE3_TEXT)
    (look like "text value for blob affinity") ?
  2. There is foreign table with bigint column. Sometimes in SQLite there is empty string values (formally text affinity) in this column which obviously means NULL. I think we can't formally analyse empty strings in not-text columns always as textaffinity. If PostgreSQL type of data in foreign column don't belongs to text/varchar family we can threat empty string as NULL. Maybe there is a reason to add a new option for "threat empty string as NULL for non text and non blob PostgreSQL columns"?
@t-kataym
Copy link
Contributor

t-kataym commented Mar 3, 2023

Thank you for reporting.

SQLite FDW expects/assumes that column type of foreign table and data affinity of SQLite are same, as you know.
It is not supported to map different types which requires implicit type cast (conversion) of data.

---Just a comment:---
To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.

Additionally, we need to consider and care the logic of pushing down of WHERE condition.
For example, SELECT * FROM tbl WHERE col IS NULL is given to PostgreSQL.
SQLite FDW wants to push down col IS NULL. If supporting "treat empty string as NULL", SQLite FDW needs to pushdown col = '' also like SELECT * FROM tbl WHERE col IS NULL or col = '' on SQLite.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 3, 2023

Thanks for comments, @t-kataym ! I made a draft for SELECT only mkgrgis@db7f918 Could you point me a line to add col IS NULL or col = '' mechanism, because it's still hard for me to determine all needed lines for code extension. Maybe something near

sqlite_bind_sql_var(Oid type, int attnum, Datum value, sqlite3_stmt * stmt, bool *isnull)
?

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 6, 2023

Test case for text affinity for bytea column.
SQLite table

CREATE TABLE "BLOB" (
	i INTEGER PRIMARY KEY AUTOINCREMENT,
	b BLOB
);

PostgreSQL foreign table

CREATE FOREIGN TABLE public."BLOB" (
	i int4 OPTIONS(key 'true') NULL,
	b bytea NULL
)
SERVER sqlite_srv;

Data preparing

echo -n "test" | hexdump -C
echo -n "テスト" | hexdump -C

74 65 73 74 = test
e3 83 86 e3 82 b9 e3 83 88 = テスト

SQLite INSERT

INSERT INTO "BLOB" ("b") VALUES (x'e38386e382b9e38388');
INSERT INTO "BLOB" ("b") VALUES (x'74657374');
INSERT INTO "BLOB" ("b") VALUES ('テスト');
INSERT INTO "BLOB" ("b") VALUES ('test');

SQLite, affinity check
SELECT i, b, typeof (b) FROM "BLOB";
The same data in DBeaver
изображение
During SELECT in PostgreSQL there is invalid input for type "bytea" = SQLite "blob", but affinity = "text" for value ='テスト'.
Practice described above was typical for SQLite-bash interaction for text files without \0 suitable to insert without hexdump program processing. I think we can threat this practice as common and safe for SQL2016 behaviour because there is no text in SQLite encodings we can't threat as blob.

In clean "BLOB" table I also try to test

DELETE FROM "BLOB";
INSERT INTO "BLOB" ("b") VALUES (5);
INSERT INTO "BLOB" ("b") VALUES (8935456748442424);
INSERT INTO "BLOB" ("b") VALUES (32.8);

After this ugly SQL we have successfully SELECT in PostgreSQL!

It's SQL2016 unsafe while values with int and real affinity can successfully converted to PostgreSQL bytea after something like SQLite to_string. First it's not per bit representation of this values and second SQL2016 disallowed this transformation at all because bit represetation of numeric types is implementation detail. Why there is disallowed int/real transformation to bytea but error message for relatively safe text affinity?

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 7, 2023

For not ugly data for bytea there is solution draft near https://github.com/mkgrgis/sqlite_fdw/blob/f109f0facf3b85da6b57abb0b7c5b64773997f27/sqlite_query.c#L109

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 9, 2023

Look like we need CAST in conditions for bytea columns of foreign tables.

Depends on affinity, data is from first example

SELECT i, b, typeof (b) FROM "BLOB" where b = 'テスト'; -- 1 row
SELECT i, b, typeof (b) FROM "BLOB" where b = x'e38386e382b9e38388'; -- 1 row

Depends on data only, data is from first example

SELECT i, b, typeof (b) FROM "BLOB" where cast (b as text) = 'テスト'; -- 2 rows
SELECT i, b, typeof (b) FROM "BLOB" where cast (b as blob) = x'e38386e382b9e38388'; -- 2 rows

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 9, 2023

Now for PostgreSQL with
select * from "blob" where b = 'テスト';
there is base query
デバッグ: sqlite_fdw: finalize SELECT i, b FROM main."blob" WHERE ((b = X'e38386e382b9e38388'))
only 1 row for blob affinity only.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 13, 2023

To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.

@t-kataym , what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

@t-kataym
Copy link
Contributor

what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

@mkgrgis Thank you for summarizing it. Please give us a time to confirm it.

Could you point me a line to add col IS NULL or col = '' mechanism, because it's still hard for me to determine all needed lines for code extension.

Please refer sqlite_deparse_null_test() which creates IS NULL expression for SQLite query.
But I think it is unnecessary to understand how to pushdown col IS NULL or col = '' mechanism because of the following reason.

The correct behavior of FDW is not to pushdown WHERE conditions if "empty_as_non_text_null" is enabled, I think.
For example, SQLite table has following record (col1 is an empty string, col2 is NULL).

col1 col2
”” NULL

If user execute SELECT col1 FROM tbl WHERE col1 = col2; on PostgreSQL, SQLite FDW should execute SELECT col1 FROM tbl for SQLite.
To generalize it, I think it is safe not to pushdown WHERE condition which has text column.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 14, 2023

what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

@mkgrgis Thank you for summarizing it. Please give us a time to confirm it.

Yes, please. There is updated table now. I understand it will hard to decide about 17×5=85 datatype×affinity combinations with only 8 obvious cases.

Please refer sqlite_deparse_null_test() which creates IS NULL expression for SQLite query.

Thanks, @t-kataym !

The correct behavior of FDW is not to pushdown WHERE conditions if "empty_as_non_text_null" is enabled, I think.

Yes, it's better for SQL:2016 behaviour on PostgreSQL side.

If user execute SELECT col1 FROM tbl WHERE col1 = col2; on PostgreSQL, SQLite FDW should execute SELECT col1 FROM tbl for SQLite.

I understand your example, @t-kataym, thanks. Really there is a problem, but other.

I thinks it's a bad example. Neither for '' nor for NULL there will no true condition predicate. Empty strings are equal '' = '', but one NULL isn't equal to other NULL. If there will be implemented full SQL:2016 behaviour on PostgreSQL side with our FDW, col1 = col2 will always NULL if one of arguments will be NULL.

@t-kataym
Copy link
Contributor

I thinks it's a bad example. Neither for '' nor for NULL there will no true condition predicate. Empty strings are equal '' = '', but one NULL isn't equal to other NULL. If there will be implemented full SQL:2016 behaviour on PostgreSQL side with our FDW, col1 = col2 will always NULL if one of arguments will be NULL.

@mkgrgis Thank you for correcting me. Just my example was not suitable.
Anyway, I believe that you understood what I wanted to say.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Mar 14, 2023

Anyway, I believe that you understood what I wanted to say.

Yes, @t-kataym . Let's study about SQL:2016 behaviour in context of prefetched SELECT col1 FROM tbl for SELECT col1 FROM tbl WHERE col1 = col2;.

First I have thought WHERE x IS NULL OR x='' is hard and unsafe. Let's test around int and uuid as first examples.

UUID
Well formed UUID in SQLite can have text or blob(only if contains 16 bytes) affinity. All BLOBs with 16 bytes of data is correct UUID, but only some texts is correct (see https://sqlite.org/src/file/ext/misc/uuid.c). Also formally mixed affinity blob/text is possible for UUID in SQLite. For UUID in SQLite pushdowned IS NULL is totally unsafe, because for SQLite IS NULL is only NULLs, but for PostgreSQL IS NULL includes all data unusable as well formed uuid value. Also WHERE uuid = {some const} may pushdowned in SQLite as search for both blob and text forms for simple, but not for STRICT tables.

Conclusion: anyway no pushdowning needed for uuid.

int
With PostgreSQL int we have something other. In PostgreSQL I have SQL CREATE TABLE Test_int" ( i serial4 NOT NULL);. Let's CREATE TABLE Test_int" ( i int NOT NULL); in SQLite.
SELECT i FROM "Test_int" where i = ''; is correct for SQLite. For PostgreSQL it's incorrect query with error, hence FDW don't need support this query!

What about joining and JOIN pushdowning for int SQLite column with some empty strings? For PostgreSQL it's impossible to join any data for int (int2, int8 etc.) column by value equal to empty string ''. In SQLite it's possible, but it's not SQL:2016 behaviour, we can only note this in documentation (README.md)
For SELECT a.*, b.* FROM a INNER JOIN b ON a.ia = b.ib in PostgreSQL ia = '' or ib = '' is impossible, but for SQLite we can have corresponding rows with NULL values after FDW. I think it's very rare case hence pusdowning is SQL:2016 safe with note in README.md and this will be usefully for data access speed.

Conclusion: we can pushdown conditions for int family columns with implementation IS NULL as IS NULL OR '' if option discussed above is true.

About other datatypes let's wait on your opinion and opinion of your department employees, @t-kataym . Its very important to decide about all combinations, because the data transformations is de facto kernel of sqlite_fdw and users expect from us SQL:2016 (with little exceptions) behaviour of data in foreign tables.

@nxhai98
Copy link
Contributor

nxhai98 commented Apr 4, 2023

To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.

@t-kataym , what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

Hello @mkgrgis,

Sorry for late response, I've checked the table and there are some comments:

About description:

'x - error' this description is meaningless it seems a result of a manual test result. Some type transformations do not always error such as SQLITE BLOB to Timestamp as I describe on note-2 below. I think it should be 'not support' for case always error and V for another case.

1. About SQLite INT & REAL transparent:

  • SQLite INT/REAL => bytea: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C bytes => Postgres bytea
  • SQLite INT/REAL => uuid: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C string => Postgres input function => Postgres uuid

2. About SQLite BLOB transparent:

It does not correct at conversion to date, json, name, text, timestamp, timestamptz, varchar. It should be 'V' instead of 'x':

contrib_regression=# select * from c;
          c1          
----------------------
 \xe38386e382b9e38388
(1 row)

contrib_regression=# alter foreign table c alter column c1 type text;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
   c1   
--------
 テスト
(1 row)

contrib_regression=# alter foreign table c alter column c1 type json;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type json
DETAIL:  Token "テスト" is invalid.
CONTEXT:  JSON data, line 1: テスト
contrib_regression=# alter foreign table c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type date: "テスト" 

As you can see, the error just because the text presentation is not match with target type. If the blob column has expected data for text type, the conversion is OK:

sqlite> .schema c
CREATE TABLE c (c1 blob);
sqlite> DELETE FROM c;
sqlite> INSERT INTO c VALUES ('2023-01-01');

contrib_regression=# alter foreign table  c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
     c1     
------------
 2023-01-01

3. About SQLite TEXT:

In my understanding sqlite_fdw does not support cast (raise ERROR) only when execute cast from SQLITE TEXT to Postgres int2, int4, int8, bool, float4, float8, numeric and bytea. And sqlite API may return TEXT type for BLOB column.

However, the description of these types is confused, '-' for float4, float8; 'x' for int2, int4; 'V+' for numeric...

About: SQLite TEXT -> date, json, numeric, time,... (V+). The cast behavior is the same: SQLite TEXT -> C string -> Postgres input function -> Postgres data

The behavior is same as SQLite TEXT -> text(✔), varchar(✔), name (V) but the description is not same.

mkgrgis added a commit to mkgrgis/sqlite_fdw that referenced this issue Apr 4, 2023
Update data transformation table by pgspider#66
@mkgrgis
Copy link
Contributor Author

mkgrgis commented Apr 4, 2023

Hello, @nxhai98!

What about we are discussing? There are 3 aspects of transformation table

  • as is (current implementation)
  • ISO SQL:2016 (recommendations)
  • future behaviour

and also there are 2 C language data transformations

  • SELECT data (SQLite for PostgreSQL)
  • WHERE data (PostgreSQL for SQLite)

My table from previous URL is about future behaviour and mainly (but not only) about SELECT. I think You are also discussing about this.

About description:

'x - error' this description is meaningless it seems a result of a manual test result.

No, this means only C-language code of sqlite_fdw will throw here an error of improper data like

elog(ERROR, "invalid input syntax for type =%d, column type =%d", sqlite_type, col_type);

Some type transformations do not always error such as SQLITE BLOB to Timestamp as I describe on note-2 below. I think it should be 'not support' for case always error and V for another case.

Yes, you are right. No support is ISO SQL sentence. I can use "empty set" sign for this case, it's more clear.

1. About SQLite INT & REAL transparent:

* SQLite INT/REAL => bytea: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C bytes => Postgres bytea

I have described this case here. It's possible to show SQLite int as bytea, but this presentation isn't recommended by ISO SQL as detail of implementation. Yes, Your conception isn't look like current behaviour, but can be usefully for learning SQLite internals for students. I approve Your algorithm.

* SQLite INT/REAL => uuid: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C string => Postgres input function => Postgres uuid

SQLite int and real is not longer than 8 byte, uuid is only 16byte value. Hence all possible input will be improper for uuid.

2. About SQLite BLOB transparent:

It does not correct at conversion to date, json, name, text, timestamp, timestamptz, varchar. It should be 'V' instead of 'x':

contrib_regression=# select * from c;
          c1          
----------------------
 \xe38386e382b9e38388
(1 row)

contrib_regression=# alter foreign table c alter column c1 type text;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
   c1   
--------
 テスト
(1 row)

Please note here was interpretation of BLOB affinity data as text data with PostgreSQL encoding of current database! We must add this important non ISO SQL aspect in documentation. Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite.

contrib_regression=# alter foreign table c alter column c1 type json;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type json
DETAIL:  Token "テスト" is invalid.
CONTEXT:  JSON data, line 1: テスト
contrib_regression=# alter foreign table c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type date: "テスト" 

As you can see, the error just because the text presentation is not match with target type. If the blob column has expected data for text type, the conversion is OK:

sqlite> .schema c
CREATE TABLE c (c1 blob);
sqlite> DELETE FROM c;
sqlite> INSERT INTO c VALUES ('2023-01-01');

contrib_regression=# alter foreign table  c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
     c1     
------------
 2023-01-01

Yes, with cast to encoding of current PostgreSQL database all this cases сan be marked as V (transparent transformation), but I added new T value "cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable". Max length of BLOB data in SQLite is very big and also applicable to text, hence sometimes we can check BLOB or text length, especially for types from datatime family, uuid, name, 'bool', 'int', maybe also for varchar with a determined length. Now all values are fetched even if obviously malformed because are very long.

3. About SQLite TEXT:

In my understanding sqlite_fdw does not support cast (raise ERROR) only when execute cast from SQLITE TEXT to Postgres int2, int4, int8, bool, float4, float8, numeric and bytea.

Yes, it's normal for ISO SQL:2016 and in my opinion.

And sqlite API may return TEXT type for BLOB column.

As SQLite C language text-like output SQLite BLOB data really can be more flexible for next transformations. But for PostgreSQL C function this input will not obvious. Yes, text and blob affinity usually have equal size limits, but there are many blobs with no available text presentation in current encoding of PostreSQL database. As example UTF-8 have improper byte sequences cast result can be malformed.

However, the description of these types is confused, '-' for float4, float8; 'x' for int2, int4; 'V+' for numeric...

Marked as V+, but most of well-formed values will store as values with real affinity. Some hex or octal integer numbers can have text affinity.

About: SQLite TEXT -> date, json, numeric, time,... (V+). The cast behavior is the same: SQLite TEXT -> C string -> Postgres input function -> Postgres data

The behavior is same as SQLite TEXT -> text(✔), varchar(✔), name (V) but the description is not same.

In ISO SQL varchar without length and text is the nearest equivalent of text affinity, but name always have length and formally can be truncated. I meant only this difference.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Apr 5, 2023

About SQLite BLOB transparent

As you can see, the error just because the text presentation is not match with target type.

Not only presentation, @nxhai98, but also encoding! Please note, Unicode databases like UTF-8/UTF-16/UTF-32 isn't only possible. Yes, SQLite recommends threat all text as Unicode data, but we have no such recommendations about BLOBs .
Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite. Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?

@nxhai98
Copy link
Contributor

nxhai98 commented Apr 7, 2023

Thank for the feedback @mkgrgis,

My table from previous URL is about future behaviour and mainly (but not only) about SELECT. I think You are also discussing about this.

Yes, correct.

I have described this case #66 (comment). It's possible to show SQLite int as bytea, but this presentation isn't recommended by ISO SQL as detail of implementation. Yes, Your conception isn't look like current behaviour, but can be usefully for learning SQLite internals for students. I approve Your algorithm.

Thank for explaining. However, SQLite INT/REAL => bytea still 'x' in the datatype table. Could you update this.

SQLite int and real is not longer than 8 byte, uuid is only 16byte value. Hence all possible input will be improper for uuid

Thank for explaining, I understood.

Please note here was interpretation of BLOB affinity data as text data with PostgreSQL encoding of current database! We must add this important non ISO SQL aspect in documentation. Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite.

Currently, sqlite_fdw support only utf-8 SQLite database and don't consider about current Postgres encoding.

Yes, with cast to encoding of current PostgreSQL database all this cases сan be marked as V (transparent transformation), but I added new T value "cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable". Max length of BLOB data in SQLite is very big and also applicable to text, hence sometimes we can check BLOB or text length, especially for types from datatime family, uuid, name, 'bool', 'int', maybe also for varchar with a determined length. Now all values are fetched even if obviously malformed because are very long.

Currently, sqlite_fdw always uses sqlite3 APIs to cast value to text/int/float and always 'utf-8' for text. So, the T value cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable is not correct.
How about: cast to text in SQLite utf-8 encoding and then transparent transformation if applicable?

As SQLite C language text-like output SQLite BLOB data really can be more flexible for next transformations. But for PostgreSQL C function this input will not obvious. Yes, text and blob affinity usually have equal size limits, but there are many blobs with no available text presentation in current encoding of PostreSQL database. As example UTF-8 have improper byte sequences cast result can be malformed.

sqlite_fdw only get BLOB data directly when postgres column type is bytea, with other data type, BLOB is cast to int/float/text by sqlite3 API first and only UTF-8 supported.
The cast: Sqlite BLOB -> sqlite3_column_text (utf-8: unsigned char *) -> Postgres input function.

Not only presentation, @nxhai98, but also encoding! Please note, Unicode databases like UTF-8/UTF-16/UTF-32 isn't only possible. Yes, SQLite recommends threat all text as Unicode data, but we have no such recommendations about BLOBs .
Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite. Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?

Do you mean:

postgres: select * from t2 where bytea_col = 'test';
=> sqlite execute: select * from t2 where bytea_col = 'test';

postgres: select * from t2 where bytea_col = '\x74657374';
=> sqlite execute: select * from t2 where bytea_col = '\x74657374';

If so, his may not feasible on FDW because there is implicit cast in OP: bytea_col = 'test' this corresponding with bytea_col = 'test'::bytea
and the implicit cast is executed before we get this value in FDW (when build sqlite query):

explain verbose select * from tbl where bytea_col = 'test';
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.tbl  (cost=0.00..21.00 rows=4 width=64)
   Output: bytea_col, text_col
   Filter: (tbl.bytea_col = '\x74657374'::bytea)
(3 rows)

I check again the table:

  • description of SQLite TEXT => float4/float8 (-), int2/int4/int8 (x) to all V+. With current behavior: sqlite_fdw does not support these cast => it should be (∅) or (-)
  • Also with SQLite TEXT => numeric (V+) and bytea (-), sqlite_fdw does not support these cast => it should be (∅) or (-)
  • The cast SQLite BLOB => float4/float8/int2/int4/int8 (b - show per-bit form) the current behavior (cast behavior of SQLite API) is: SQLite BLOB -> SQLite TEXT -> C number: https://www.sqlite.org/lang_expr.html#castexpr
When casting a BLOB value to INTEGER, the value is first converted to TEXT.

=> this mark b - show per-bit form is not correct -> it should be same as another cast of SQLite BLOB (T).

  • The cast SQLite BLOB -> numeric (∅) => it should be (V+): the cast behavior: SQLite BLOB -> SQLite TEXT (sqlite3_column_text()) -> input function of postgres

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Apr 7, 2023

Thank for explaining. However, SQLite INT/REAL => bytea still 'x' in the datatype table. Could you update this.

Yes, fixed, @nxhai98!

Currently, sqlite_fdw support only utf-8 SQLite database and don't consider about current Postgres encoding.

It's important don't consider about current Postgres encoding! UTF-8 output of SQLite is not problem, also UTF-16 SQLite databases is not problem also. What if there will be some EUC_JP, EUC_CN or WIN866 PostgreSQL? Maybe we should mark all text output of SQLite as UTF-8 input for PostgreSQL?

Currently, sqlite_fdw always uses sqlite3 APIs to cast value to text/int/float and always 'utf-8' for text. So, the T value cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable is not correct. How about: cast to text in SQLite utf-8 encoding and then transparent transformation if applicable?

More clear is

... cast to text in SQLite utf-8 encoding, then to PostgreSQL text with current encoding of database* and then transparent transformation if applicable ...

sqlite_fdw only get BLOB data directly when postgres column type is bytea, with other data type, BLOB is cast to int/float/text by sqlite3 API first and only UTF-8 supported. The cast: Sqlite BLOB -> sqlite3_column_text (utf-8: unsigned char *) -> Postgres input function.

Thanks. So, SQLite blob as text will be utf-8. It's good

... Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?

Do you mean:

postgres: select * from t2 where bytea_col = 'test';
=> sqlite execute: select * from t2 where bytea_col = 'test';

postgres: select * from t2 where bytea_col = '\x74657374';
=> sqlite execute: select * from t2 where bytea_col = '\x74657374';

No, I meant for both cases something like
sqlite execute: select * from t2 where bytea_col = '\x74657374' or bytea_col = 'test';
Because for non-STRICT tables real data can be text or BLOB and we have no guarantee.
For bytea search not sqlite execute: select * from t2 where bytea_col = '\x74657374'; , but sqlite execute: select * from t2 where cast(bytea_col as blob) = '\x74657374';
For text search not sqlite execute: select * from t2 where bytea_col = 'test'; but > sqlite execute: select * from t2 where cast (bytea_col as text) = 'test';

If so, his may not feasible on FDW because there is implicit cast in OP: bytea_col = 'test' this corresponding with bytea_col = 'test'::bytea and the implicit cast is executed before we get this value in FDW (when build sqlite query):

explain verbose select * from tbl where bytea_col = 'test';
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.tbl  (cost=0.00..21.00 rows=4 width=64)
   Output: bytea_col, text_col
   Filter: (tbl.bytea_col = '\x74657374'::bytea)
(3 rows)

Yes, I am about similar problem.

I check again the table:

* description of SQLite TEXT => float4/float8 (-), int2/int4/int8 (x) to all V+. With current behavior: sqlite_fdw does not support these cast => it should be (∅) or (-)

Marked as -, thanks.

* Also with SQLite TEXT => numeric (V+) and bytea (-), sqlite_fdw does not support these cast => it should be (∅) or (-)

Fixed. for numeric, - for bytea. We can thereat SQLite text as UTF-8 text blob.

* The cast SQLite BLOB => float4/float8/int2/int4/int8 (b - show per-bit form) the current behavior (cast behavior of SQLite API) is: SQLite BLOB -> SQLite TEXT -> C number: https://www.sqlite.org/lang_expr.html#castexpr
* The cast SQLite BLOB -> numeric (∅) => it should be (V+): the cast behavior: SQLite BLOB -> SQLite TEXT (sqlite3_column_text()) -> input function of postgres

Fixed. Please carefully review the table again, @nxhai98. I'll come back at the end of next week after studying about testing sqlite_fdw in PostgreSQL source code tree.

@nxhai98
Copy link
Contributor

nxhai98 commented Apr 12, 2023

@mkgrgis Thank for fixing, I confirmed the table.

It's important don't consider about current Postgres encoding! UTF-8 output of SQLite is not problem, also UTF-16 SQLite databases is not problem also. What if there will be some EUC_JP, EUC_CN or WIN866 PostgreSQL? Maybe we should mark all text output of SQLite as UTF-8 input for PostgreSQL?

I understood there is a problem about not match character set. However, sqlite_fdw does not support encoding conversion, it assumes that encoding of SQLite and Postgres are the same.

If you want to fix it, here is a suggestion:

Postgres's core provides a function:

/*
 * Convert src string to another encoding (general case).
 *
 * See the notes about string conversion functions at the top of this file.
 */
unsigned char *
pg_do_encoding_conversion(unsigned char *src, int len, int src_encoding, int dest_encoding);

It can apply as below to sqlite_fdw:

char *utf8 = (char *) sqlite3_column_text(stmt, colid); // <-- assumes SQLite text is always UTF8
char *valstr = pg_do_encoding_conversion(utf8, strlen(utf8), PG_UTF8, GetDatabaseEncoding());        // <-- convert from utf8 to current database encoding

value_datum = InputFunctionCall(&attinmeta->attinfuncs[attnum],
                valstr,
                attinmeta->attioparams[attnum],
                attinmeta->atttypmods[attnum]);

But it is not a complete solution.
Ideally, sqlite_fdw needs to know encoding of both PostgreSQL and SQLite, and converts data if they are different.
It also required encoding conversion when build query with text constant (in sqlite_deparse_const() function).

No, I meant for both cases something like
sqlite execute: select * from t2 where bytea_col = '\x74657374' or bytea_col = 'test';
Because for non-STRICT tables real data can be text or BLOB and we have no guarantee.
For bytea search not sqlite execute: select * from t2 where bytea_col = '\x74657374'; , but sqlite execute: select * from t2 where cast(bytea_col as blob) = '\x74657374';
For text search not sqlite execute: select * from t2 where bytea_col = 'test'; but > sqlite execute: select * from t2 where cast (bytea_col as text) = 'test';

I understood your spec, but it seems not feasible because of we do not know when bytea_col is compared with text constant.
How do you think about this spec below:

For bytea search: treat BLOB column as BYTEA column, we add the implicit cast for all blob columns when build SQLite query (deparse):

Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = '\x74657374';
-> sqlite: SELECT CAST(bytea_col AS BLOB) FROM t2 WHERE CAST(bytea_col AS BLOB) = x'74657374';

For text search: treat BLOB column as TEXT column, user must be define BLOB column as text column in foreign table and add implicit cast for all text columns when build SQLite query (deparse):

CREATE FOREIGN TABLE t2 (bytea_col TEXT) SERVER sqlite_svr;  <- define BLOB column as TEXT column on postgres
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = 'test';
-> sqlite: SELECT CAST(bytea_col AS TEXT) FROM t2 WHERE CAST(bytea_col AS TEXT) = 'test';

Here is an example:
SQLite:

sqlite> CREATE TABLE BLOB_TBL(c1 BLOB);
sqlite> INSERT INTO BLOB_TBL VALUES (x'e38386e382b9e38388');
sqlite> INSERT INTO BLOB_TBL VALUES ('test');
sqlite> INSERT INTO BLOB_TBL VALUES (x'74657374');;
sqlite> SELECT c1, typeof(c1) FROM BLOB_TBL;
テスト|blob
test|text
test|blob

PostgreSQL:

test=# CREATE FOREIGN TABLE "BLOB_TBL" (c1 text) SERVER sqlite_svr ;  <- define BLOB column as TEXT column
CREATE FOREIGN TABLE
test=# select * from "BLOB_TBL";
   c1   
--------
 テスト
 test
 test
(3 rows)
test=# explain verbose select * from "BLOB_TBL" where c1 = 'test';
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Foreign Scan on public."BLOB_TBL"  (cost=10.00..7.00 rows=7 width=32)
   Output: c1
   SQLite query: SELECT CAST (`c1` AS TEXT) FROM main."BLOB_TBL" WHERE ((CAST (`c1` AS TEXT) = 'test'))
(3 rows)

test=# select * from "BLOB_TBL" where c1 = 'test';
  c1  
------
 test    <- this is TEXT type on SQLite
 test    <- this is BLOB type on SQLite
(2 rows)

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Apr 21, 2023

Thanks, @nxhai98 ! I am sorry for waiting.

@mkgrgis Thank for fixing, I confirmed the table.

Let's use this table ;-)

char *utf8 = (char *) sqlite3_column_text(stmt, colid); // <-- assumes SQLite text is always UTF8
char *valstr = pg_do_encoding_conversion(utf8, strlen(utf8), PG_UTF8, GetDatabaseEncoding());        // <-- convert from utf8 to current database encoding

value_datum = ....

Yes, this is full SELECT solution I meant, @nxhai98 ! And also we need something like this in sqlite_deparse_const() for WHERE solution, You are completely right! We have guarantees for always and only UTF-8 for sqlite3_column_text output independent on PRAGMA encoding in SQLite. Hence we have only PostgreSQL encoding problems, no similar in SQLite.

Also important function is sqlite3_column_bytes for empty text and empty blob data in SQLite that treated as special data transformation case.

Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = '\x74657374';
-> sqlite: SELECT CAST(bytea_col AS BLOB) FROM t2 WHERE CAST(bytea_col AS BLOB) = x'74657374';
CREATE FOREIGN TABLE t2 (bytea_col TEXT) SERVER sqlite_svr;  <- define BLOB column as TEXT column on postgres
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = 'test';
-> sqlite: SELECT CAST(bytea_col AS TEXT) FROM t2 WHERE CAST(bytea_col AS TEXT) = 'test';

Nice! Your elegant solution is better than my mechanical solution. I consider this proposals and tests as SQL:2016 recommended for our cases.

How we will begin a draft for the table, @nxhai98 ?

I hope some my drafts will be usefully for full data transformation implementation:

  1. static const char* sqlite_datatype(int t)
  2. Text affinity and datatype variables
  3. Human readable message about runtime data transformation error

What do You think about algorithm structure? Maybe it will be usefully to pack all transformation table logic only in switch (pgtyp) without any pre-operations and write special small runtime_data_transformation_error function?

@nxhai98
Copy link
Contributor

nxhai98 commented Apr 25, 2023

@mkgrgis, thank for your fb,

What do You think about algorithm structure? Maybe it will be usefully to pack all transformation table logic only in switch (pgtyp) without any pre-operations and write special small runtime_data_transformation_error function?

Yes, I think it may better, I see your draft, sqlite_convert_to_pg looks complicated, pack all logical on switch (pgtyp) should be easier to understand spec.

BTW, we have just pushed the bug fixing related to sqlite_convert_to_pg function, could you check and merge them to your implementation?

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 1, 2023

Thanks, @nxhai98 ! Your contribution in #71 is very interesting. I am learning the changes. I'll create a new branch based on this first with little C-code changes without any behaviour modifications and second with step-by-step PRs with small behaviour changes by our table.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 2, 2023

@t-kataym and @nxhai98, I have made a PR without changing SQL behaviour to prepare implementation of data transformation table here discussed. Please review #74.
My plan for next little PRs:

  1. Add separate runtime error function for future cases with impossible data converting
  2. Add text transformation to respect PostgreSQL database encoding
  3. Many line-by-line little PRs for the data transformation table here discussed

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 18, 2023

Thanks @t-kataym and @nxhai98 ! I have made PR #75 with separate function for data type error message.
My plan for next little PRs:

  1. Add text transformation to respect PostgreSQL database encoding (this behaviour doesn't have a tests now)
  2. Many line-by-line little PRs for the data transformation table.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 26, 2023

Thanks @t-kataym and @nxhai98 ! I have made PR #76 with text encoding transformation.

My next PRs will be about implementation of data transformation table.

@t-kataym
Copy link
Contributor

@mkgrgis Thank you for your development.
Please request @nxhai98 to review the changes on PR #76 if you finished.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 26, 2023

@t-kataym, how can I request @nxhai98 to review my PR? I can only write here my request for @nxhai98, but not something more. I think reviewers for a PR can be assigned only by repository owners.

@t-kataym
Copy link
Contributor

@mkgrgis Could you describe a comment in PR #76 after finished? It seems that you are in progress because test cases has not been added yet.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 30, 2023

@t-kataym, can anybody help me with testing conception(plan) in #76 (comment) ? I am frustrated during adding a new tests with database encodings.

@t-kataym
Copy link
Contributor

@mkgrgis Could you ask @bichht0608 about testing?

@mkgrgis
Copy link
Contributor Author

mkgrgis commented May 30, 2023

@mkgrgis Could you ask @bichht0608 about testing?

Yes, @t-kataym . I have #76 (comment) asked.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Jun 29, 2023

@t-kataym in #76 extended tests after review by @bichht0608 are ready. When should I get opinion of pgspider team?

In next PR there will implemented formally switch in switch grid by affinity-datatype table without SQL behaviour change.

In other PR after next PR there will be new tests and new implementation according some cells of the table.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Jul 13, 2023

@t-kataym , I have implemented new PR with formally switch in switch grid by affinity-datatype table without SQL behaviour change.

In next PRs some elements of the table will be implemented and discussed.

@mkgrgis mkgrgis mentioned this issue Jul 24, 2023
@t-kataym
Copy link
Contributor

@mkgrgis
I'm sorry for the late response.

@t-kataym , I have implemented #79 with formally switch in switch grid by affinity-datatype table without SQL behaviour change.

Our member will confirm it and comment in the PR.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Jul 29, 2023

Our member will confirm it and comment in the PR.

Thanks, @t-kataym . Now I have a little drafts for full datatype-affinity tests and architecture questions about some cell of the table. I hope we will discuss about it after closing of #79.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Aug 9, 2023

Our member will confirm it and comment in the PR.

@t-kataym , still no reaction and unknown perspectives...

I have moved #79 contribution to #82 as a part of UUID support. Also I have restored with a new tests bit and varbit support in #83. This was in first releases but without test. In current release this support doesn't works and have been reduced to some fragments of code.

The name of this issue is

Many problems with formal text affinity values in foreign tables

I want to know your opinion about common problem of any SQLite text affinity data.

For ex. FDW can read mixedcase text UUID, but PostgreSQL gives only normalised UUID form for binding. Hence we have some data, visible for SELECT, but invisible for WHERE. This is applicable to current UUID support implementation. My support for text affinity UUID also isn't ISO:SQL (BLOB-affinity UUID support is) and ugly but better than there is now. We cannot disallow pushdowns, because table data in PostgreSQL buffer can reduce productivity for large SQLite files. Only for = and !=, not for > and < it can be special deparsing conception. What do you think about main theses of this conception? For ex. how to implement SQLite UUID comparing with PostgreSQL UUID value without - and in lowercase form.

Other ex. PostgreSQL IS NaN can be deparsed to SQLite = 'NaN'. PostgreSQL input also can read Nan nan or nAn, but this values will be invisible for WHERE, because PostgreSQL will deparse only NaN form.

3rd example. SQLite text boolean where Y y Yes YES t T true True TRUE etc equals to 1::bool. There is also simple SELECT but many invisible data for WHERE.

I hope this discussion will be continued this month, @t-kataym .

@t-kataym
Copy link
Contributor

I'm sorry. I cannot spend enough time to this topic. It is not high priority for us.

I want to know your opinion about common problem of any SQLite text affinity data.

My basic though is that:

  • Basically, text affinity data in SQLite should be mapped to text column in PostgreSQL.
  • In case of a conversion of text affinity data in SQLite to non-text type in PostgreSQL, if it is a simple case which can define specification and implement it, it is acceptable to support it on SQLite FDW.
  • But it is complex case, I would like not to support it on SQLite FDW. I expect user side (who issues SQL to PostgreSQL) to convert it.

@t-kataym t-kataym added the low priority Not important feature or issue for us. It may need a discussion but not done. label Aug 10, 2023
@mkgrgis
Copy link
Contributor Author

mkgrgis commented Sep 6, 2023

A new information about Infinity in SQLite, see https://stackoverflow.com/questions/72113935/how-do-i-select-floating-point-infinity-literals-from-a-sqlite-database
In SQLite SELECT 9e999; gives Inf literal and PostgreSQL ±Inf can be deparsed to ±9e999. SELECT typeof(-9e999); gives real.
This logic will be implemented in future PR after UUID, bit/varbit and bool support.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Oct 27, 2023

@t-kataym , now my PR #83 is ready for review after rebase to git mainstream. This PR implements limited to 64 bits length bit and varbit support as non-text operations in SQLite. This datatypes will be not text but int affinity in SQLite during sqlite_fdw.

After this PR I am implementing ISO:SQL mixed affinity bool support by uuid mixed affinity support sample .

@t-kataym
Copy link
Contributor

@mkgrgis , Thank you for your contribution. Yes, we will confirm the PR.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Dec 27, 2023

The next milestone is infinity-like data for both float and double precision. In SQLite we can meet both text affinity value like -Inf or real affinity value for infinity as result of SELECT 9e999;.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Apr 16, 2024

@t-kataym , after Infinity support in #97 I'll make a PR about MAC address support for both 6 and 8 bytes forms (there is actual tested draft). Now all of this cases treated as a values with text affinity and have not ISO:SQL data processing.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Jul 2, 2024

Now there are no problems with data with text affinity for numeric, float4 and float8 columns. After PR with macaddr support this issue will be closed, IP address as PostgreSQL data type is not planned for implementation. WKT GIS data, which also have text affinity, will be implemented in 2nd GIS PR.

@mkgrgis
Copy link
Contributor Author

mkgrgis commented Dec 23, 2024

After PR with macaddr support this issue will be closed

Now ISO:SQL data processing for a SQLite data with text affinity is supported for all of supported PostgreSQL data types in sqlite_fdw. Universal IP addresses (v4+v6) support is not planned now, GIS WKT is independent problem.

@mkgrgis mkgrgis closed this as completed Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low priority Not important feature or issue for us. It may need a discussion but not done.
Projects
None yet
Development

No branches or pull requests

3 participants