Support for VARBINARY columns #131

jgoizueta · 2020-10-26T10:21:34Z

Implement support for reading VARBINARY values without conversion (using target type SQL_C_BINARY).

I also tried to clean up the read-in-chunks logic for variable-size data (both text and binary).
We don't ask the ODBC driver to convert the binary data, but we still convert it into a hex text string (\x....) because BuildTupleFromCStrings is used to put the data into a tuple. I'm leaving the removal of this conversion (which would save time and space) for a future optimization.
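For reference, a minimal C sketch of the conversion described above. It assumes the output only needs to match PostgreSQL's bytea hex input format (`\x` followed by two hex digits per byte). The function name `bytes_to_bytea_hex` is hypothetical, and it uses `malloc` for portability where backend code would use `palloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encodes raw bytes as bytea hex text, i.e. "\x"
 * followed by two lowercase hex digits per byte, NUL-terminated.
 * The caller frees the result. Returns NULL if allocation fails. */
static char *bytes_to_bytea_hex(const unsigned char *data, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char *out = malloc(2 + 2 * len + 1);   /* "\x" + 2 chars/byte + NUL */
    size_t i;

    if (out == NULL)
        return NULL;
    out[0] = '\\';
    out[1] = 'x';
    for (i = 0; i < len; i++) {
        out[2 + 2 * i] = digits[data[i] >> 4];    /* high nibble */
        out[3 + 2 * i] = digits[data[i] & 0x0f];  /* low nibble */
    }
    out[2 + 2 * len] = '\0';
    return out;
}
```

The string doubles the size of the data (plus three bytes), which is the time and space cost mentioned above.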

jgoizueta · 2020-10-27T17:42:12Z

Unfortunately this is broken on Windows. The new test for reading variable data larger than the chunk size fails with:

ERROR:  Reading data
[Microsoft][ODBC Driver Manager] Invalid argument value

So this will have to wait until I can debug it on Windows 😞

The size of the test text needs to exceed the 8192 MAXIMUM_BUFFER_SIZE in order to exercise multi-chunk reading.

jgoizueta · 2020-10-28T08:39:15Z

Working on Windows too 😅

jgoizueta · 2020-10-28T08:44:31Z

Summary of what this PR does:

It cleans up the use of SQLGetData to read variable-size data in chunks a little. Binary data to be read into a bytea column used to be read with a conversion to text performed by the ODBC driver (target type: SQL_C_CHAR). For the PG driver this produced a hexadecimal text string, but it didn't work for some other drivers. Now the data is read raw (target type: SQL_C_BINARY), which means there is no trailing null terminator in each chunk, and we take care of the conversion to a hex string for insertion into the resulting tuple.
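A minimal, self-contained simulation of the chunked read loop, showing why the two target types advance through the buffer differently. `mock_get_data` is a hypothetical stand-in that mimics `SQLGetData` semantics: on truncation it fills the whole buffer and reports the total remaining length, and for SQL_C_CHAR it spends the last byte on a NUL terminator. It copies bytes verbatim in both modes (a real driver would convert binary data to hex text for SQL_C_CHAR), and `realloc` stands in for the FDW's buffer handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for SQL_SUCCESS, SQL_SUCCESS_WITH_INFO (01004)
 * and SQL_NO_DATA. */
typedef enum { MOCK_SUCCESS, MOCK_TRUNCATED, MOCK_NO_DATA } MockRet;

typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;       /* bytes already handed out */
    int finished;     /* last chunk delivered */
    int calls;        /* number of mock_get_data calls */
} MockColumn;

/* Emulates SQLGetData on one column. */
static MockRet mock_get_data(MockColumn *col, int as_char, unsigned char *buf,
                             size_t buflen, size_t *result_size)
{
    size_t remaining = col->len - col->pos;
    size_t room = as_char ? buflen - 1 : buflen;  /* SQL_C_CHAR keeps 1 byte for NUL */
    size_t n = remaining < room ? remaining : room;

    col->calls++;
    if (col->finished)
        return MOCK_NO_DATA;
    *result_size = remaining;          /* bytes available before this call */
    memcpy(buf, col->data + col->pos, n);
    if (as_char)
        buf[n] = '\0';
    col->pos += n;
    if (n < remaining)
        return MOCK_TRUNCATED;
    col->finished = 1;
    return MOCK_SUCCESS;
}

/* Reads a whole value chunk by chunk; returns a malloc'd buffer. */
static unsigned char *read_column(MockColumn *col, int as_char,
                                  size_t chunk_size, size_t *out_len)
{
    unsigned char *buffer = NULL;
    size_t size = 0, used = 0, result_size = 0;

    assert(chunk_size > 1);
    for (;;) {
        MockRet ret;

        if (used + chunk_size > size) {    /* make room for one more chunk */
            unsigned char *p = realloc(buffer, used + chunk_size);
            if (p == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = p;
            size = used + chunk_size;
        }
        ret = mock_get_data(col, as_char, buffer + used, chunk_size, &result_size);
        if (ret == MOCK_NO_DATA)
            break;
        if (ret == MOCK_TRUNCATED) {
            /* Buffer filled; for SQL_C_CHAR the last byte is a NUL that
             * the next chunk overwrites. */
            used += as_char ? chunk_size - 1 : chunk_size;
            continue;
        }
        used += result_size;               /* last chunk: exact byte count */
        break;
    }
    *out_len = used;
    return buffer;
}
```

With a 16384-byte value and 8192-byte chunks, the binary path needs two calls while the character path needs three, because each truncated SQL_C_CHAR chunk only delivers 8191 bytes of data.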

Algunenano

In general it looks fine to me. There are many things that I understand are quirks of the ODBC standard and its different implementations, so I trust you with those, but I've also left some comments in the code.

Also, maybe it's time to start moving things around, splitting the code into different files and enforcing a code style?

Algunenano · 2020-10-28T15:21:07Z

odbc_fdw.c

+    SQLCHAR sqlstate[5];
+    GetDataTruncation truncation = NO_TRUNCATION;
+    if (ret == SQL_SUCCESS_WITH_INFO)
+    {


The mixed styles (tabs and spaces) make this harder to understand, especially since GitHub doesn't use the same tab width as your editor. I would suggest picking one or the other and using something like clang-format to enforce it in your commits.

I'm sorry I was messy with that. I'll fix my changes to use tabs (I think that's what we've mostly been using here) and leave fixing the rest of the file for a separate PR.

Algunenano · 2020-10-28T15:36:20Z

odbc_fdw.c

+static GetDataTruncation
+result_truncation(SQLRETURN ret, SQLHSTMT stmt)
+{
+    SQLCHAR sqlstate[5];


Shouldn't this be 6 (5 + the terminating NUL)?

SQLState [Output] Pointer to a buffer in which to return a five-character SQLSTATE code (and terminating NULL) for the diagnostic record RecNumber. The first two characters indicate the class; the next three indicate the subclass. This information is contained in the SQL_DIAG_SQLSTATE diagnostic field. For more information, see SQLSTATEs.

Ouch, good catch. It must have been working by chance: the next element in the stack frame holds a zero value (NO_TRUNCATION, which was presumably overwritten by the terminating zero byte), so the string was still terminated when the comparisons were made. 🙄

Algunenano · 2020-10-28T15:37:53Z

odbc_fdw.c

+    if (ret == SQL_SUCCESS_WITH_INFO)
+    {
+		SQLGetDiagRec(SQL_HANDLE_STMT, stmt, 1, sqlstate, NULL, NULL, 0, NULL);
+        if (strcmp((char*)sqlstate, ODBC_SQLSTATE_STRING_TRUNCATION) == 0)


Maybe use strncmp instead of strcmp for more safety.

Agreed (although you already spoiled the fun of buffer-overflowing in your previous comment, bah)
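Both fixes together, sketched in C under the assumption that the buffer is filled by `SQLGetDiagRec`: size it for the five-character code plus its terminator, and compare with `strncmp` bounded by the SQLSTATE length. `SQLSTATE_LEN` and `sqlstate_is` are hypothetical names; `"01004"` (string data, right truncated) and `"01S07"` (fractional truncation) are standard ODBC SQLSTATE codes.

```c
#include <string.h>

/* Hypothetical name: every SQLSTATE is exactly five characters. */
#define SQLSTATE_LEN 5

/* In the FDW the buffer would be declared as
 *     SQLCHAR sqlstate[SQLSTATE_LEN + 1];
 * leaving room for the NUL that SQLGetDiagRec writes after the code. */

/* Hypothetical helper: true if the diagnostic code matches `expected`.
 * strncmp reads at most SQLSTATE_LEN characters, so it stays within the
 * code even if the buffer was not NUL-terminated. */
static int sqlstate_is(const char *sqlstate, const char *expected)
{
    return strncmp(sqlstate, expected, SQLSTATE_LEN) == 0;
}
```

A call site would then read `sqlstate_is((char *) sqlstate, ODBC_SQLSTATE_STRING_TRUNCATION)`.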

Algunenano · 2020-10-28T15:39:17Z

odbc_fdw.c

-				SQLGetDiagRec(SQL_HANDLE_STMT, stmt, 1, sqlstate, NULL, NULL, 0, NULL);
-				if (strcmp((char*)sqlstate, ODBC_SQLSTATE_FRACTIONAL_TRUNCATION) == 0)
+				resize_buffer(&buffer, &buffer_size, used_buffer_size, used_buffer_size + chunk_size);
+				ret = SQLGetData(stmt, i, target_type, buffer + used_buffer_size, chunk_size, &result_size);


Isn't this missing error handling? result_truncation only does something if it was ok, but no error is raised if there was an error here.

The error checking is performed further below (check_return); we handle the non-error cases first 🤔, which, now that I look at it, is not very elegant... but I'll leave that for when I tackle the TODO comment associated with that error checking.

Algunenano · 2020-10-28T15:41:55Z

odbc_fdw.c

+static void
+resize_buffer(char ** buffer, int *size, int used_size, int required_size)
+{
+    if (required_size > *size)


Why not use repalloc? Am I missing something?

The whole function brings a lot of complexity that I'm not sure should be here at all, like only freeing the buffer when used_size is non-zero, because it relies on the buffer being NULL (not allocated) otherwise.

No, it was me who was missing the existence of repalloc. I'll review this later if I have time
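A minimal sketch of what the simplification could look like, using standard `realloc` so it compiles outside PostgreSQL. The name `grow_buffer`, the starting capacity of 64 and the doubling policy are assumptions for illustration. Inside the backend, `repalloc` would take the place of `realloc`; note that it raises an ERROR instead of returning NULL on out-of-memory, and it expects a pointer previously obtained from `palloc`, so the initial allocation would be a `palloc`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical replacement for resize_buffer: ensures *buffer can hold at
 * least required_size bytes. realloc copies the existing contents, so the
 * caller no longer needs to pass used_size or special-case a NULL buffer
 * (realloc(NULL, n) behaves like malloc(n)). Returns 0 on success and -1
 * on failure, leaving the old buffer valid. */
static int grow_buffer(char **buffer, size_t *size, size_t required_size)
{
    size_t new_size = *size ? *size : 64;  /* arbitrary starting capacity */
    char *p;

    if (required_size <= *size)
        return 0;
    while (new_size < required_size) {
        if (new_size > SIZE_MAX / 2)
            return -1;                     /* doubling would overflow */
        new_size *= 2;                     /* geometric growth */
    }
    p = realloc(*buffer, new_size);
    if (p == NULL)
        return -1;
    *buffer = p;
    *size = new_size;
    return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, instead of one reallocation per chunk.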

Algunenano

LGTM. Are you leaving the repalloc for later?

jgoizueta · 2020-10-30T07:18:23Z

Uhm, yes. I'd rather release this now that it's working, and then refactor the buffer resizing properly when I have a little time.

jgoizueta added 2 commits October 27, 2020 18:06

Support for VARBINARY columns

913f3c4

Add test for reading long variable-size data

56b0384

jgoizueta force-pushed the feature/ch113672/implement-proper-binary-field-reading-in branch from 1b9a985 to 56b0384 October 27, 2020 17:08

jgoizueta added 2 commits October 27, 2020 20:45

Set chunk_size after adjusting col_size

ca97720

Fix test

d84547f

jgoizueta requested a review from Algunenano October 28, 2020 06:52

Algunenano suggested changes Oct 28, 2020

jgoizueta added 5 commits October 28, 2020 18:54

Fix usage of spaces for indentation

81b8850

Fix sqlstate size

bbd4a8e

Safer string comparisons

7f456b3

Refactor loop

047a61a

Turn notice into debug message

7dfb22a

jgoizueta requested a review from Algunenano October 28, 2020 18:40

Algunenano approved these changes Oct 29, 2020

Base automatically changed from getdata-errors to master October 30, 2020 07:16

jgoizueta merged commit dccc4c5 into master Oct 30, 2020

jgoizueta deleted the feature/ch113672/implement-proper-binary-field-reading-in branch October 30, 2020 07:18

jgoizueta mentioned this pull request Nov 7, 2020

Interpret 01000 sqlstate as string truncation #138

Merged