Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for VARBINARY columns #131

Merged

Conversation

jgoizueta
Copy link

Implement support for reading VARBINARY values without conversion (using target type SQL_C_BINARY).

I tried also to clean-up the read-in-chunks logic for variable-size data (both text & binary).
We don't request the ODBC driver to convert the binary data, but we still convert it into an hex text string (\x....) because BuildTupleFromCStrings is used to put the data into a tuple. I leave it for a future optimization to avoid this conversion (saving time and space).

@jgoizueta jgoizueta force-pushed the feature/ch113672/implement-proper-binary-field-reading-in branch from 1b9a985 to 56b0384 Compare October 27, 2020 17:08
@jgoizueta
Copy link
Author

Unfortunately this is broken on Windows. The new test for reading variable data larger than the chunk size fails with:

ERROR:  Reading data
[Microsoft][ODBC Driver Manager] Invalid argument value

So this will have to wait until I can debug it on Windows 😞

The size of the text needs to be incremented over 8192 MAXIMUM_BUFFER_SIZE
in order to test multi-chunk reading.
@jgoizueta jgoizueta requested a review from Algunenano October 28, 2020 06:52
@jgoizueta
Copy link
Author

Working on Windows too 😅

@jgoizueta
Copy link
Author

Summary of what this PR does:

It cleans up a little the usage of SQLGetData to read variable size data in chunks. Binary data to be read into a bytea column used to be read with conversion (performed by the ODBC driver) to text (target type: SQL_C_CHAR) which, for the PG driver converted data to a hexadecimal text string, but didn't work for some drivers. Now it is read raw (target type: SQL_C_BINARY), which means no trailing zeros, and we take care of conversion to an hex string for insertion in the resulting tuple.

Copy link

@Algunenano Algunenano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it looks fine to me. There are many things that I understand are irks from the ODBC standard and their different implementations, so I trust you with that, but I've also left some comments in the code.

Also, maybe it's time to start moving things around and break things into different files with an enforce code style?

odbc_fdw.c Outdated
SQLCHAR sqlstate[5];
GetDataTruncation truncation = NO_TRUNCATION;
if (ret == SQL_SUCCESS_WITH_INFO)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different styles (tabs and spaces) makes this harder to understand, specially when Github doesn't use the same 1 tab == x spaces as your editor. I would suggest one or the other and using something like clang-format to enforce it in your commits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I was messy with that, I'll try to fix my changes using tabs (I think that's we've been using mostly here) and I'll leave fixes in the resto of the file for a separate PR.

odbc_fdw.c Outdated
static GetDataTruncation
result_truncation(SQLRETURN ret, SQLHSTMT stmt)
{
SQLCHAR sqlstate[5];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be 6 (5 + NULL)?

SQLState
[Output] Pointer to a buffer in which to return a five-character SQLSTATE code (and terminating NULL) for the diagnostic record RecNumber. The first two characters indicate the class; the next three indicate the subclass. This information is contained in the SQL_DIAG_SQLSTATE diagnostic field. For more information, see SQLSTATEs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, good catch, it must have been working by chance: the next allocated element in the stack frame contains a zero value (NO_TRUNCATION which must have been overwritten with a zero byte) before the string comparisons were made. 🙄

odbc_fdw.c Outdated
if (ret == SQL_SUCCESS_WITH_INFO)
{
SQLGetDiagRec(SQL_HANDLE_STMT, stmt, 1, sqlstate, NULL, NULL, 0, NULL);
if (strcmp((char*)sqlstate, ODBC_SQLSTATE_STRING_TRUNCATION) == 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use strncmp instead of strcmp for more safety.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed (although you already spoiled the fun of buffer-overflowing in your previous comment, bah)

SQLGetDiagRec(SQL_HANDLE_STMT, stmt, 1, sqlstate, NULL, NULL, 0, NULL);
if (strcmp((char*)sqlstate, ODBC_SQLSTATE_FRACTIONAL_TRUNCATION) == 0)
resize_buffer(&buffer, &buffer_size, used_buffer_size, used_buffer_size + chunk_size);
ret = SQLGetData(stmt, i, target_type, buffer + used_buffer_size, chunk_size, &result_size);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this missing error handling? result_truncation only does something if it was ok, but no error is raised if there was an error here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error checking is performed further below (check_return), first we handle non-error cases 🤔 which now I see is not very elegant... but I'll leave that for when I take on the TODO comment associated with said error checking.

odbc_fdw.c Outdated
static void
resize_buffer(char ** buffer, int *size, int used_size, int required_size)
{
if (required_size > *size)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use repalloc? Am I missing something?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole function brings a lot of complexity that I'm not sure should be here at all, like only freeing the buffer if used_size because it's relying on buffer being NULL (not allocated).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was me who was missing the existence of repalloc. I'll review this later if I have time

odbc_fdw.c Outdated Show resolved Hide resolved
odbc_fdw.c Outdated Show resolved Hide resolved
odbc_fdw.c Outdated Show resolved Hide resolved
@jgoizueta jgoizueta requested a review from Algunenano October 28, 2020 18:40
Copy link

@Algunenano Algunenano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Are you leaving the repalloc for later?

Base automatically changed from getdata-errors to master October 30, 2020 07:16
@jgoizueta
Copy link
Author

Uhm, yes, I'd rather release this which is working now, then refactor buffer resize properly when I have a little time

@jgoizueta jgoizueta merged commit dccc4c5 into master Oct 30, 2020
@jgoizueta jgoizueta deleted the feature/ch113672/implement-proper-binary-field-reading-in branch October 30, 2020 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants