-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement create_base_type API. #206
Conversation
pgtle_admin can run create_base_type API to create a new base type (providing I/O funcs and other required arguments) in pg_tle. User-defined input function must accept TEXT and return BYTEA, we make an C-version input function that accepts CSTRING and returns the new TYPE, the C-version input function will be used internally to create the type; inside the C-version input function, we find the user-defined input function and execute it. A dependency between C-version function and user-defined function is recorded in pg_depend, so that DROP FUNC ... CASCADE can work as expected. Output function works in a similar way as the input function. Future improvements: 1. The newly-created type can set to NULL value, and it reads/outputs NULL without invoking the user-defined I/O functions. In Postgres `CREATE TYPE` experience, input function is still called when reading a NULL input value, but it must still return NULL; output function are not invoked for NULL values. We may want to align the behavior for NULL values, but it seems to be a minor detail for now. 2. Other base type features (e.g. send/receive functions, storage, collation) are not supported in this version. Co-authored-by: Jim Mlodgenski <jimmy76@gmail.com>
src/datatype.c
Outdated
@@ -56,7 +75,7 @@ static bool | |||
create_shell_type(Oid typeNamespace, const char *typeName, bool if_not_exists) | |||
{ | |||
AclResult aclresult; | |||
Oid typoid; | |||
Oid typeoid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if this matches upstream convention (I have not checked) -- if the code is relying on upstream, I'd suggest matching that. Otherwise, no comments on this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems both typeOid
and typoid
are used https://github.com/postgres/postgres/blob/0d369ac650041862ed5006885160f36d24b224a4/src/backend/commands/typecmds.c#LL1340C2-L1340C15
Oid typeOid;
Oid typoid;
s/typeoid/typeOid because typeOid
seems more common
src/datatype.c
Outdated
#if PG_VERSION_NUM >= 130000 | ||
char alignment = TYPALIGN_INT; /* default alignment */ | ||
char storage_plain = TYPSTORAGE_PLAIN; /* default TOAST storage | ||
* method */ | ||
char storage_extended = TYPSTORAGE_EXTENDED; /* default ARRAY TOAST | ||
* storage method */ | ||
#else | ||
char alignment = 'i'; /* default alignment */ | ||
char storage_plain = 'p'; /* default TOAST storage method */ | ||
char storage_extended = 'x'; /* default ARRAY TOAST storage method */ | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be refactored into compatibility.h
, e.g.:
#if PG_VERSION_NUM < 130000
#define TYPALIGN_INT 'i'
#define TYPSTORAGE_PLAIN 'p'
#define TYPSTORAGE_EXTENDED 'x'
#endif
This would remove the need for the macro #ifs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Fixed.
src/datatype.c
Outdated
* types) in ArrayType and in composite types in DatumTupleFields. This | ||
* oid must be preserved by binary upgrades. | ||
*/ | ||
#if PG_VERSION_NUM >= 140000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly -- is there a way to put this into compatibility.h
or the like, so there is only one call needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an elegant way to put this into compatibility.h
. TypeCreate(...)
is used twice, one for base type creation, one for array type creation, a few arguments are different between the two calls. Also, TypeCreate
requires many arguments which is already error-prone ... Let me know if you have a better idea on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I provided an example with a different function in this review. See if that helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to ensure we have a check that the user-provided function is immutable.
src/datatype.c
Outdated
#if PG_VERSION_NUM >= 120000 | ||
typeOid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid, | ||
CStringGetDatum(typeName), | ||
ObjectIdGetDatum(typeNamespace)); | ||
#else | ||
typeOid = GetSysCacheOid2(TYPENAMENSP, | ||
CStringGetDatum(typeName), | ||
ObjectIdGetDatum(typeNamespace)); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be easier to read as a macro in compatibility.h:
#if PG_VERSION >= 120000
#define GETSYSCACHEOID2(a,b,c) GetSysCacheOid2(a, Anum_pg_type_oid, b, c);
#else
#define GETSYSCACHEOID2(a,b,c) GetSysCacheOid2(a, b, c);
#endif
Then the code can become:
#if PG_VERSION_NUM >= 120000 | |
typeOid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid, | |
CStringGetDatum(typeName), | |
ObjectIdGetDatum(typeNamespace)); | |
#else | |
typeOid = GetSysCacheOid2(TYPENAMENSP, | |
CStringGetDatum(typeName), | |
ObjectIdGetDatum(typeNamespace)); | |
#endif | |
typeOid = GETSYSCACHEOID2(TYPENAMENSP, | |
CStringGetDatum(typeName), | |
ObjectIdGetDatum(typeNamespace)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks for the suggestion.
src/datatype.c
Outdated
else if (if_not_exists) | ||
{ | ||
ereport(NOTICE, | ||
(errcode(ERRCODE_DUPLICATE_OBJECT), | ||
errmsg("type \"%s\" already exists, skipping", typeName))); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the if_not_exists
is supposed to fail quietly, the ereport
won't do that. It will abort the transaction. You just want to return false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTICE shouldn't fail right?
Maybe we should just LOG it instead of NOTICE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this won't abort the transaction, as the report level is NOTICE
, similarly https://github.com/postgres/postgres/blob/master/src/backend/commands/createas.c#L421
src/datatype.c
Outdated
if (moveArrayTypeName(typeOid, typeName, typeNamespace)) | ||
typeOid = InvalidOid; | ||
else if (if_not_exists) | ||
{ | ||
ereport(NOTICE, | ||
(errcode(ERRCODE_DUPLICATE_OBJECT), | ||
errmsg("type \"%s\" already exists, skipping", typeName))); | ||
return false; | ||
} | ||
else | ||
ereport(ERROR, | ||
(errcode(ERRCODE_DUPLICATE_OBJECT), | ||
errmsg("type \"%s\" already exists", typeName))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very if/else heavy. Here's an alternative approach:
if (moveArrayTypeName(typeOid, typeName, typeNamespace)) | |
typeOid = InvalidOid; | |
else if (if_not_exists) | |
{ | |
ereport(NOTICE, | |
(errcode(ERRCODE_DUPLICATE_OBJECT), | |
errmsg("type \"%s\" already exists, skipping", typeName))); | |
return false; | |
} | |
else | |
ereport(ERROR, | |
(errcode(ERRCODE_DUPLICATE_OBJECT), | |
errmsg("type \"%s\" already exists", typeName))); | |
if (!moveArrayTypeName(typeOid, typeName, typeNamespace)) | |
{ | |
if (if_not_exists) | |
return false; | |
ereport(NOTICE, | |
(errcode(ERRCODE_DUPLICATE_OBJECT), | |
errmsg("type \"%s\" already exists", typeName))); | |
} | |
typeOid = InvalidOid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks for the suggestion.
src/datatype.c
Outdated
* types) in ArrayType and in composite types in DatumTupleFields. This | ||
* oid must be preserved by binary upgrades. | ||
*/ | ||
#if PG_VERSION_NUM >= 140000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I provided an example with a different function in this review. See if that helps.
src/datatype.c
Outdated
/* | ||
* Call the user-defined output function. | ||
*/ | ||
PG_RETURN_CSTRING(TextDatumGetCString(OidFunctionCall1Coll(output_function, InvalidOid, datum))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd break up this line, in case there are errors on the OidFunctionCall1Coll
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/datatype.c
Outdated
false, /* security */ | ||
false, /* Leak Proof */ | ||
false, /* Strict */ | ||
PROVOLATILE_IMMUTABLE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we validating that the user is providing an immutable function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think we actually should define this attribute based on the user input function; I've changed attributes prosecdef, proleakproof, proisstrict, provolatile, proparallel, procost, prorows
to match the actual user input function.
src/datatype.c
Outdated
false, /* Leak Proof */ | ||
false, /* Strict */ | ||
PROVOLATILE_IMMUTABLE, | ||
PROPARALLEL_SAFE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, are we validating that the user is providing a parallel safe function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment.
src/datatype.c
Outdated
List *inputNames = NIL; | ||
|
||
tuple = SearchSysCache1(PROCOID, | ||
ObjectIdGetDatum(funcid)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something seems off with the indentation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
nice! |
Also, added a max size check for type creation.
Hey Oliver, thanks for the suggestion! The doc is not there yet because the feature is still under development (there will be a few followup changes); once the code changes are mostly completed, we'll add a doc page with examples. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think this is pretty close now. My major comment is around turning the pgtle.create_base_type_if_not_exists
function into a PL/pgSQL function (I provided a reference to a similar one in the comments). This allows us to remove some more C code, and simplifies some of the implementation details in the other functions.
Outside of that, I think this is good to go.
pg_tle--1.0.4--1.0.5.sql
Outdated
CREATE FUNCTION pgtle.create_base_type_if_not_exists | ||
( | ||
typenamespace regnamespace, | ||
typename name, | ||
infunc regprocedure, | ||
outfunc regprocedure, | ||
internallength int4 | ||
) | ||
RETURNS boolean | ||
SET search_path TO 'pgtle' | ||
STRICT | ||
AS 'MODULE_PATHNAME', 'pg_tle_create_base_type_if_not_exists' | ||
LANGUAGE C; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this couldn't be a PL/pgSQL function similar to some of the others (e.g. https://github.com/aws/pg_tle/blob/main/pg_tle--1.0.0.sql#L389). I don't think there's anything C-specific that we need in this call, and we can push all the hard work down to create_base_type
. This should reduce our surface area for C-specific bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest commit, only create_base_type_if_not_exists
is now defined in C, create_base_type
is defined in plpgsql
and will raise an exception if create_base_type_if_not_exists
returns false.
The reason is that there are multiple cases where we could return ERRCODE_DUPLICATE_OBJECT
, for example, if the shell type doesn't exist when create_baset_type is called (same error as postgres create type would return).
src/datatype.c
Outdated
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_if_not_exists); | ||
Datum | ||
pg_tle_create_base_type_if_not_exists(PG_FUNCTION_ARGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, I think we can just move this into a PL/pgSQL function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See a previous comment
src/datatype.c
Outdated
if (if_not_exists) | ||
{ | ||
ereport(NOTICE, | ||
(errcode(ERRCODE_DUPLICATE_OBJECT), | ||
errmsg("type \"%s\" already exists, skipping", typeName))); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move the ...if_not_exists
to a PL/pgSQL function, we can just return the ERRCODE_DUPLICATE_OBJECT
and not need this section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see a previous comment
src/datatype.c
Outdated
* if_not_exists: if true, don't fail on duplicate name, just print a notice and return false. | ||
* Otherwise, fail on duplicate name. | ||
*/ | ||
static bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move the if_not_exists
to a PL/pgSQL function, this can become a static void
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see a previous comment
src/datatype.c
Outdated
* Raise an error if any requirement is not met. | ||
*/ | ||
static void | ||
check_user_input_func(Oid funcid, Oid expectedNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the input/output checks are almost identical, subject to a few wording changes. Can we either make the checks a common function, or refactor the code so we only have the check logic once?
For the error message wording, we can say "type function cannot ..." instead of having input/output in it. Or, have a flag that sets the appropriate language of "input" or "output"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they are mostly the same, updated to check_user_defined_func
instead.
src/datatype.c
Outdated
/* | ||
* Make effects of commands visible | ||
*/ | ||
CommandCounterIncrement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why we need this? Is the type not visible in the transaction otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second thought, we don't need this actually https://github.com/postgres/postgres/blob/master/src/backend/access/transam/README#L44. It is already called automatically within a transaction.
We only need it if we want to see the effect within the same command (function), for example, if we want to use one C function to do both create_shell_type and create_base_type.
#endif | ||
|
||
/* | ||
* If it's not a shell, see if it's an autogenerated array type, and if so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there somewhere I can read about this autogenerated array type / how that's triggered?
I think the tricky part is I imagine you dug deep into how pg_type.c for this section but it's not clear, esp for those without context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is the same as postgres create type implementation
https://github.com/postgres/postgres/blob/master/src/backend/commands/typecmds.c#L239
https://github.com/postgres/postgres/blob/master/src/backend/commands/typecmds.c#L606
https://www.postgresql.org/docs/current/sql-createtype.html#SQL-CREATETYPE-ARRAY
src/datatype.c
Outdated
else if (if_not_exists) | ||
{ | ||
ereport(NOTICE, | ||
(errcode(ERRCODE_DUPLICATE_OBJECT), | ||
errmsg("type \"%s\" already exists, skipping", typeName))); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTICE shouldn't fail right?
Maybe we should just LOG it instead of NOTICE?
src/datatype.c
Outdated
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION), | ||
errmsg("invalid type internal length %d, maximum size is %d", internalLength, TLE_BASE_TYPE_SIZE_LIMIT))); | ||
|
||
/* As a bytea is of variable length, we need to allow for the header */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add comments on why/how/where you're referencing this from as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we store everything as bytea which has a header, that's why we add it here.
For example, if a client wants to create a fixed length custom type int2: a 2-element vector of one byte integer value; in theory we only need two bytes (if we use CREATE TYPE
to create it), however, because in the pg_tle implementation, it's stored using bytea, we need to account for the extra header.
…ype_if_not_exists implementation in C. Also, refactor a few type I/O functions into find/check_user_defined_func.
pg_tle--1.0.4--1.0.5.sql
Outdated
SET search_path TO 'pgtle' | ||
AS $_pgtleie_$ | ||
DECLARE | ||
not_exists boolean; | ||
BEGIN | ||
not_exists = pgtle.create_base_type_if_not_exists(typenamespace, typename, infunc, outfunc, internallength); | ||
IF NOT not_exists THEN | ||
RAISE EXCEPTION 'type "%" already exists', typename; | ||
END IF; | ||
END; | ||
$_pgtleie_$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested the opposite -- the C function raises exceptions -- it does not return true/false. Then pgtle.create_base_type
is a C function.
Then pgtle.create_base_type_if_not_exists
becomes:
CREATE FUNCTION pgtle.create_base_type_if_not_exists(
typenamespace regnamespace,
typename name,
infunc regprocedure,
outfunc regprocedure,
internallength int4
)
RETURNS bool
LANGUAGE plpgsql
AS $$
BEGIN
PERFORM pgtle.create_base_type( typenamespace, typename, infunc, outfunc, internallength);
RETURN TRUE;
EXCEPTION
-- only catch the unique violation. let all other exceptions pass through.
WHEN unique_violation THEN
RETURN FALSE;
END;
$$;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I didn't choose this approach is that there are multiple cases where create_base_type
could return ERRCODE_DUPLICATE_OBJECT
.
For example, if create_base_type
is called without creating shell type first, ERRCODE_DUPLICATE_OBJECT
is raised (same error in postgres create base type without shell type); then create_base_type_if_not_exists
would catch this error and return false (it would appear to the user as if the type already exists).
If we want to define create_base_type
in C, we'll need to change ERRCODE_DUPLICATE_OBJECT
to other error code in this place (deviates from postgres CREATE TYPE code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in the latest commit
Description of changes:
pgtle_admin can run create_base_type API to create a new base type (providing I/O funcs and other required arguments) in pg_tle.
User-defined input function must accept TEXT and return BYTEA, we make an C-version input function that accepts CSTRING and returns the new TYPE, the C-version input function will be used internally to create the type; inside the C-version input function, we find the user-defined input function and execute it. A dependency between C-version function and user-defined function is recorded in pg_depend, so that
DROP FUNC ... CASCADE
can work as expected.Output function works in a similar way as the input function.
create_base_type
implementation is similar toDefineType
https://github.com/postgres/postgres/blob/master/src/backend/commands/typecmds.c#L148, except pg_tle specific part.Future improvements:
CREATE TYPE
experience, input function is still called when reading a NULL input value, but it must still return NULL; output function are not invoked for NULL values. We may want to align the behavior for NULL values, but it seems to be a minor detail for now.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.