Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement create_base_type API. #206

Merged
merged 8 commits into from
Jun 30, 2023
Merged

Implement create_base_type API. #206

merged 8 commits into from
Jun 30, 2023

Conversation

lyupan
Copy link
Contributor

@lyupan lyupan commented Jun 16, 2023

Description of changes:

pgtle_admin can run create_base_type API to create a new base type (providing I/O funcs and other required arguments) in pg_tle.

User-defined input function must accept TEXT and return BYTEA, we make an C-version input function that accepts CSTRING and returns the new TYPE, the C-version input function will be used internally to create the type; inside the C-version input function, we find the user-defined input function and execute it. A dependency between C-version function and user-defined function is recorded in pg_depend, so that DROP FUNC ... CASCADE can work as expected.

Output function works in a similar way as the input function.

create_base_type implementation is similar to DefineType https://github.com/postgres/postgres/blob/master/src/backend/commands/typecmds.c#L148, except pg_tle specific part.

Future improvements:

  1. The newly-created type can set to NULL value, and it reads/outputs NULL without invoking the user-defined I/O functions. In Postgres CREATE TYPE experience, input function is still called when reading a NULL input value, but it must still return NULL; output function are not invoked for NULL values. We may want to align the behavior for NULL values, but it seems to be a minor detail for now.
  2. Other base type features (e.g. send/receive functions, storage, collation) are not supported in this version.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Lyu Pan and others added 2 commits June 16, 2023 18:52
pgtle_admin can run create_base_type API to create a new base type (providing I/O funcs and other required arguments) in pg_tle.

User-defined input function must accept TEXT and return BYTEA, we make an C-version input function that accepts CSTRING and returns the new TYPE, the C-version input function will be used internally to create the type; inside the C-version input function, we find the user-defined input function and execute it. A dependency between C-version function and user-defined function is recorded in pg_depend, so that DROP FUNC ... CASCADE can work as expected.

Output function works in a similar way as the input function.

Future improvements:
1. The newly-created type can set to NULL value, and it reads/outputs NULL without invoking the user-defined I/O functions. In Postgres `CREATE TYPE` experience, input function is still called when reading a NULL input value, but it must still return NULL; output function are not invoked for NULL values. We may want to align the behavior for NULL values, but it seems to be a minor detail for now.
2. Other base type features (e.g. send/receive functions, storage, collation) are not supported in this version.

Co-authored-by: Jim Mlodgenski <jimmy76@gmail.com>
src/datatype.c Outdated
@@ -56,7 +75,7 @@ static bool
create_shell_type(Oid typeNamespace, const char *typeName, bool if_not_exists)
{
AclResult aclresult;
Oid typoid;
Oid typeoid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if this matches upstream convention (I have not checked) -- if the code is relying on upstream, I'd suggest matching that. Otherwise, no comments on this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems both typeOid and typoid are used https://github.com/postgres/postgres/blob/0d369ac650041862ed5006885160f36d24b224a4/src/backend/commands/typecmds.c#LL1340C2-L1340C15

Oid			typeOid;
Oid			typoid;

s/typeoid/typeOid because typeOid seems more common

src/datatype.c Outdated Show resolved Hide resolved
src/datatype.c Outdated
Comment on lines 235 to 245
#if PG_VERSION_NUM >= 130000
char alignment = TYPALIGN_INT; /* default alignment */
char storage_plain = TYPSTORAGE_PLAIN; /* default TOAST storage
* method */
char storage_extended = TYPSTORAGE_EXTENDED; /* default ARRAY TOAST
* storage method */
#else
char alignment = 'i'; /* default alignment */
char storage_plain = 'p'; /* default TOAST storage method */
char storage_extended = 'x'; /* default ARRAY TOAST storage method */
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be refactored into compatibility.h, e.g.:

#if PG_VERSION_NUM < 130000
#define TYPALIGN_INT  'i'
#define TYPSTORAGE_PLAIN 'p'
#define TYPSTORAGE_EXTENDED 'x'
#endif

This would remove the need for the macro #ifs here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Fixed.

src/datatype.c Outdated
* types) in ArrayType and in composite types in DatumTupleFields. This
* oid must be preserved by binary upgrades.
*/
#if PG_VERSION_NUM >= 140000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly -- is there a way to put this into compatibility.h or the like, so there is only one call needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an elegant way to put this into compatibility.h. TypeCreate(...) is used twice, one for base type creation, one for array type creation, a few arguments are different between the two calls. Also, TypeCreate requires many arguments which is already error-prone ... Let me know if you have a better idea on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided an example with a different function in this review. See if that helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to ensure we have a check that the user-provided function is immutable.

src/datatype.c Show resolved Hide resolved
src/datatype.c Show resolved Hide resolved
src/datatype.c Outdated
Comment on lines 260 to 268
#if PG_VERSION_NUM >= 120000
typeOid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid,
CStringGetDatum(typeName),
ObjectIdGetDatum(typeNamespace));
#else
typeOid = GetSysCacheOid2(TYPENAMENSP,
CStringGetDatum(typeName),
ObjectIdGetDatum(typeNamespace));
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be easier to read as a macro in compatibility.h:

#if PG_VERSION >= 120000
#define GETSYSCACHEOID2(a,b,c)  GetSysCacheOid2(a, Anum_pg_type_oid, b, c);
#else
#define GETSYSCACHEOID2(a,b,c)  GetSysCacheOid2(a, b, c);
#endif

Then the code can become:

Suggested change
#if PG_VERSION_NUM >= 120000
typeOid = GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid,
CStringGetDatum(typeName),
ObjectIdGetDatum(typeNamespace));
#else
typeOid = GetSysCacheOid2(TYPENAMENSP,
CStringGetDatum(typeName),
ObjectIdGetDatum(typeNamespace));
#endif
typeOid = GETSYSCACHEOID2(TYPENAMENSP,
CStringGetDatum(typeName),
ObjectIdGetDatum(typeNamespace));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion.

src/datatype.c Outdated
Comment on lines 278 to 284
else if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists, skipping", typeName)));
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the if_not_exists is supposed to fail quietly, the ereport won't do that. It will abort the transaction. You just want to return false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTICE shouldn't fail right?

Maybe we should just LOG it instead of NOTICE?

Copy link
Contributor Author

@lyupan lyupan Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this won't abort the transaction, as the report level is NOTICE, similarly https://github.com/postgres/postgres/blob/master/src/backend/commands/createas.c#L421

src/datatype.c Outdated
Comment on lines 276 to 288
if (moveArrayTypeName(typeOid, typeName, typeNamespace))
typeOid = InvalidOid;
else if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists, skipping", typeName)));
return false;
}
else
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists", typeName)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very if/else heavy. Here's an alternative approach:

Suggested change
if (moveArrayTypeName(typeOid, typeName, typeNamespace))
typeOid = InvalidOid;
else if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists, skipping", typeName)));
return false;
}
else
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists", typeName)));
if (!moveArrayTypeName(typeOid, typeName, typeNamespace))
{
if (if_not_exists)
return false;
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists", typeName)));
}
typeOid = InvalidOid;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion.

src/datatype.c Outdated
* types) in ArrayType and in composite types in DatumTupleFields. This
* oid must be preserved by binary upgrades.
*/
#if PG_VERSION_NUM >= 140000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided an example with a different function in this review. See if that helps.

src/datatype.c Outdated
/*
* Call the user-defined output function.
*/
PG_RETURN_CSTRING(TextDatumGetCString(OidFunctionCall1Coll(output_function, InvalidOid, datum)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd break up this line, in case there are errors on the OidFunctionCall1Coll function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/datatype.c Outdated
false, /* security */
false, /* Leak Proof */
false, /* Strict */
PROVOLATILE_IMMUTABLE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we validating that the user is providing an immutable function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think we actually should define this attribute based on the user input function; I've changed attributes prosecdef, proleakproof, proisstrict, provolatile, proparallel, procost, prorows to match the actual user input function.

src/datatype.c Outdated
false, /* Leak Proof */
false, /* Strict */
PROVOLATILE_IMMUTABLE,
PROPARALLEL_SAFE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, are we validating that the user is providing a parallel safe function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment.

src/datatype.c Outdated
List *inputNames = NIL;

tuple = SearchSysCache1(PROCOID,
ObjectIdGetDatum(funcid));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something seems off with the indentation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@olirice
Copy link

olirice commented Jun 26, 2023

nice!
Would be great to see a page for this in pgtle/docs with an example + comparing the pgtle functions with equivalent SQL commands. Its clear what's going on from the test suite, but that'll be a little harder for users to track down once merged

Also, added a max size check for type creation.
@lyupan
Copy link
Contributor Author

lyupan commented Jun 26, 2023

nice! Would be great to see a page for this in pgtle/docs with an example + comparing the pgtle functions with equivalent SQL commands. Its clear what's going on from the test suite, but that'll be a little harder for users to track down once merged

Hey Oliver, thanks for the suggestion! The doc is not there yet because the feature is still under development (there will be a few followup changes); once the code changes are mostly completed, we'll add a doc page with examples.

Copy link
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think this is pretty close now. My major comment is around turning the pgtle.create_base_type_if_not_exists function into a PL/pgSQL function (I provided a reference to a similar one in the comments). This allows us to remove some more C code, and simplifies some of the implementation details in the other functions.

Outside of that, I think this is good to go.

Comment on lines 56 to 68
CREATE FUNCTION pgtle.create_base_type_if_not_exists
(
typenamespace regnamespace,
typename name,
infunc regprocedure,
outfunc regprocedure,
internallength int4
)
RETURNS boolean
SET search_path TO 'pgtle'
STRICT
AS 'MODULE_PATHNAME', 'pg_tle_create_base_type_if_not_exists'
LANGUAGE C;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this couldn't be a PL/pgSQL function similar to some of the others (e.g. https://github.com/aws/pg_tle/blob/main/pg_tle--1.0.0.sql#L389). I don't think there's anything C-specific that we need in this call, and we can push all the hard work down to create_base_type. This should reduce our surface area for C-specific bugs.

Copy link
Contributor Author

@lyupan lyupan Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit, only create_base_type_if_not_exists is now defined in C, create_base_type is defined in plpgsql and will raise an exception if create_base_type_if_not_exists returns false.

The reason is that there are multiple cases where we could return ERRCODE_DUPLICATE_OBJECT, for example, if the shell type doesn't exist when create_baset_type is called (same error as postgres create type would return).

src/datatype.c Outdated
Comment on lines 186 to 188
PG_FUNCTION_INFO_V1(pg_tle_create_base_type_if_not_exists);
Datum
pg_tle_create_base_type_if_not_exists(PG_FUNCTION_ARGS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, I think we can just move this into a PL/pgSQL function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See a previous comment

src/datatype.c Show resolved Hide resolved
src/datatype.c Outdated
Comment on lines 272 to 278
if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists, skipping", typeName)));
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move the ...if_not_exists to a PL/pgSQL function, we can just return the ERRCODE_DUPLICATE_OBJECT and not need this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see a previous comment

src/datatype.c Outdated
* if_not_exists: if true, don't fail on duplicate name, just print a notice and return false.
* Otherwise, fail on duplicate name.
*/
static bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move the if_not_exists to a PL/pgSQL function, this can become a static void

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see a previous comment

src/datatype.c Show resolved Hide resolved
src/datatype.c Show resolved Hide resolved
src/datatype.c Outdated
* Raise an error if any requirement is not met.
*/
static void
check_user_input_func(Oid funcid, Oid expectedNamespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the input/output checks are almost identical, subject to a few wording changes. Can we either make the checks a common function, or refactor the code so we only have the check logic once?

For the error message wording, we can say "type function cannot ..." instead of having input/output in it. Or, have a flag that sets the appropriate language of "input" or "output"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they are mostly the same, updated to check_user_defined_func instead.

src/datatype.c Outdated
/*
* Make effects of commands visible
*/
CommandCounterIncrement();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we need this? Is the type not visible in the transaction otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, we don't need this actually https://github.com/postgres/postgres/blob/master/src/backend/access/transam/README#L44. It is already called automatically within a transaction.

We only need it if we want to see the effect within the same command (function), for example, if we want to use one C function to do both create_shell_type and create_base_type.

#endif

/*
* If it's not a shell, see if it's an autogenerated array type, and if so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there somewhere I can read about this autogenerated array type / how that's triggered?
I think the tricky part is I imagine you dug deep into how pg_type.c for this section but it's not clear, esp for those without context

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/datatype.c Outdated
Comment on lines 278 to 284
else if (if_not_exists)
{
ereport(NOTICE,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("type \"%s\" already exists, skipping", typeName)));
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTICE shouldn't fail right?

Maybe we should just LOG it instead of NOTICE?

src/datatype.c Outdated
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
errmsg("invalid type internal length %d, maximum size is %d", internalLength, TLE_BASE_TYPE_SIZE_LIMIT)));

/* As a bytea is of variable length, we need to allow for the header */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments on why/how/where you're referencing this from as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we store everything as bytea which has a header, that's why we add it here.

For example, if a client wants to create a fixed length custom type int2: a 2-element vector of one byte integer value; in theory we only need two bytes (if we use CREATE TYPE to create it), however, because in the pg_tle implementation, it's stored using bytea, we need to account for the extra header.

…ype_if_not_exists implementation in C.

Also, refactor a few type I/O functions into find/check_user_defined_func.
Comment on lines 65 to 75
SET search_path TO 'pgtle'
AS $_pgtleie_$
DECLARE
not_exists boolean;
BEGIN
not_exists = pgtle.create_base_type_if_not_exists(typenamespace, typename, infunc, outfunc, internallength);
IF NOT not_exists THEN
RAISE EXCEPTION 'type "%" already exists', typename;
END IF;
END;
$_pgtleie_$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested the opposite -- the C function raises exceptions -- it does not return true/false. Then pgtle.create_base_type is a C function.

Then pgtle.create_base_type_if_not_exists becomes:

CREATE FUNCTION pgtle.create_base_type_if_not_exists(
  typenamespace regnamespace,
  typename name,
  infunc regprocedure,
  outfunc regprocedure,
  internallength int4
)
RETURNS bool
LANGUAGE plpgsql
AS $$
BEGIN
  PERFORM pgtle.create_base_type( typenamespace, typename, infunc, outfunc, internallength);
  RETURN TRUE;
EXCEPTION
  -- only catch the unique violation. let all other exceptions pass through.
  WHEN unique_violation THEN
    RETURN FALSE;
END;
$$;

Copy link
Contributor Author

@lyupan lyupan Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I didn't choose this approach is that there are multiple cases where create_base_type could return ERRCODE_DUPLICATE_OBJECT.

For example, if create_base_type is called without creating shell type first, ERRCODE_DUPLICATE_OBJECT is raised (same error in postgres create base type without shell type); then create_base_type_if_not_exists would catch this error and return false (it would appear to the user as if the type already exists).
If we want to define create_base_type in C, we'll need to change ERRCODE_DUPLICATE_OBJECT to other error code in this place (deviates from postgres CREATE TYPE code).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the latest commit

@lyupan lyupan requested a review from jkatz June 29, 2023 22:05
@jkatz jkatz merged commit a0ceb1f into aws:main Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants