Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement create_operator_func API. #210

Merged
merged 6 commits into from
Jul 12, 2023
Merged

Implement create_operator_func API. #210

merged 6 commits into from
Jul 12, 2023

Conversation

lyupan
Copy link
Contributor

@lyupan lyupan commented Jul 3, 2023

pgtle_admin can run create_operator_func to create operator funcs for base types created in pg_tle (by create_base_type).

User-defined operator function must accept one or two arguments of BYTEA, we make an C-version operator function that accepts base type as the argument; inside the C-version operator function, we find the user-defined operator function and execute it.

Notes:

  1. create_operator_func is not strictly necessary to define an operator function on the base type; however, it will be required for languages such as plrust, because the custom base type is not available in plrust;
  2. CREATE OPERATOR CLASS requires superuser privileges in pg_tle, but should be already supported in AWS RDS.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

pgtle_admin can run create_operator_func to create operator funcs for base types created in pg_tle (by create_base_type).

User-defined operator function must accept one or two arguments of BYTEA, we make an C-version operator function that accepts base type as the argument; inside the C-version operator function, we find the user-defined operator function and execute it.

Notes:
1. create_operator_func is not strictly necessary to define an operator function on the base type; however, it will be required for languages such as plrust, because the custom base type is not available in plrust;
2. `CREATE OPERATOR CLASS` requires superuser privileges in pg_tle, but should be already supported in AWS RDS.

Co-authored-by: Jim Mlodgenski <jimmy76@gmail.com>
Copy link
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments to get started with. Need to dive deeper into this.

src/datatype.c Outdated
Comment on lines 821 to 841
if (!((proc->pronargs == 1 && proc->proargtypes.values[0] == BYTEAOID) ||
(proc->pronargs == 2 && proc->proargtypes.values[0] == BYTEAOID && proc->proargtypes.values[1] == BYTEAOID)))
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type opeartor function must accept one or two arguments of type bytea")));
}

if (proc->pronamespace != expectedNamespace)
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type operator functions must exist in the same namespace as the type")));
}

argTypes = (Oid *) palloc(proc->pronargs * sizeof(Oid));
argTypes[0] = typeOid;
if (proc->pronargs == 2)
argTypes[1] = typeOid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!((proc->pronargs == 1 && proc->proargtypes.values[0] == BYTEAOID) ||
(proc->pronargs == 2 && proc->proargtypes.values[0] == BYTEAOID && proc->proargtypes.values[1] == BYTEAOID)))
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type opeartor function must accept one or two arguments of type bytea")));
}
if (proc->pronamespace != expectedNamespace)
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type operator functions must exist in the same namespace as the type")));
}
argTypes = (Oid *) palloc(proc->pronargs * sizeof(Oid));
argTypes[0] = typeOid;
if (proc->pronargs == 2)
argTypes[1] = typeOid;
if (proc->pronamespace != expectedNamespace)
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type operator functions must exist in the same namespace as the type")));
}
if (proc->pronargs > 2)
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type opeartor function must accept one or two arguments bytea")));
}
argTypes = (Oid *) palloc(proc->pronargs * sizeof(Oid));
/* FIXME: declare int i at the top of the function */
for (i = 0; i < proc->pronargs; i++)
{
if (proc->proargtypes.values[i] != BYTEAOID)
{
ReleaseSysCache(tuple);
ereport(ERROR,
(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
errmsg("type operator function must accept arguments of bytea")));
}
argTypes[i] = typeOid;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Thanks for the suggestion!

src/datatype.c Outdated
Comment on lines 845 to 848
if (OidIsValid(LookupFuncName(funcNameList, proc->pronargs, argTypes, true)))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
errmsg("function \"%s\" already exists", NameListToString(funcNameList))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't release the tuple.

Also -- are we able to release the tuple after we copy the data into the structure? I believe we'd have to do a full copy of the tuples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't release the tuple

Good catch! Fixed in a new commit.

Also -- are we able to release the tuple after we copy the data into the structure? I believe we'd have to do a full copy of the tuples.

Do you refer to this code makeString(NameStr(proc->proname))? That's probably an issue, I've added a pstrdup to do a string copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyupan I was referring to seeing if we could make a copy of the entire struct so we could immediately release the tuple, or (as you suggest), copy the values we need and release it. I don't think it's a long running operation, but I'd generally think it'd be better to not have to access the tuple longer than we need to.

Copy link
Contributor

@jkatz jkatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just about ready. Most of my comments were minor.

src/datatype.c Outdated
Comment on lines 705 to 706
inputNames = list_make2(makeString(get_namespace_name(proc->pronamespace)),
makeString(NameStr(proc->proname)));
makeString(pstrdup(NameStr(proc->proname))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty hard to read. Likewise, if we need to make a copy of proc->proname, we'd also have to do that for proc->pronamespace (unless get_namespace_name is doing that).

src/datatype.c Outdated
Comment on lines 778 to 780
argTypes[0] = typeOid;
if (nargs == 2)
argTypes[1] = typeOid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
argTypes[0] = typeOid;
if (nargs == 2)
argTypes[1] = typeOid;
for (i = 0; i < nargs; i++)
argTypes[i] = typeOid;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the suggestion

src/datatype.c Outdated

CHECK_CAN_SET_ROLE(typeOwner, tleadminoid);

if (!is_pgtle_io_func(inputOid, true) || !is_pgtle_io_func(outputOid, false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!is_pgtle_io_func(inputOid, true) || !is_pgtle_io_func(outputOid, false))
if (!(is_pgtle_io_func(inputOid, true) && is_pgtle_io_func(outputOid, false)))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/datatype.c Outdated

prosrcstring = TextDatumGetCString(prosrcattr);
ReleaseSysCache(tuple);
return strcmp(prosrcstring, typeInput ? TLE_BASE_TYPE_IN : TLE_BASE_TYPE_OUT) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using strncmp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/datatype.c Outdated
}

nargs = proc->pronargs;
if (nargs != 1 && nargs != 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (nargs != 1 && nargs != 2)
if (nargs < 1 || nargs > 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/datatype.c Outdated

procname = get_qualified_funcname(fcinfo->flinfo->fn_oid);
get_func_signature(fcinfo->flinfo->fn_oid, &argtypes, &nargs);
if (nargs != 1 && nargs != 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (nargs != 1 && nargs != 2)
if (nargs < 1 || nargs > 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/datatype.c Outdated
Comment on lines 969 to 971
argtypes[0] = BYTEAOID;
if (nargs == 2)
argtypes[1] = BYTEAOID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
argtypes[0] = BYTEAOID;
if (nargs == 2)
argtypes[1] = BYTEAOID;
for (i = 0; i < 2; i++)
argtypes[i] = BYTEAOID;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/datatype.c Outdated
Comment on lines 980 to 985
if (nargs == 1)
result = OidFunctionCall1Coll(userFunc, InvalidOid, PG_GETARG_DATUM(0));
else
result = OidFunctionCall2Coll(userFunc, InvalidOid, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));

return result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (nargs == 1)
result = OidFunctionCall1Coll(userFunc, InvalidOid, PG_GETARG_DATUM(0));
else
result = OidFunctionCall2Coll(userFunc, InvalidOid, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));
return result;
if (nargs == 1)
return OidFunctionCall1Coll(userFunc, InvalidOid, PG_GETARG_DATUM(0));
return OidFunctionCall2Coll(userFunc, InvalidOid, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* pgtle_admin can run this function, let's check and make sure there is
* not a way to bypass that
*/
check_is_pgtle_admin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some regression tests for the various checks we exercise in this function? Such as

  • a non pgtle_admin role attempts to call the function -- fails
  • role without CREATE permissions in namespace attempts to call the function -- fails
  • role is not owner of base type -- fails
  • role is not owner of operator func -- fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added a few more regression tests cases.

if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for function %u", funcid);
proc = (Form_pg_proc) GETSTRUCT(tuple);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the ReleaseSysCache(tuple) call way up here, and call it once only, after copying all the values we need in the subsequent code? My understanding is that right now, we only have ref copies so we can't release until we're done, but there are a fixed number of values we reference later so we could make copies of all those upfront.

The reason for this suggestion is to make it easier to read and reason about it. The way its structured now, there are many code paths to check to ensure we've released the lock, so it would be a little easier to read and maintain in the future if we change the code later, to have one point of releasing the lock.

It can also reduce the period of time PROCOID is locked for the happy path, although I do recognize that for the error cases instead of failing early like now, we will do more work before release because of all the copying we will never need.

Copy link
Contributor Author

@lyupan lyupan Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Reduced a few ReleaseSysCache calls by copying the values upfront. Because we want check nargs before allocation a new array (to avoid too many arguments or zero/negative nargs values), still do two ReleaseSysCache calls here.

proc = (Form_pg_proc) GETSTRUCT(tuple);
if (proc->prolang != ClanguageId)
{
ReleaseSysCache(tuple);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as previous one, about copying the data out, and once only ReleaseSysCache(tuple)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SysCacheGetAttr is somewhat more costly than a simply copy, and there are only two ReleaseSysCache, I guess we can keep it as it is.

Copy link
Contributor

@formalLogicGirl formalLogicGirl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@formalLogicGirl formalLogicGirl merged commit 101d19b into aws:main Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants