-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Transport to use API from Trino v406 #128
Upgrade Transport to use API from Trino v406 #128
Conversation
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
Thanks for working on this! |
…/linkedin/transport into yiqiangin/transport-trino-plugin
…/linkedin/transport into yiqiangin/transport-trino-plugin
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
could you also add in the testing section that |
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yiqiangin for this PR!
// Temporarily disable the tests for Trino. As the test infrastructure from Trino named QueryAssertions is used to | ||
// run these test for Trino, QueryAssertions mandatory execute the function with the query in two formats: one with | ||
// is the normal query (e.g. SELECT "binary_duplicate"(a0) FROM (VALUES ROW(from_base64('YmFy'))) t(a0);), the other | ||
// is with "where RAND()>0" clause (e.g. SELECT "binary_duplicate"(a0) FROM (VALUES ROW(from_base64('YmFy'))) t(a0) where RAND()>0;) | ||
// QueryAssertions verifies the output from both queries are equal otherwise the test fail. | ||
// However, the execution of the query with where clause triggers the code of VariableWidthBlockBuilder.writeByte() to create | ||
// the input byte array in Slice with an initial 32 byes capacity, while the execution of the query without where clause does not trigger | ||
// the code of VariableWidthBlockBuilder.writeByte() and create the input byte array in Slice with the actual capacity of the content. | ||
// Therefore, the outputs from both queries are different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the next step for it? Sounds like it's an issue of Trino QueryAssertions
? Maybe add a TODO item here and other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 disabling the tests does not seem like the right course of action here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to run the failed test with QueryAssertion
now? It was not yet clear why QueryAssertion
failed previously with the old way using SqlScalarFunction
. Annotation-driven functions had no issue with QueryAssertion
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test still fails with QueryAssertions
. Briefly QueryAssertions
requires to compare the outputs from two queries with the same UDF, however in case of binary UDF, two queries results in the execution of different code paths in Trino which makes the inputs to UDF are different, so the outputs of the queries are different. The details are added in the comments of the code.
I have created a TODO issue to reenable them after the root cause is found in Trino and fixed as shown in #131
...-example-udfs/src/test/java/com/linkedin/transport/examples/TestBinaryDuplicateFunction.java
Outdated
Show resolved
Hide resolved
...e-udfs/src/test/java/com/linkedin/transport/examples/TestNestedMapFromTwoArraysFunction.java
Outdated
Show resolved
Hide resolved
.../transportable-udfs-test-spi/src/main/java/com/linkedin/transport/test/spi/SqlStdTester.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...le-udfs-trino-plugin/src/test/java/com/linkedin/transport/trino/plugin/TestTransportUdf.java
Outdated
Show resolved
Hide resolved
...le-udfs-trino-plugin/src/test/java/com/linkedin/transport/trino/plugin/TestTransportUdf.java
Outdated
Show resolved
Hide resolved
...le-udfs-trino-plugin/src/test/java/com/linkedin/transport/trino/plugin/TestTransportUdf.java
Outdated
Show resolved
Hide resolved
@Test | ||
public void testFileLookupFailNull() { | ||
StdTester tester = getTester(); | ||
tester.check(functionCall("file_lookup", resource("file_lookup_function/sample"), null), null, "boolean"); | ||
try { | ||
StdTester tester = getTester(); | ||
tester.check(functionCall("file_lookup", resource("file_lookup_function/sample"), null), null, "boolean"); | ||
} catch (NullPointerException ex) { | ||
Assert.assertFalse(Boolean.valueOf(System.getProperty("trinoTest"))); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change. Previously we asserted that NPE was thrown. Now you seem to be asserting that NPE is thrown only when trinoTest
is false? But when trinoTest
is true we do expect the exception to be thrown, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code of this test are executed for Trino, Hive and Spark. Originally it expects a NullPointerException is thrown during the execution of looking up a null value in a file by all three query engines. However, Trino v406 does not throw NullPointerException but returns a null value in this case. Therefore the code change removes the expected exception from @test annotation, and only checks if a NullPointerException is caught if it is not in case of Trino.
Also I have already added some comments to explain the reason in the code.
@@ -12,6 +12,7 @@ dependencies { | |||
implementation('com.google.guava:guava:24.1-jre') | |||
implementation('org.apache.commons:commons-io:1.3.2') | |||
testImplementation('io.airlift:aircompressor:0.21') | |||
testImplementation('org.junit.jupiter:junit-jupiter-api:5.9.2') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this module use TestNG? why is JUnit 5 added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this module use TestNG. However, as the tests for Trino use the class of QueryAssertions
from Trino, if this dependency is not added, the tests for Trino will fail because of the following error:
java.lang.NoClassDefFoundError: org/junit/jupiter/api/Assertions
at io.trino.sql.query.QueryAssertions$ExpressionAssertProvider.evaluate(QueryAssertions.java:675)
at io.trino.sql.query.QueryAssertions$ExpressionAssertProvider.assertThat(QueryAssertions.java:691)
at io.trino.sql.query.QueryAssertions$ExpressionAssertProvider.assertThat(QueryAssertions.java:602)
...le-udf-metadata/trino/resources/META-INF/services/com.linkedin.transport.trino.StdUdfWrapper
Outdated
Show resolved
Hide resolved
|
||
public StdUdfWrapper(StdUDF stdUDF) { | ||
this.functionMetadata = FunctionMetadata.builder(FunctionKind.SCALAR) | ||
.nullable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why always nullable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this code is simply migrating the existing code with the APIs of FunctionMetadata from Trino v352 to Trino v406 as shown in https://github.com/linkedin/transport/blob/master/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L74 , the original code has already set nullable as true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiqiangin The link you pasted is using StdUDF#getNullableArguments()
to decide which arguments can receive a null value; it'd be better to mention that it is moved here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weijiii Here is the definition of FunctionMetadata in v352
public FunctionMetadata(
Signature signature,
boolean nullable,
List<FunctionArgumentDefinition> argumentDefinitions,
boolean hidden,
boolean deterministic,
String description,
FunctionKind kind)
{
this(FunctionId.toFunctionId(signature), signature, nullable, argumentDefinitions, hidden, deterministic, description, kind, false);
}
As shown in this line https://github.com/linkedin/transport/blob/master/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L82
true
is set as the second argument in new FunctionMetadata()
which is nullable
, Booleans.asList(stdUDF.getNullableArguments()
is the third argument
private final ClassLoader parent; | ||
|
||
public TransportUDFClassLoader(ClassLoader parent, List<URL> urls) { | ||
super(urls.toArray(new URL[0])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the modern way to do this is:
super(urls.toArray(new URL[0])); | |
super(urls.toArray(URL[]::new)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Transport is still using Java 8, this modern approach seems not allowed.
...le-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportUDFClassLoader.java
Outdated
Show resolved
Hide resolved
...le-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportUDFClassLoader.java
Show resolved
Hide resolved
...le-udfs-trino-plugin/src/test/java/com/linkedin/transport/trino/plugin/TestTransportUdf.java
Outdated
Show resolved
Hide resolved
...-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnectorFactory.java
Outdated
Show resolved
Hide resolved
...ansportable-udfs-test-trino/src/main/java/com/linkedin/transport/test/trino/TrinoTester.java
Show resolved
Hide resolved
...ansportable-udfs-test-trino/src/main/java/com/linkedin/transport/test/trino/TrinoTester.java
Outdated
Show resolved
Hide resolved
Can you also update StdUdfUtils#RESERVED_KEYWORDS ? This is within scope of upgrading to Trino 406. The new ones should be
I think we should expedite this release so the subsequent testing of our production UDFs can be initiated. (cc Dali reviewers @ljfgem @aastha25 ) Thanks. |
I have updated the list per Erik's review before. Just double check the list in the code is the same as the list shown here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yiqiangin for this PR!
@wmoustafa @xkrogen: Do you want to take a look before merging?
...ortable-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnector.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have 4 JAR files checked in? (e.g. transport-udf-1-trino-dist-thin.jar
)
// in case of Trino, the execution of a query with UDF to check a null value in a file | ||
// does not result in a NullPointerException, but returns a null value | ||
tester.check(functionCall("file_lookup", resource("file_lookup_function/sample"), null), null, "boolean"); | ||
} catch (NullPointerException ex) { | ||
Assert.assertFalse(Boolean.valueOf(System.getProperty("trinoTest"))); | ||
// in case of Hive and Spark, the execution of a query with UDF to check a null value in a file results in a NullPointerException | ||
Assert.assertFalse(isTrinoTest()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this difference in behavior expected? do we need to modify the behavior to be consistent across engines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trino v352 has the same behavior with other engines, however, Trino v406 does not have.
...ansportable-udfs-test-trino/src/main/java/com/linkedin/transport/test/trino/TrinoTester.java
Show resolved
Hide resolved
...ortable-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnector.java
Show resolved
Hide resolved
public class TransportModule implements Module { | ||
@Override | ||
public void configure(Binder binder) { | ||
binder.bind(TransportConnector.class).in(Scopes.SINGLETON); | ||
configBinder(binder).bindConfig(TransportConfig.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we extend AbstractModule
as recommended in the Javadoc of Module
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I follows the example code of plugin from Trino like https://github.com/trinodb/trino/blob/master/plugin/trino-example-http/src/main/java/io/trino/plugin/example/ExampleModule.java#L25 and https://trino.io/docs/current/develop/connectors.html
...ortable-udfs-trino-plugin/src/main/java/com/linkedin/transport/trino/TransportConnector.java
Show resolved
Hide resolved
@Test | ||
public void testTransportPluginInitialization() { | ||
TestingTrinoServer server = TestingTrinoServer.create(); | ||
Plugin plugin = new TransportPlugin(); | ||
server.installPlugin(plugin); | ||
server.createCatalog("LINKEDIN", "TRANSPORT"); | ||
Assert.assertTrue(getOnlyElement(plugin.getConnectorFactories()) instanceof TransportConnectorFactory); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we assert that the UDFs are actually able to be loaded?
it would be nice also to have a test which can assert that each UDF is in a separate classloader -- maybe load two UDFs and grab the classloader for each and assert that they are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 @yiqiangin
LocalQueryRunner _runner = ...;
_runner.installPlugin(new TransportPlugin());
_runner.createCatalog("LINKEDIN", "TRANSPORT", ImmutableMap.of(<some config>...));
MaterializedResult result = _runner("SELECT array_element_at(array[1,2,3], 2)");
// assert result
Then we can run some query to verify the functions exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I change this test to initialize TransportPlugin
with a LocalQueryRunner. So the success of loading of UDFs are verified by successfully executing a query with UDF via LocalQueryRunner as shown in #133
For the test which can assert that each UDF is in a separate classloader, I am still trying to find out a good way to implement this in Transport side.
Those JARs contains UDF classes which are actually loaded in |
Can we put them inside of a |
It has been done in #133 |
* Upgrade Gradle from 6.7 to 7.5 (#122) * Upgrade to Gradle 7.5 * Upgrade to Gradle 7.5 * remove a unnecessary classpath * upgrade gradle wrapper in transportable-udfs-examples * address comments * add comment * using api() to avoid adding dependencies * Simplify dependencies further --------- Co-authored-by: Yiqiang Ding <yiqding@linkedin.com> Co-authored-by: Jiefan Li <jiefli@linkedin.com> * Upgrade Transport to compile Trino UDF with Java 17 (#125) Co-authored-by: Yiqiang Ding <yiqding@linkedin.com> * Upgrade Transport to use API from Trino v406 (#128) * Upgrade Transport to use API from Trino v406 * Upgrade Transport to use API from Trino v406 * PoC of Trino Connector * PoC of Trino Connector * address comments and remove unreachable artifactory repo * address comments * fix the issue of Maven repo * fixing conjar repo issue in transportable-udfs-examples * address comments * address comments * address comments * address comments * address comments * address comments * address comments * add some comments * remove conjar repo * address comments --------- Co-authored-by: Yiqiang Ding <yiqding@linkedin.com> * Improvements on transport-trino-plugin (#133) * Improvments on transportable-udfs-trino-plugin * Improvments on transportable-udfs-trino-plugin * fix a type * address comments * address comments --------- Co-authored-by: Yiqiang Ding <yiqding@linkedin.com> --------- Co-authored-by: Yiqiang Ding <yiqding@linkedin.com> Co-authored-by: Jiefan Li <jiefli@linkedin.com>
There are three major parts in code changes:
StdUdfWrapper
andTrinoFactory
The main change is the class of
StdUdfWrapper
does not extend the class ofSqlScalarFuntion
from trino-main any more.transportable-udfs-trino-plugin
which implements the deployment of Transport UDFs via a Trino plugin namedTransportPlugin
following the new Transport UDF deployment approach as shownin https://docs.google.com/document/d/1MgruaEc6Flwfr6KnDT0gGI_vCsCCNe_y3vXlGV-1dNQ/edit
The major classes including:
TransportPlugin
,TransportModule
,TransportConnectorFactory
andTransportConnector
for a plugin to to loaded byPluginManager
in Trino server to build a catalog with a connector to provide Transport UDFsTransportConnectorMetadata
,TransportFunctionProvider
andTransportUDFClassLoader
to dynamically load the implementation of UDF classes and build a map for function resolution during the initialization of the pluginTransportPlugin
,TransportConnectorFactory
andTransportConnector
to load the target UDF class for testtransportable-udfs-trino
andtransportable-udfs-test-trino
transportable-udfs-example-udfs
transportable-udfs-example-udfs
which are deployed in a local Trino server