Upgrade kudu client to 1.15.0 #10940
I want to let the CI process run first so there is a build showing the timeouts that cause query failures. After that, I will push another commit to actually resolve the timeout problems by setting
Also, if we want to split the kudu client upgrade from the background flushing problem, let me know. There just is not a way to [easily] confirm the background flushing problem with kudu client 1.10.0, since the timeouts never/rarely happen.
@@ -103,7 +106,7 @@ private KuduPageSink(

         this.table = table;
         this.session = clientSession.newSession();
-        this.session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
+        this.session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_SYNC);
Was this introduced in the latest version?
No, this config also existed in 1.10.0.
See the PR description, the only way I could reproduce this bug (#5687) was by upgrading the kudu client to 1.15.0 which triggered timeouts to kudu.
That is why I fixed this bug in this upgrade client PR.
@@ -125,7 +128,10 @@ private KuduPageSink(
             }

             try {
-                session.apply(upsert);
+                OperationResponse response = session.apply(upsert);
Can we extract it from the version update? It looks like OperationResponse is returned in old APIs too.
Same comment as above, there is no way to easily verify the fix because you need a very specific sequence of timeouts [no timeouts when initially connecting to kudu, but timeouts during deletes/upserts/etc]. The kudu 1.15.0 client [accidentally] provides those timeouts due to a bug in the client.
See here for the failing kudu tests when upgrading to kudu 1.15.0:
https://github.com/trinodb/trino/runs/5057440792?check_suite_focus=true
And here for the workaround that makes tests pass:
https://github.com/trinodb/trino/runs/5060298096?check_suite_focus=true
If we do not mind having no tests around the change, I can extract this into a separate PR.
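To make the two review threads above concrete, here is a toy model of why a background flush can hide a timeout while a synchronous flush plus a check on the value returned by apply surfaces it. This is only a sketch: ToySession, Response, FlushMode and write are hypothetical stand-ins and not the Kudu client's types, and the model assumes that in background mode the caller gets nothing to inspect while the error is recorded on the side.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types come from the Kudu client library.
public class FlushModeSketch {
    enum FlushMode { SYNC, BACKGROUND }

    // Stand-in for the result of applying one write operation.
    static final class Response {
        final String error; // null when the write succeeded
        Response(String error) { this.error = error; }
        boolean hasError() { return error != null; }
    }

    // Stand-in for a session whose backend times out on every write.
    static final class ToySession {
        private final FlushMode mode;
        private final List<String> pendingErrors = new ArrayList<>();

        ToySession(FlushMode mode) { this.mode = mode; }

        Response apply(String operation) {
            String error = "timeout applying " + operation;
            if (mode == FlushMode.SYNC) {
                // The caller sees the outcome of this write right away.
                return new Response(error);
            }
            // Background mode (assumed behavior of the model): the failure is
            // recorded on the side and the caller gets nothing to inspect.
            pendingErrors.add(error);
            return null;
        }

        int pendingErrorCount() { return pendingErrors.size(); }
    }

    // Mirrors the shape of the change: inspect the response and fail loudly.
    static void write(ToySession session, String operation) {
        Response response = session.apply(operation);
        if (response != null && response.hasError()) {
            throw new IllegalStateException(response.error);
        }
    }

    public static void main(String[] args) {
        ToySession background = new ToySession(FlushMode.BACKGROUND);
        write(background, "delete row 1"); // returns normally; failure is hidden
        System.out.println("background pending errors: " + background.pendingErrorCount());

        try {
            write(new ToySession(FlushMode.SYNC), "delete row 1");
        } catch (IllegalStateException e) {
            System.out.println("sync failed: " + e.getMessage());
        }
    }
}
```

In the model, the background write returns normally even though the operation timed out, which matches the symptom described in this PR: deletes that silently did not happen. The synchronous write fails at the call site, so the query would fail instead.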
I believe all needs to be squashed except the last commit.
LGTM
LGTM % doc update.
Please update "Requirements" in Kudu docs to specify minimum supported version as 1.15 now. It might work with older versions but we don't test it now - if we intend to claim otherwise then add a test extending BaseConnectorSmokeTest with an older Kudu version.
Kudu 1.10.0 is pretty old (more than 2 years, released on November 1, 2019), so I just updated the docs to say we only support 1.15.0 or higher.
That's fair. 1.13 is the oldest supported release anyway according to https://kudu.apache.org/releases.
If we care about supporting 1.13, let me know and I can add an additional set of smoke tests.
Would be nice to do if it's not too much work; otherwise we can tackle that separately. Please remember to update docs accordingly.
LGTM % comments.
I believe #10953 depends on this PR getting merged first?
LGTM % comments
Upgrading the kudu client revealed a few problems:
1. Timeouts to kudu tablets were sometimes occurring during deletes due to a bug in the kudu java client in version 1.13.0.
2. Timeouts were *not* failing query execution because the kudu connector was configured to flush operations in the background.
3. The two combined above meant tests that did deletes sometimes actually did not perform deletes and would fail.
This patch upgrades the kudu client, explicitly fails trino execution when kudu rpcs timeout, and marks unsupported data types from kudu 1.15.0.
Does not do anything in kudu 1.15.0
LGTM
{
    private static final String KUDU_VERSION = "1.13.0";

    public static class TestKuduSmokeTestWithDisabledInferSchema
Any reason this is not defined as a top-level class?
See #10940 (comment). To keep all tests for same version together.
Sharing a constant doesn't require nesting classes.
I admire cleverness, but this also means an unnecessary class hierarchy, which doesn't help when browsing the code.
If the paradigm were more frequent in the code base, I would probably get used to it and wouldn't complain.
I noticed that, depending on how you run the tests:
mvn test -Dtest=io.trino.plugin.kudu.KuduLatestConnectorTests
mvn test
some tests in the class hierarchy will be skipped.
(EDIT: BaseConnectorTest has a test to ensure the class name ends in ConnectorTest; for some reason, when this is not satisfied for a static inner test class, no test failure happens and the test gets skipped instead.)
Additionally, when running tests through IntelliJ's UI you can only run the static inner classes (not the top-level class).
The tooling support for static inner test classes just does not seem good, so I'm going to move these to top-level classes.
Upgrading the kudu client revealed a few problems:
1. Timeouts to kudu tablets were sometimes occurring during deletes due to this change introduced in the kudu java client in version 1.13.0: apache/kudu@d23ee5d#diff-f1f50409d81052b8f8d7aea7da663c185c704c6206cb0ec901114f4d9ee8c28f
(see here for the reason why that commit broke the client: https://gerrit.cloudera.org/#/c/18166/)
2. Timeouts were not failing query execution because the kudu connector was configured to flush operations in the background.
3. The two combined above meant tests that did deletes sometimes actually did not perform deletes and would fail.
The first commit upgrades the kudu client and additionally explicitly fails trino execution when kudu rpcs timeout. This resolves: #5687
The second commit fixes one source of kudu timeouts in the 1.15.0 client.