-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TableHandles with ColumnHandles caching #14751
Fix TableHandles with ColumnHandles caching #14751
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to me looks reasonable.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
Show resolved
Hide resolved
@@ -264,7 +264,12 @@ public Optional<ProjectionApplicationResult<ConnectorTableHandle>> applyProjecti | |||
return Optional.empty(); | |||
} | |||
|
|||
verify(tableColumnSet.containsAll(newColumnSet), "applyProjection called with columns %s and some are not available in existing query: %s", newColumnSet, tableColumnSet); | |||
Set<JdbcColumnHandle> tableSyntheticColumnSet = ImmutableSet.<JdbcColumnHandle>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to provide a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raunaqmorarka @sopel39 could you please advice which connector invokes PruneTableScanColumns#pruneColumns
e.g. running BaseConnectorTest#testDeleteWithSubquery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That rule should trigger whenever there are more columns output from the table scan than are used by the rest of the query plan. E.g. select nationkey, count(*) from (select * from nation) group by 1
in tpch.tiny
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ssheikin it looks like a bug fix. do you have a query reproducing the problem?
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
Show resolved
Hide resolved
9063668
to
ce1ad4b
Compare
Table handle contains column handles. When column handle is changed, data in cache for table handles is outdated. Test TableHandles cache invalidation on columns change.
DeleteRowIdColumnHandle does not belong to the table handle however and is not designed to belong to it. This column handle required during analysis phase. The column is used for row-level delete.
ce1ad4b
to
a54d488
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Invalidate caches more coarsely"
@@ -412,28 +412,28 @@ public void setTableComment(ConnectorSession session, JdbcTableHandle handle, Op | |||
public void setColumnComment(ConnectorSession session, JdbcTableHandle handle, JdbcColumnHandle column, Optional<String> comment) | |||
{ | |||
delegate.setColumnComment(session, handle, column, comment); | |||
invalidateColumnsCache(handle.asPlainTable().getSchemaTableName()); | |||
invalidateTableCaches(handle.asPlainTable().getSchemaTableName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmt msg
Invalidate caches more coarsely
Table handle contains column handles. When column handle is changed,
data in cache for table handles is outdated.
I'd suggest title like "Invalidate table handle cache when column changed"
then drop "handle" in "When column handle is changed," -- it's not the column handle what's changing (column handles are immutable), it's the column itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW why having stale io.trino.plugin.jdbc.JdbcColumnHandle#comment
value inside a JdbcTableHandle.columns
matters?
I don't think it's used explicitly. If it's used implicitly (via equals), then maybe we have concurrency problem.
i.e. what happens if one query pulls a JdbcTableHandle
and then (before the first query planning finishes), some other query performs setColumnComment
.
The CachingJdbcClient
state will be eventually consistent, but the first query will be planning on an inconsistent state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only reasonable usage of the value of the comment is in JdbcColumnHandle.getRetainedSizeInBytes()
How the situation described above differs from the e.g. addColumn
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only reasonable usage of the value of the comment is in
JdbcColumnHandle.getRetainedSizeInBytes()
so the invalidation doesn't matter.
How the situation described above differs from the e.g.
addColumn
?
Good question.
Generally the JDBC table handle won't carry columns until they are projected.
trino/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/BaseJdbcClient.java
Line 256 in 5579e70
tableHandles.add(new JdbcTableHandle(schemaTableName, getRemoteTable(resultSet), getTableComment(resultSet))); |
But I can imagine some JDBC connector eagerly populating the JdbcTableHandle.columns
field within io.trino.plugin.jdbc.JdbcClient#getTableHandle(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName)
call. Then, addColumn
changes the state in a way that would matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway, I am fine with doing invalidateTableCaches
here, for consistency.
i just hope this is "for consistency" and not "a fix"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JdbcTableHandle.columns matters?
It matters. Plenty of connectors for getColumns
do a short-circuit and return columns from table handle. If they do not matter we should remove them.
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/CachingJdbcClient.java
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestCachingJdbcClient.java
Show resolved
Hide resolved
// do not throw when invoked, however do not allow to set non-empty comment until the connector supports setting column comments | ||
verify(comment.isEmpty(), "This connector does not support setting column comments"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct anyway.
setColumnComment
invoked with empty comment should unset existing comment, so the verify
's safety is an illusion.
what about:
// Ignore (not fail) for testing purposes.
without a verify
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct anyway.
true.
unset existing comment
existing comment does not exist, because the connector does not support setting column comments
.
And for the similar case with table comment getTableComment
always returns Optional.empty()
, so it's in tact.
without a verify
I'd left it as it is, because if I test setColumnComment
, I'd not test it as setColumnComment(Optional.empty())
, but with some value, and this could led to some higher expectations from tests which use this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unset existing comment
existing comment does not exist, because
the connector does not support setting column comments
.
it could exist if H2 supports column comments.
Trino is not the only gate to the underlying db
without a verify
I'd left it as it is, because if I test
setColumnComment
, I'd not test it assetColumnComment(Optional.empty())
, but with some value, and this could led to some higher expectations from tests which use this function.
Fine. Leave it but please make it clear it's not correct, but just as a reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it. I was thinking that setting empty comment is fine, for connectors that do not support it. However I was wrong:
- It is a difference between empty comment and null comment
- H2 may change and we could give a wrong impression that support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's controversial. So better trade-off is not to test caching for addComment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As agreed, left it as it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fix verify in DefaultJdbcMetadata#applyProjection"
verify(tableColumnSet.containsAll(newColumnSet), "applyProjection called with columns %s and some are not available in existing query: %s", newColumnSet, tableColumnSet); | ||
Set<JdbcColumnHandle> tableSyntheticColumnSet = ImmutableSet.<JdbcColumnHandle>builder() | ||
.addAll(tableColumnSet) | ||
.add((JdbcColumnHandle) getDeleteRowIdColumnHandle(session, table)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make getDeleteRowIdColumnHandle return JdbcColumnHandle and remove the cast here.
@@ -264,7 +264,12 @@ public Optional<ProjectionApplicationResult<ConnectorTableHandle>> applyProjecti | |||
return Optional.empty(); | |||
} | |||
|
|||
verify(tableColumnSet.containsAll(newColumnSet), "applyProjection called with columns %s and some are not available in existing query: %s", newColumnSet, tableColumnSet); | |||
Set<JdbcColumnHandle> tableSyntheticColumnSet = ImmutableSet.<JdbcColumnHandle>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ssheikin it looks like a bug fix. do you have a query reproducing the problem?
@ssheikin is it still a Draft, or for review? |
I've addressed comments for the first commit and extracted it as a separate PR: #14762 |
Description
Non-technical explanation
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: