Fix TableHandles with ColumnHandles caching #14751

Closed

Conversation

ssheikin
Contributor

@cla-bot cla-bot bot added the cla-signed label Oct 25, 2022
@ssheikin ssheikin marked this pull request as draft October 25, 2022 14:38
@kokosing kokosing changed the title [WIP] [DO NOT review] Fix TableHandles with ColumnHandles caching Fix TableHandles with ColumnHandles caching Oct 25, 2022
Member

@kokosing kokosing left a comment

The changes look reasonable to me.

@@ -264,7 +264,12 @@ public Optional<ProjectionApplicationResult<ConnectorTableHandle>> applyProjecti
return Optional.empty();
}

verify(tableColumnSet.containsAll(newColumnSet), "applyProjection called with columns %s and some are not available in existing query: %s", newColumnSet, tableColumnSet);
Set<JdbcColumnHandle> tableSyntheticColumnSet = ImmutableSet.<JdbcColumnHandle>builder()
Member

Is it possible to provide a test?

Contributor Author

@ssheikin ssheikin Oct 25, 2022

@raunaqmorarka @sopel39 could you please advise which connector invokes PruneTableScanColumns#pruneColumns, e.g. when running BaseConnectorTest#testDeleteWithSubquery?

Member

That rule should trigger whenever there are more columns output from the table scan than are used by the rest of the query plan. E.g. select nationkey, count(*) from (select * from nation) group by 1 in tpch.tiny
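
The trigger condition described above can be sketched roughly as follows. This is a simplified illustration, not Trino's actual PruneTableScanColumns code: the class name, the method signature, and the use of plain strings for columns are all assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; the real rule works on plan nodes and symbols.
final class PruneColumnsSketch
{
    private PruneColumnsSketch() {}

    // Returns the narrowed column list when the scan outputs columns that the
    // rest of the plan never references; returns empty when nothing can be pruned.
    static Optional<List<String>> pruneColumns(List<String> scanOutputs, Set<String> referenced)
    {
        List<String> kept = scanOutputs.stream()
                .filter(referenced::contains)
                .toList();
        if (kept.size() == scanOutputs.size()) {
            return Optional.empty();
        }
        return Optional.of(kept);
    }
}
```

For the example query, the inner select * outputs every nation column while the aggregation only references nationkey, so in this sketch the scan would be narrowed to that single column.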

Member

@ssheikin it looks like a bug fix. do you have a query reproducing the problem?

@ssheikin ssheikin force-pushed the ssheikin/61/oss/try-reuse-columns branch from 9063668 to ce1ad4b Compare October 26, 2022 09:13
Table handle contains column handles. When column handle is changed,
data in cache for table handles is outdated.
Test TableHandles cache invalidation on columns change.
DeleteRowIdColumnHandle, however, does not belong to the table handle and
is not designed to belong to it. This column handle is required during
the analysis phase. The column is used for row-level delete.
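
A minimal sketch of the staleness problem named in the first commit message, under stated assumptions: every type here is a hypothetical stand-in (not the real CachingJdbcClient, JdbcTableHandle, or JdbcColumnHandle), and the boolean flag exists only to contrast behavior with and without table handle invalidation.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; not the real Trino classes.
final class TableHandleCacheSketch
{
    record ColumnSketch(String name, Optional<String> comment) {}

    record TableHandleSketch(String table, List<ColumnSketch> columns) {}

    // Simulated remote database state: table name -> columns.
    private final Map<String, List<ColumnSketch>> remoteTables = new ConcurrentHashMap<>();
    private final Map<String, TableHandleSketch> tableHandlesCache = new ConcurrentHashMap<>();
    private final boolean invalidateTableHandles;

    TableHandleCacheSketch(boolean invalidateTableHandles)
    {
        this.invalidateTableHandles = invalidateTableHandles;
    }

    void createTable(String table, List<ColumnSketch> columns)
    {
        remoteTables.put(table, List.copyOf(columns));
    }

    // The cached handle embeds a snapshot of the columns at load time.
    TableHandleSketch getTableHandle(String table)
    {
        return tableHandlesCache.computeIfAbsent(table, name -> new TableHandleSketch(name, remoteTables.get(name)));
    }

    void setColumnComment(String table, String column, Optional<String> comment)
    {
        remoteTables.computeIfPresent(table, (name, columns) -> columns.stream()
                .map(existing -> existing.name().equals(column) ? new ColumnSketch(column, comment) : existing)
                .toList());
        if (invalidateTableHandles) {
            // Without this, the cached handle keeps the old column snapshot.
            tableHandlesCache.remove(table);
        }
    }
}
```

With the flag off, a handle fetched after setColumnComment still carries the old comment; with it on, the next lookup rebuilds the handle from the current columns.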
@ssheikin ssheikin force-pushed the ssheikin/61/oss/try-reuse-columns branch from ce1ad4b to a54d488 Compare October 26, 2022 09:14
Member

@findepi findepi left a comment

"Invalidate caches more coarsely"

@@ -412,28 +412,28 @@ public void setTableComment(ConnectorSession session, JdbcTableHandle handle, Op
public void setColumnComment(ConnectorSession session, JdbcTableHandle handle, JdbcColumnHandle column, Optional<String> comment)
{
delegate.setColumnComment(session, handle, column, comment);
invalidateColumnsCache(handle.asPlainTable().getSchemaTableName());
invalidateTableCaches(handle.asPlainTable().getSchemaTableName());
Member

cmt msg

> Invalidate caches more coarsely
>
> Table handle contains column handles. When column handle is changed,
> data in cache for table handles is outdated.

I'd suggest a title like "Invalidate table handle cache when column changed"

then drop "handle" in "When column handle is changed," -- it's not the column handle that's changing (column handles are immutable), it's the column itself

Member

BTW why does having a stale io.trino.plugin.jdbc.JdbcColumnHandle#comment value inside JdbcTableHandle.columns matter?

I don't think it's used explicitly. If it's used implicitly (via equals), then maybe we have a concurrency problem,
i.e. what happens if one query pulls a JdbcTableHandle and then (before the first query's planning finishes) some other query performs setColumnComment?
The CachingJdbcClient state will be eventually consistent, but the first query will be planning on an inconsistent state.

Contributor Author

I think the only reasonable usage of the comment value is in JdbcColumnHandle.getRetainedSizeInBytes()

How does the situation described above differ from e.g. addColumn?

Member

> I think the only reasonable usage of the comment value is in JdbcColumnHandle.getRetainedSizeInBytes()

so the invalidation doesn't matter.

> How does the situation described above differ from e.g. addColumn?

Good question.
Generally the JDBC table handle won't carry columns until they are projected.

tableHandles.add(new JdbcTableHandle(schemaTableName, getRemoteTable(resultSet), getTableComment(resultSet)));

But I can imagine some JDBC connector eagerly populating the JdbcTableHandle.columns field within the io.trino.plugin.jdbc.JdbcClient#getTableHandle(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName) call. Then addColumn changes the state in a way that would matter.

Member

anyway, I am fine with doing invalidateTableCaches here, for consistency.
i just hope this is "for consistency" and not "a fix"

Member

> JdbcTableHandle.columns matters?

It matters. Plenty of connectors short-circuit getColumns and return the columns from the table handle. If they do not matter, we should remove them.
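
The short-circuit mentioned above might look roughly like this. It is a sketch under assumptions, not code from any particular connector: the handle type, the lookup counter, and loadColumnsFromRemote are hypothetical names introduced for illustration.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for illustration; not the real JDBC plugin classes.
final class GetColumnsShortCircuitSketch
{
    record TableHandleSketch(String table, Optional<List<String>> columns) {}

    private int remoteLookups;

    List<String> getColumns(TableHandleSketch handle)
    {
        // Short-circuit: trust whatever columns the handle already carries.
        if (handle.columns().isPresent()) {
            return handle.columns().get();
        }
        remoteLookups++;
        return loadColumnsFromRemote(handle.table());
    }

    int remoteLookups()
    {
        return remoteLookups;
    }

    // Placeholder for a metadata query against the remote database.
    private List<String> loadColumnsFromRemote(String table)
    {
        return List.of("a", "b", "c");
    }
}
```

Because the first branch never consults the remote database, a cached handle with outdated columns would be returned as-is, which is why the contents of the handle matter in this pattern.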

Comment on lines +116 to +117
// do not throw when invoked, however do not allow to set non-empty comment until the connector supports setting column comments
verify(comment.isEmpty(), "This connector does not support setting column comments");
Member

This isn't correct anyway.
setColumnComment invoked with empty comment should unset existing comment, so the verify's safety is an illusion.

what about:

// Ignore (not fail) for testing purposes.

without a verify?

Contributor Author

> This isn't correct anyway.

True.

> unset existing comment

An existing comment does not exist, because the connector does not support setting column comments.
And for the similar case with table comments, getTableComment always returns Optional.empty(), so it's intact.

> without a verify

I'd leave it as it is, because if I test setColumnComment, I'd not test it as setColumnComment(Optional.empty()), but with some value, and this could lead to some higher expectations from tests which use this function.

Member

> > unset existing comment
>
> An existing comment does not exist, because the connector does not support setting column comments.

It could exist if H2 supports column comments.

Trino is not the only gate to the underlying db.

> > without a verify
>
> I'd leave it as it is, because if I test setColumnComment, I'd not test it as setColumnComment(Optional.empty()), but with some value, and this could lead to some higher expectations from tests which use this function.

Fine. Leave it, but please make it clear that it's not correct and is there just as a reminder.
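
The concern in this thread can be shown with a small sketch. It is hypothetical throughout: the class, the map standing in for the underlying database, and the use of IllegalStateException in place of Guava's verify are assumptions made so the example runs without extra dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a test connector that cannot set column comments.
final class EmptyCommentVerifySketch
{
    // Comments as stored in the underlying database; they may be set outside Trino.
    final Map<String, String> remoteColumnComments = new HashMap<>();

    // Mirrors the shape of the guarded no-op under discussion.
    void setColumnComment(String column, Optional<String> comment)
    {
        // Stand-in for verify(comment.isEmpty(), ...).
        if (comment.isPresent()) {
            throw new IllegalStateException("This connector does not support setting column comments");
        }
        // Nothing is sent to the database, so an existing comment is not unset.
    }
}
```

If a comment was written to the database by some other client, calling setColumnComment with an empty value passes the check yet leaves that comment in place, which is the sense in which the check gives only apparent safety.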

Member

I thought about it. I assumed that setting an empty comment is fine for connectors that do not support it. However, I was wrong:

  • There is a difference between an empty comment and a null comment
  • H2 may change, and we could give the wrong impression that it is supported.

Contributor Author

Yes, it's controversial. So the better trade-off is not to test caching for addComment.

Contributor Author

As agreed, left it as it is now.

Member

@findepi findepi left a comment

"Fix verify in DefaultJdbcMetadata#applyProjection"

verify(tableColumnSet.containsAll(newColumnSet), "applyProjection called with columns %s and some are not available in existing query: %s", newColumnSet, tableColumnSet);
Set<JdbcColumnHandle> tableSyntheticColumnSet = ImmutableSet.<JdbcColumnHandle>builder()
.addAll(tableColumnSet)
.add((JdbcColumnHandle) getDeleteRowIdColumnHandle(session, table))
Member

Make getDeleteRowIdColumnHandle return JdbcColumnHandle and remove the cast here.
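
The suggestion relies on Java's covariant return types: an overriding method may declare a narrower return type than the method it overrides. A minimal sketch, assuming simplified hypothetical types (the real method takes a session and a table handle, and the "$row_id" name is invented for the example):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins; signatures differ from the real SPI.
final class CovariantReturnSketch
{
    interface ColumnHandleSketch {}

    record JdbcColumnHandleSketch(String name) implements ColumnHandleSketch {}

    interface MetadataSketch
    {
        ColumnHandleSketch getDeleteRowIdColumnHandle();
    }

    static final class JdbcMetadataSketch implements MetadataSketch
    {
        // Covariant return: narrower than the type declared by the interface.
        @Override
        public JdbcColumnHandleSketch getDeleteRowIdColumnHandle()
        {
            return new JdbcColumnHandleSketch("$row_id");
        }
    }

    // Builds the table columns plus the synthetic row id column, without a cast.
    static Set<JdbcColumnHandleSketch> syntheticColumnSet(JdbcMetadataSketch metadata, List<JdbcColumnHandleSketch> tableColumns)
    {
        Set<JdbcColumnHandleSketch> columns = new LinkedHashSet<>(tableColumns);
        columns.add(metadata.getDeleteRowIdColumnHandle());
        return columns;
    }
}
```

Callers that hold the concrete metadata type get the narrower handle directly, while callers using the interface are unaffected.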

@findepi
Member

findepi commented Oct 26, 2022

@ssheikin is it still a Draft, or for review?

@ssheikin
Contributor Author

I've addressed comments for the first commit and extracted it as a separate PR: #14762

@ssheikin ssheikin closed this Oct 27, 2022