Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check access to masked columns in bulk when analyzing masks #23064

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2375,25 +2375,37 @@ private void checkStorageTableNotRedirected(QualifiedObjectName source)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have something in place to prevent such regressions in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have something in place to prevent such regressions in future.

We have some tests which are counting invocations of some interface, like the metastore calls. These are useful when these invocations are known to be slow and we want to have as little of them as possible. But doing something like that for access control would be a big task.

private void analyzeFiltersAndMasks(Table table, QualifiedObjectName name, RelationType relationType, Scope accessControlScope)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice #15131, which could solve this problem too and then you don't have to think about it and have complex optimization cod in statement analyzer which is not a simple thing on its own.

{
ImmutableList.Builder<ColumnSchema> columnSchemaBuilder = ImmutableList.builder();
ImmutableMap.Builder<String, ColumnSchema> columnSchemaBuilder = ImmutableMap.builder();
for (int index = 0; index < relationType.getAllFieldCount(); index++) {
Field field = relationType.getFieldByIndex(index);
field.getName().ifPresent(fieldName -> columnSchemaBuilder.add(ColumnSchema.builder()
field.getName().ifPresent(fieldName -> columnSchemaBuilder.put(fieldName, ColumnSchema.builder()
.setName(fieldName)
.setType(field.getType())
.setHidden(field.isHidden())
.build()));
}
List<ColumnSchema> columnSchemas = columnSchemaBuilder.build();
Map<String, ColumnSchema> columnSchemas = columnSchemaBuilder.buildOrThrow();

Map<ColumnSchema, ViewExpression> masks = accessControl.getColumnMasks(session.toSecurityContext(), name, columnSchemas);
Map<ColumnSchema, ViewExpression> masks = accessControl.getColumnMasks(session.toSecurityContext(), name, ImmutableList.copyOf(columnSchemas.values()));

for (ColumnSchema columnSchema : columnSchemas) {
Optional.ofNullable(masks.get(columnSchema)).ifPresent(mask -> {
if (checkCanSelectFromColumn(name, columnSchema.getName())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we were doing a check here in a first place. See io.trino.sql.analyzer.Analyzer#analyze(io.trino.sql.tree.Statement, io.trino.sql.analyzer.QueryType). We used to have similar problem in the past that same tables and columns were checked many times, so we collect then during analysis and check once at the end. I guess we need follow the approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either, but I guess it's because we don't want to do analysis on things the user doesn't have access to because it may in fact be creating a risk of information leakage. I can't think of a scenario off the top of my head, but I think caution is warranted here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, see who introduced that check: #6017 (and #6378) :) It was in response to #4496, which suggests this is not strictly a security thing, but a usability thing? - if the column mask's evaluation would cause an error, we can avoid that error by not evaluating it for inaccessible columns. If I understand this right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think if we would have #15131 then we would be able to solve #4496? Also I believe with that approach we would be able to access control checks in io.trino.sql.analyzer.Analyzer#analyze(io.trino.sql.tree.Statement, io.trino.sql.analyzer.QueryType) as we could simply do then sooner. But what we need to have is a test for number of access control checks to make sure we don't regress it.

// check for fast path: a single bulk check of all columns
if (checkCanSelectFromColumns(name, columnSchemas.keySet())) {
Copy link
Contributor

@mosiac1 mosiac1 Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the call to accessControl.getColumnMasks into this if block so the column masks are not fetched if they are not going to be processed, wdyt?

A similar logic may be applicable to the row filters; I imagine row filters aren't useful if the session can't select from columns. I am not very familiar with how trino analyses row filters so please double-check.

LE: looking at the test failures for the initial version those were likely caused because checkCanSelectFromColumns was called with all columns rather than just column that have masks. Considering this we can't avoid the getColumnMasks check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the call to accessControl.getColumnMasks into this if block so the column masks are not fetched if they are not going to be processed, wdyt?

I had to rework the code a bit, because I was not aware of some special behavior with hide-inaccessible-columns (thankfully Trino has excellent test coverage), so now it's not beneficial anymore. But good point.

A similar logic may be applicable to the row filters; I imagine row filters aren't useful if the session can't select from column. I am not very familiar with how trino analyses row filters so please double-check.

Row filters are on a table level, so that's a little different, and the filtering expression is analyzed as part of the WHERE clause. We could imagine a bulk getRowFilters, though - just as there are tables with thousands of columns, there are some crazy queries that reference thousands of tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Row filters are on a table level, so that's a little different, and the filtering expression is analyzed as part of the WHERE clause.

On the other hand I may be out of my depth here as well :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand I may be out of my depth here as well :)

It seems I totally am - that method is called before the analyzer knows which columns are referenced by the query, so we get all columns from the table. That access check was introduced to avoid confusing failures when we analyze masks on columns which are not involved in the query at all.

for (ColumnSchema columnSchema : columnSchemas.values()) {
Optional.ofNullable(masks.get(columnSchema)).ifPresent(mask -> {
analyzeColumnMask(session.getIdentity().getUser(), table, name, columnSchema, accessControlScope, mask);
}
});
});
}
}
else {
// slow path: if any column is inaccessible, we need to analyze all the accessible ones
// to see if they reference any other column(s)
for (ColumnSchema columnSchema : columnSchemas.values()) {
Optional.ofNullable(masks.get(columnSchema)).ifPresent(mask -> {
if (checkCanSelectFromColumns(name, ImmutableSet.of(columnSchema.getName()))) {
analyzeColumnMask(session.getIdentity().getUser(), table, name, columnSchema, accessControlScope, mask);
}
});
}
}

accessControl.getRowFilters(session.toSecurityContext(), name)
Expand All @@ -2412,10 +2424,10 @@ private void analyzeCheckConstraints(Table table, QualifiedObjectName name, Scop
}
}

private boolean checkCanSelectFromColumn(QualifiedObjectName name, String column)
private boolean checkCanSelectFromColumns(QualifiedObjectName name, Set<String> columns)
{
try {
accessControl.checkCanSelectFromColumns(session.toSecurityContext(), name, ImmutableSet.of(column));
accessControl.checkCanSelectFromColumns(session.toSecurityContext(), name, columns);
return true;
}
catch (AccessDeniedException e) {
Expand Down
Loading