Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check access to masked columns in bulk when analyzing masks #23064

Conversation

ksobolew
Copy link
Contributor

@ksobolew ksobolew commented Aug 19, 2024

Description

The getColumnMasks SPI call was added in order to reduce the number of SPI calls in cases of tables with large number of columns, similar to how checkCanSelectFromColumns does it. This is an important performance optimization for access controls like OPA. But, once we discover the column masks, we check whether the current session has access to the masked columns - if not, we skip analyzing it further. The problem was that each column mask was inspected in isolation, resulting in multiple checks to checkCanSelectFromColumns with just a single column. I believe this is contrary to the goals of the change which introduced getColumnMasks.

This PR adds a fast path with a single check for all masked columns instead of one check per each masked column. If that single check fails, though, we enter the slow path, in which we analyze masks on all the accessible columns to see if they reference any other columns. This is relevant when the hide-inaccessible-columns option is set, in which case we cannot assume that an inaccessible masked column will fail the query.

Additional context and related issues

The SPI call was added and access control code changed to use it in #21997.

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Aug 19, 2024
@ksobolew ksobolew requested review from dain and mosiac1 August 19, 2024 07:43
for (ColumnSchema columnSchema : columnSchemas) {
Optional.ofNullable(masks.get(columnSchema)).ifPresent(mask -> {
if (checkCanSelectFromColumn(name, columnSchema.getName())) {
if (checkCanSelectFromColumns(name, columnSchemas.keySet())) {
Copy link
Contributor

@mosiac1 mosiac1 Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the call to accessControl.getColumnMasks into this if block so the column masks are not fetched if they are not going to be processed, wdyt?

A similar logic may be applicable to the row filters; I imagine row filters aren't useful if the session can't select from columns. I am not very familiar with how trino analyses row filters so please double-check.

LE: looking at the test failures for the initial version those were likely caused because checkCanSelectFromColumns was called with all columns rather than just column that have masks. Considering this we can't avoid the getColumnMasks check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the call to accessControl.getColumnMasks into this if block so the column masks are not fetched if they are not going to be processed, wdyt?

I had to rework the code a bit, because I was not aware of some special behavior with hide-inaccessible-columns (thankfully Trino has excellent test coverage), so now it's not beneficial anymore. But good point.

A similar logic may be applicable to the row filters; I imagine row filters aren't useful if the session can't select from column. I am not very familiar with how trino analyses row filters so please double-check.

Row filters are on a table level, so that's a little different, and the filtering expression is analyzed as part of the WHERE clause. We could imagine a bulk getRowFilters, though - just as there are tables with thousands of columns, there are some crazy queries that reference thousands of tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Row filters are on a table level, so that's a little different, and the filtering expression is analyzed as part of the WHERE clause.

On the other hand I may be out of my depth here as well :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand I may be out of my depth here as well :)

It seems I totally am - that method is called before the analyzer knows which columns are referenced by the query, so we get all columns from the table. That access check was introduced to avoid confusing failures when we analyze masks on columns which are not involved in the query at all.

The `getColumnMasks` SPI call was added in order to reduce the number of
SPI calls in cases of tables with large number of columns, similar to
how `checkCanSelectFromColumns` does it. This is an important
performance optimization for access controls like OPA. But, once we
discover the column masks, we check whether the current session has
access to the masked columns - if not, we skip analyzing it further. The
problem was that each column mask was inspected in isolation, resulting
in multiple checks to `checkCanSelectFromColumns` with just a single
column. I believe this is contrary to the goals of the change which
introduced `getColumnMasks`.

This PR adds a fast path with a single check for all masked columns
instead of one check per each masked column. If that single check fails,
though, we enter the slow path, in which we analyze masks on all the
accessible columns to see if they reference any other columns. This is
relevant when the `hide-inaccessible-columns` option is set, in which
case we cannot assume that an inaccessible masked column will fail the
query.
@ksobolew ksobolew force-pushed the kudi/batch-access-check-in-analyze-masks branch from 38e7032 to e6d6654 Compare August 19, 2024 09:56
@@ -2375,25 +2375,37 @@ private void checkStorageTableNotRedirected(QualifiedObjectName source)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have something in place to prevent such regressions in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have something in place to prevent such regressions in future.

We have some tests which are counting invocations of some interface, like the metastore calls. These are useful when these invocations are known to be slow and we want to have as little of them as possible. But doing something like that for access control would be a big task.


for (ColumnSchema columnSchema : columnSchemas) {
Optional.ofNullable(masks.get(columnSchema)).ifPresent(mask -> {
if (checkCanSelectFromColumn(name, columnSchema.getName())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we were doing a check here in a first place. See io.trino.sql.analyzer.Analyzer#analyze(io.trino.sql.tree.Statement, io.trino.sql.analyzer.QueryType). We used to have similar problem in the past that same tables and columns were checked many times, so we collect then during analysis and check once at the end. I guess we need follow the approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either, but I guess it's because we don't want to do analysis on things the user doesn't have access to because it may in fact be creating a risk of information leakage. I can't think of a scenario off the top of my head, but I think caution is warranted here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, see who introduced that check: #6017 (and #6378) :) It was in response to #4496, which suggests this is not strictly a security thing, but a usability thing? - if the column mask's evaluation would cause an error, we can avoid that error by not evaluating it for inaccessible columns. If I understand this right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think if we would have #15131 then we would be able to solve #4496? Also I believe with that approach we would be able to access control checks in io.trino.sql.analyzer.Analyzer#analyze(io.trino.sql.tree.Statement, io.trino.sql.analyzer.QueryType) as we could simply do then sooner. But what we need to have is a test for number of access control checks to make sure we don't regress it.

@@ -2375,25 +2375,37 @@ private void checkStorageTableNotRedirected(QualifiedObjectName source)

private void analyzeFiltersAndMasks(Table table, QualifiedObjectName name, RelationType relationType, Scope accessControlScope)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice #15131, which could solve this problem too and then you don't have to think about it and have complex optimization cod in statement analyzer which is not a simple thing on its own.

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 18, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants