Add anonymized query plan in json format to QueryCompletedEvent #12968
Please add examples of anonymised plans to the PR
ping
Test failures are related, PTAL
Query:
Optional<String> connectorName = metadata.listCatalogs(session).stream()
        .filter(catalogInfo -> catalogInfo.getCatalogName().equals(tableSchema.getCatalogName()))
        .map(CatalogInfo::getConnectorName)
        .findFirst();
This should always exist, right?
Ideally yes, but I don't know if we should assume that. In the future, if catalogs can be removed on the fly, I'm not sure that assumption would continue to hold.
> In future if catalogs can be removed on the fly, I'm not sure if that assumption would continue to hold.

I don't think they can be removed for an active query.
> this should always exists, right?

I tried making it non-optional, but many tests started failing. IIRC these errors occurred in the case of an information schema.
OK. Could you explain why the connector name is missing for the information schema?
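To make the `Optional` discussion in this thread concrete, here is a minimal, self-contained sketch of the same lookup pattern. `CatalogInfo` below is a hypothetical two-field stand-in rather than Trino's class, and `describe` with its `<unknown>` fallback is an illustrative assumption, not what the PR's PlanPrinter does.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the catalog metadata used in the snippet under review.
record CatalogInfo(String catalogName, String connectorName) {}

final class ConnectorLookup
{
    private ConnectorLookup() {}

    // Returns the connector name for a catalog, or empty when the catalog is not listed.
    static Optional<String> connectorNameFor(List<CatalogInfo> catalogs, String catalogName)
    {
        return catalogs.stream()
                .filter(catalog -> catalog.catalogName().equals(catalogName))
                .map(CatalogInfo::connectorName)
                .findFirst();
    }

    // Produces a label that still renders when no connector name could be resolved.
    static String describe(List<CatalogInfo> catalogs, String catalogName)
    {
        return connectorNameFor(catalogs, catalogName)
                .map(name -> "connector=" + name)
                .orElse("connector=<unknown>");
    }
}
```

Keeping the result as an `Optional` means an unlisted catalog (the information schema case mentioned in the thread) yields an empty value rather than an exception.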
import static io.trino.sql.planner.plan.StatisticsWriterNode.WriteStatisticsTarget;
import static io.trino.sql.planner.plan.TableWriterNode.WriterTarget;

public interface Anonymizer
Why is this an interface? There should only be one implementation of the anonymizer, so an interface is overkill.
One implementation performs anonymisation, the other doesn't. Could you take a look at its usage in PlanPrinter and suggest a different approach if this can be improved?
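As a rough illustration of the design being debated, the sketch below shows an interface with a pass-through implementation and a masking implementation, so that the printing code never branches on whether anonymization is enabled. All names and the single-method shape are hypothetical simplifications; the hash-based masking is only a placeholder and is not claimed to match the PR's code.

```java
// Hypothetical, trimmed-down version of the interface under review.
interface Anonymizer
{
    String anonymizeColumn(String columnName);
}

// Pass-through used when the plan should be printed as-is.
final class NoOpAnonymizer
        implements Anonymizer
{
    @Override
    public String anonymizeColumn(String columnName)
    {
        return columnName;
    }
}

// Replaces the name with a value derived from its hash code (illustration only).
final class HashingAnonymizer
        implements Anonymizer
{
    @Override
    public String anonymizeColumn(String columnName)
    {
        return "column_" + Integer.toHexString(columnName.hashCode());
    }
}

// The printer depends only on the interface, so it has no anonymization-specific branches.
final class PlanLinePrinter
{
    private final Anonymizer anonymizer;

    PlanLinePrinter(Anonymizer anonymizer)
    {
        this.anonymizer = anonymizer;
    }

    String printScan(String columnName)
    {
        return "TableScan[column = " + anonymizer.anonymizeColumn(columnName) + "]";
    }
}
```

The alternative the reviewer hints at would be a single concrete class plus an "enabled" check at each call site; the interface trades one extra type for fewer conditionals in the printer.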
@@ -56,6 +57,7 @@ public QueryMetadata(
             List<RoutineInfo> routines,
             URI uri,
             Optional<String> plan,
+            Optional<String> anonymizedJsonPlan,
What's the use case for including both the regular plan and the anonymized plan? Most event listeners will just forward and store the event as is, which defeats the purpose of having an anonymized plan.
It is possible to have different use cases for the regular plan and the anonymised plan. For example, the regular plan can be used to display the query plan for past queries to a user in a UI. This would be user-specific data that might not be made easily accessible to everyone; it's also formatted in a way that is not ideal for offline analysis.
The anonymized plan could be sent to a different downstream system that is more widely accessible and easier to work with for offline analysis.
Currently we send the same event to all event listeners, so we also can't use different event listeners to get different versions of the plan.
@@ -48,6 +48,7 @@
 import static io.trino.sql.analyzer.QueryType.EXPLAIN;
nit: should this be a separate PR?
@raunaqmorarka @sopel39 PTAL
For some reason, we don't register the system or information schema's catalog name to …
PS: GitHub is not letting me add a comment in the open thread 😞 cc @sopel39
lgtm % comments
if (replicateNullsAndAny) {
    builder.append(format("Output partitioning: %s (replicate nulls and any) [%s]%s\n",
            partitioningScheme.getPartitioning().getHandle(),
            Joiner.on(", ").join(arguments),
            formatHash(partitioningScheme.getHashColumn())));
Why not keep using the formatHash method here?
We've made formatHash a non-static method.
     }

-    public static String formatAggregation(Aggregation aggregation)
+    public static String formatAggregation(Anonymizer anonymizer, Aggregation aggregation)
nit: consider making it non static so that you don't have to pass anonymizer
It'll be redundant to make formatAggregation static because it is being used in GraphVizPlanPrinter too.
Anyway, in PlanPrinter, formatAggregation is only used in 2-3 places.
Is the failure related?
@martint @raunaqmorarka @sopel39 Implemented counter-based anonymization
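For readers unfamiliar with the idea, a counter-based scheme might look roughly like the sketch below: each distinct name receives the next number from a counter the first time it is seen, and the same placeholder is reused afterwards. The class and method names are hypothetical, and sharing one counter across kinds is an assumption inferred from the `symbol_1 := column_2` expectation that appears in this PR's tests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of counter-based anonymization; not the PR's CounterBasedAnonymizer.
final class CounterSketch
{
    // Maps "kind:name" to the placeholder already handed out for it.
    private final Map<String, String> assigned = new HashMap<>();
    // Shared across all kinds, so placeholders are numbered in order of first appearance.
    private int counter;

    // Returns a stable placeholder such as "symbol_1" for the given kind and original name.
    String anonymize(String kind, String name)
    {
        return assigned.computeIfAbsent(kind + ":" + name, key -> kind + "_" + (++counter));
    }
}
```

Unlike a hash-based scheme, the placeholders reveal nothing derived from the original value, at the cost of only being stable within a single anonymizer instance.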
        };
    }

    private static class TestLiteral
What's the purpose of this?
The AST is meant to be a closed hierarchy and not to be extended by third parties. There's a lot of infrastructure that depends on knowing exactly what classes are part of the AST. Eventually, we'll update it to use Java 17's sealed types, at which point this won't be possible at all and will be enforced at compile time.
The idea here is to test the scenario where a new sub-class of Literal is added but CounterBasedAnonymizer#anonymizeExpression isn't updated to handle that new class. In this scenario, we should still anonymise the literal.
It's not strictly necessary though; let me know if you want this removed or handled differently.
Yes, let’s remove it. It’s not a correct usage of the AST classes and will break in the future.
As long as the anonymizer fails if a new class is added (very unlikely), we’ll be able to catch that very quickly and add the relevant code, so we don’t even have to handle anonymization for the general case.
I've removed this now and modified CounterBasedAnonymizer#anonymizeExpression to throw UnsupportedOperationException in case of an unhandled Literal sub-class.
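The fail-fast behaviour agreed on in this thread can be sketched as follows. The literal hierarchy here is a hypothetical miniature and does not use Trino's AST classes; `NewlyAddedLiteral` merely stands in for a type introduced after the anonymizer was written, and the placeholder format is an assumption.

```java
// Hypothetical miniature literal hierarchy for illustration only.
abstract class Literal {}

final class StringLiteral extends Literal {}

final class LongLiteral extends Literal {}

// Stands in for a literal type that the anonymizer does not know about yet.
final class NewlyAddedLiteral extends Literal {}

final class LiteralAnonymizer
{
    private int counter;

    // Returns a typed placeholder for known literals and rejects unknown ones.
    String anonymize(Literal literal)
    {
        if (literal instanceof StringLiteral) {
            return "string_literal_" + (++counter);
        }
        if (literal instanceof LongLiteral) {
            return "long_literal_" + (++counter);
        }
        // Fail loudly so that a missing case is noticed quickly and handled explicitly.
        throw new UnsupportedOperationException(
                "Anonymization is not supported for literal type " + literal.getClass().getSimpleName());
    }
}
```

Once the hierarchy becomes sealed, as the reviewer anticipates, a missing case could be caught at compile time instead of at run time.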
        return anonymizeExpression(expression).toString();
    }

    private Expression anonymizeExpression(Expression expression)
I still have concerns about this, which I mentioned before in a related context. Anonymized expressions are not valid SQL, so we should not be trying to construct an AST out of them (which is effectively what this method does).
The API of Anonymizer is String anonymize(Expression expression), so we don't really need an anonymized Expression; an anonymised string representation of the Expression will do.
This method creates an Expression because it was convenient to use an ExpressionRewriter to anonymise literals and then call Expression#toString on the result.
I think an alternative could be to write something similar to ExpressionFormatter#Formatter that delegates to the existing formatter for all methods except the ones we want to anonymise (visitXXXLiteral).
Would that be better, or is there another way that you would recommend instead?
That would be better. Alternatively, we should consider and explore making the expression formatter itself anonymizer-aware.
I've updated the expression formatter such that we could use that directly instead of creating anonymized AST. PTAL @martint
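One way to picture an anonymizer-aware formatter is sketched below: the formatter accepts an optional literal-formatting function and produces a string directly, so no anonymized AST is ever built. The three-node expression model and every name here are hypothetical, and the `Optional<Function<...>>` parameter is an assumption about the shape of the change rather than a quote of the updated ExpressionFormatter.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical three-node expression model; not Trino's AST.
abstract class Node {}

final class SymbolRef extends Node
{
    final String name;

    SymbolRef(String name)
    {
        this.name = name;
    }
}

final class Literal extends Node
{
    final String text;

    Literal(String text)
    {
        this.text = text;
    }
}

final class Comparison extends Node
{
    final Node left;
    final String operator;
    final Node right;

    Comparison(Node left, String operator, Node right)
    {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }
}

final class SketchFormatter
{
    private SketchFormatter() {}

    // Formats the tree to text; literals go through literalFormatter when one is supplied.
    static String format(Node node, Optional<Function<Literal, String>> literalFormatter)
    {
        if (node instanceof Literal) {
            Literal literal = (Literal) node;
            return literalFormatter.map(formatter -> formatter.apply(literal)).orElse(literal.text);
        }
        if (node instanceof SymbolRef) {
            return ((SymbolRef) node).name;
        }
        if (node instanceof Comparison) {
            Comparison comparison = (Comparison) node;
            return "(" + format(comparison.left, literalFormatter)
                    + " " + comparison.operator
                    + " " + format(comparison.right, literalFormatter) + ")";
        }
        throw new UnsupportedOperationException("Unknown node type " + node.getClass().getSimpleName());
    }
}
```

Passing `Optional.empty()` gives the ordinary rendering, while an anonymizer can supply a function that returns a placeholder for each literal.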
This will be used in PlanPrinter to print connector name as part of table scan node in case of anonymization.
The general approach looks good now. I'll leave it up to @sopel39 to do the final review
lgtm % comments
                        ImmutableList.of("symbol_1 := column_2"),
                        ImmutableList.of(),
                        ImmutableList.of()))));
        assertThat(event.getMetadata().getJsonPlan())
We should anonymize both jsonPlan and plan. Could you create an issue + add a TODO?
We are anonymizing both plan and jsonPlan. It's just that there's no test for plan in TestEventListenerBasic. In general, there's no good testing of the text plan.
This can be used to collect anonymised plans through query event listeners and to print anonymized plans from EXPLAIN.
@sopel39 I've addressed comments
@@ -26,4 +26,12 @@ default void queryCompleted(QueryCompletedEvent queryCompletedEvent)
     default void splitCompleted(SplitCompletedEvent splitCompletedEvent)
     {
     }
+
+    /**
+     * Specify whether the plan included in QueryCompletedEvent should be anonymized or not
nit: this should mention that both plan and jsonPlan are anonymized
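The opt-in flag documented in the hunk above could be consumed along the lines of the sketch below. The interface and manager are hypothetical stand-ins (the method name `requiresAnonymizedPlan` and the "any listener opts in" rule are assumptions, since the hunk shown here is cut off before the method signature), and the Javadoc wording follows the reviewer's suggestion.

```java
import java.util.List;

// Hypothetical stand-in for the event listener SPI.
interface SketchEventListener
{
    /**
     * Specify whether the plan and jsonPlan included in the query completed event
     * should be anonymized or not.
     */
    default boolean requiresAnonymizedPlan()
    {
        return false;
    }
}

// Hypothetical manager that decides once, for all listeners, which plan variant to emit.
final class SketchEventListenerManager
{
    private final List<SketchEventListener> listeners;

    SketchEventListenerManager(List<SketchEventListener> listeners)
    {
        this.listeners = List.copyOf(listeners);
    }

    // True when at least one configured listener opts in to anonymized plans.
    boolean requiresAnonymizedPlan()
    {
        return listeners.stream().anyMatch(SketchEventListener::requiresAnonymizedPlan);
    }
}
```

Because the same event is sent to every listener, a single opt-in would under this assumption switch the plan for all of them; that trade-off is worth stating in the method's documentation.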
Description
Add anonymized query plan in json format to QueryCompletedEvent
Implement EXPLAIN (TYPE DISTRIBUTED, FORMAT JSON)
New feature
Event listener SPI, EXPLAIN
Provides anonymised query plan in json format to event listener to enable offline analysis without leaking sensitive info.
Implements EXPLAIN (TYPE DISTRIBUTED, FORMAT JSON)
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: