Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add context appender to sql based connectors #14500

Merged
merged 2 commits into from
Nov 24, 2022

Conversation

s2lomon
Copy link
Member

@s2lomon s2lomon commented Oct 6, 2022

Description

Add context appender to sql based connectors

This is a first iteration for SELECT Queries of a mechanism
that will allow to send to the actual database a bit more
information, that might be available only in Trino. It's for the
purpose of these other systems to be able to capture this info and
log it for its own purposes. For now we are sending it as a comments
added at the end of a query.

Non-technical explanation

This pr adds an ability to add additional context of query execution and send it to external database,
like pgsql, oracle etc.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Add context comments to SELECT queries in sql based connectors.

@cla-bot cla-bot bot added the cla-signed label Oct 6, 2022
Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments


import io.trino.spi.connector.ConnectorSession;

public interface QueryExecutionConnectorSessionContextAppender
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is confusing. I don't really understand the "connector" part.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from ConnectorSession But maybe it would be better to change it to QueryExecutionContextAppender

@findepi
Copy link
Member

findepi commented Oct 7, 2022

Its work in progress - shared just to gain feedback and start discussion on the approach.

Awesome!

What does this PR aim to solve?

@findepi
Copy link
Member

findepi commented Oct 7, 2022

Description

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required. ( ) Release notes are required, please propose a release note for me. ( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

if you don't feel like filling this in, just remove it.
it will make my life easier as i don't need to read text that carries no information

thank you!

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just skimming.

not possible to do better review without knowing what is the purpose of the code

private static final String LOG_CANCELLATION_EVENT = "ERROR: canceling statement due to user request";

private static final Pattern SQL_QUERY_FIND_PATTERN = Pattern.compile("^(: |/C_\\d: )(.*)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth adorning this with example log line we're parsing

public class TestQueryContextLoggingEnabled
extends AbstractTestQueryFramework
{
private final TestingPostgreSqlServer postgreSqlServer = closeAfterClass(new TestingPostgreSqlServer());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never allocate resources or do IO in a test static or instance initializer.

  • in case of failure, surefire we'll be glad to misguide you on what happened
  • you use precious memory which is needed only for the test's duration

@hashhar
Copy link
Member

hashhar commented Oct 10, 2022

Some context would be useful when reviewing this (to understand expected usage).

@s2lomon s2lomon force-pushed the comment_query_appenders branch from c362b5d to d5aa7c1 Compare October 13, 2022 15:58
Copy link
Member Author

@s2lomon s2lomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your responses, please take a look again.


import io.trino.spi.connector.ConnectorSession;

public interface QueryExecutionConnectorSessionContextAppender
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from ConnectorSession But maybe it would be better to change it to QueryExecutionContextAppender

@hashhar
Copy link
Member

hashhar commented Oct 13, 2022

I really hope people won't abuse this to implement passing query hints to Oracle.

Will take a look at the PR.

@kokosing kokosing changed the title WIP comment_query_appenders Add context appender to sql based connectors Oct 13, 2022
DistributedQueryRunner distributedQueryRunner = createPostgreSqlQueryRunner(
postgreSqlServer,
Map.of(),
Map.of("comment-logging.format", "query executed by $USER"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add all variables here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be hard to use them, as some of them are generated automatically (query_id) and some of them are not provided (as they would have to come from the client itself, like trace_token and source). It's tested extensively elsewhere.

QueryExecutionContextAppender NO_CONTEXT_APPENDER = (query, session) -> query;

String appendContext(String query, ConnectorSession session)
throws IllegalArgumentException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove throws

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? That's how you notify the clients that this method can actually throws something and it's by design. What's more you are showing to the implementors what kind of exceptions can be thrown, without the need of adding additional handling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalArgumentException can be expected from any method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

What do you mean by "Ok"? I was already preparing popcorn and awaiting epic philosophical battle. Try be more assertive next time, other reviewers demand entertainment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha. Ha. Ha. ;)

@Praveen2112
Copy link
Member

What if we implement a ForwardingQueryBuilder which appends this context ? But we need to wrap all QueryBuilder impl with this new impl.

Copy link
Member Author

@s2lomon s2lomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look now. I've made a 3 commits but I will squash them into a single one later on.

QueryExecutionContextAppender NO_CONTEXT_APPENDER = (query, session) -> query;

String appendContext(String query, ConnectorSession session)
throws IllegalArgumentException;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

DistributedQueryRunner distributedQueryRunner = createPostgreSqlQueryRunner(
postgreSqlServer,
Map.of(),
Map.of("comment-logging.format", "query executed by $USER"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be hard to use them, as some of them are generated automatically (query_id) and some of them are not provided (as they would have to come from the client itself, like trace_token and source). It's tested extensively elsewhere.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. I reviewed everything at once (not commit by commit).

@@ -658,7 +665,7 @@ public JdbcOutputTableHandle beginInsertTable(ConnectorSession session, JdbcTabl
}
}

protected void copyTableSchema(Connection connection, String catalogName, String schemaName, String tableName, String newTableName, List<String> columnNames)
protected void copyTableSchema(Connection connection, String catalogName, String schemaName, String tableName, String newTableName, List<String> columnNames, ConnectorSession session)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ConnectorSession is usually passed as first argument. Please apply that to all places.

log.debug("Execute: %s", appendedQuery);
statement.execute(appendedQuery);
}
catch (IllegalArgumentException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this catch? Is this an user, admin or developer error? The same comment to other places like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's potential SQLInject attack, that's why I want to handle it explicitly with JDBC_NON_TRANSIENT_ERROR - meaning that this SQL should not be repeated as the error won't go away afer a while and there is no point of repeating the query. You need to change the client setup to make it disappear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I suggest that the query modifier should throw proper exception directly or return Optional should be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Optional.empty() would mean that you can execute code without context being logged, even though configuration says that it will be logged. That's a security risk as well. I think that we should be fine throwing this exception directly though.

Copy link
Member

@skrzypo987 skrzypo987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % kokosing remarks

QueryExecutionContextAppender NO_CONTEXT_APPENDER = (query, session) -> query;

String appendContext(String query, ConnectorSession session)
throws IllegalArgumentException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

What do you mean by "Ok"? I was already preparing popcorn and awaiting epic philosophical battle. Try be more assertive next time, other reviewers demand entertainment.

@skrzypo987
Copy link
Member

Change the PR description to cover the current scope

Copy link
Member Author

@s2lomon s2lomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review, changes are not yet done, but please take a look at the comments.

log.debug("Execute: %s", appendedQuery);
statement.execute(appendedQuery);
}
catch (IllegalArgumentException e) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's potential SQLInject attack, that's why I want to handle it explicitly with JDBC_NON_TRANSIENT_ERROR - meaning that this SQL should not be repeated as the error won't go away afer a while and there is no point of repeating the query. You need to change the client setup to make it disappear.

for (PredefinedValue predefinedValue : PredefinedValue.values()) {
message = message.replaceAll(predefinedValue.getMatchCase(), predefinedValue.value(session));
}
return query + "/*" + message + "*/";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok it will look better in logs and I guess humans will be reading it with the query.

@@ -88,6 +93,18 @@ private static void execute(String url, Properties properties, String sql)
}
}

DatabaseEvents startCollecting()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@kokosing
Copy link
Member

I am fine with responses, let me know when you need another round of the review.

@s2lomon s2lomon force-pushed the comment_query_appenders branch from 09bfc83 to 5db6244 Compare November 3, 2022 09:31
Copy link
Member Author

@s2lomon s2lomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments addressed. Please check again.

log.debug("Execute: %s", appendedQuery);
statement.execute(appendedQuery);
}
catch (IllegalArgumentException e) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Optional.empty() would mean that you can execute code without context being logged, even though configuration says that it will be logged. That's a security risk as well. I think that we should be fine throwing this exception directly though.

@s2lomon s2lomon force-pushed the comment_query_appenders branch 2 times, most recently from a824a6b to 8f90f2e Compare November 3, 2022 12:23
@findepi findepi removed their request for review November 4, 2022 12:54
@Praveen2112
Copy link
Member

It still shows some conflict, can you resolve them.

@s2lomon s2lomon force-pushed the comment_query_appenders branch from 7fe0d2b to f9b73fb Compare November 9, 2022 12:04
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.

statement = connection.prepareStatement(insertSql);
}
catch (TrinoException e) {
throw closeAllSuppress(e, connection);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would throw new TrinoException with e as cause. So all you need to do is catch (SQLException | TrinoException e)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it's correct from the exception handling stnadnpoint. We already have the exception and there is no need to wrap it again without adding any meaningful information to the caller. Plus we are not returning JDBC_ERROR but rather NON_TRANSIENT_JDBC_ERROR wich is the whole point of this exception being thrown.


import static java.util.stream.Collectors.joining;

public class RemoteQueryModifierConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe inline this class into QueryConfig?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this idea. It has quite a lot of tests and there is no point in adding these tests to default jdbc config tests as they are very specific

private static final Pattern VALIDATION_PATTERN = Pattern.compile("[\\w ,=]|" + String.join("|", PREDEFINED_MATCHES));
private String format = "";

@Config("remote-query.comment.format")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query.comment-format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that remote-query is better, as it tells exactly which query we are talking about in this case. comment-format might be a bit closer to what we do in other places. so remote-query.comment-format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think we need to change query.reuse-connection. I am fine with this. However then the scope of this PR gets bigger.

Also I know we are all aware of concept of "remote query". I am not sure if it is well known to our users. There is a risk where "remote" part may confuse them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will follow your lead here.


@Config("remote-query.comment.format")
@ConfigDescription("Format in which logs about query execution context should be added as comments sent through jdbc driver." +
" It can consist only of letters, digits, spaces, underscores, equality signs and predefined values: $QUERY_ID, $SOURCE, $USER, $TRACE_TOKEN")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is too long. Let's add details to docs and here let's be short.

Message format which will be added as comment to remote query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.


private static RemoteQueryModifier createRemoteQueryModifier(String commentFormat)
{
return new RemoteQueryModifier(new RemoteQueryModifierConfig().setFormat(commentFormat));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a constructor that takes only format string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could and I could make it package-private to not be used outside of this package.

Copy link
Member Author

@s2lomon s2lomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please take a look at my responses. I will append change shortly.

statement = connection.prepareStatement(insertSql);
}
catch (TrinoException e) {
throw closeAllSuppress(e, connection);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it's correct from the exception handling stnadnpoint. We already have the exception and there is no need to wrap it again without adding any meaningful information to the caller. Plus we are not returning JDBC_ERROR but rather NON_TRANSIENT_JDBC_ERROR wich is the whole point of this exception being thrown.

private static final Pattern VALIDATION_PATTERN = Pattern.compile("[\\w ,=]|" + String.join("|", PREDEFINED_MATCHES));
private String format = "";

@Config("remote-query.comment.format")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that remote-query is better, as it tells exactly which query we are talking about in this case. comment-format might be a bit closer to what we do in other places. so remote-query.comment-format?


@Config("remote-query.comment.format")
@ConfigDescription("Format in which logs about query execution context should be added as comments sent through jdbc driver." +
" It can consist only of letters, digits, spaces, underscores, equality signs and predefined values: $QUERY_ID, $SOURCE, $USER, $TRACE_TOKEN")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.


private static RemoteQueryModifier createRemoteQueryModifier(String commentFormat)
{
return new RemoteQueryModifier(new RemoteQueryModifierConfig().setFormat(commentFormat));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could and I could make it package-private to not be used outside of this package.

Copy link
Member

@Praveen2112 Praveen2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could have test for default behaviour. Overall design looks good to me

return this;
}

public String getFormat()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a NotNull annotation ?

for (PredefinedValue predefinedValue : PredefinedValue.values()) {
message = message.replaceAll(predefinedValue.getMatchCase(), predefinedValue.value(session));
}
return query + " /*" + message + "*/";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default - we shouldn’t add any conext to the query. Won’t this add a /* NOT USED*/ for all queries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it would not in previous implementation and after the change it won't be an issue as well.

}

@Test
public void testFormatWithEmptyValues()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an additional test case - for default behaviour (if format is not specified)

.initialize()
.getInstance(RemoteQueryModifier.class);

assertThat(remoteQueryModifier)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of asserting remoteQueryModifier - can we assert its behaviour.

return distributedQueryRunner;
}

@Test(singleThreaded = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there is any specific reason on why we didn’t annotate the TestClass with this ?

@s2lomon s2lomon force-pushed the comment_query_appenders branch 2 times, most recently from ef49450 to a86828e Compare November 17, 2022 16:47
Copy link
Member

@Praveen2112 Praveen2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor change. LGTM

}

@Inject
public DefaultQueryBuilder(RemoteQueryModifier contextAppender)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : Can rename the variable as queryModifier and in the rnn message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed that last time.

return server.getRemoteDatabaseEvents()
.stream()
.skip(startingFromEventsCount)
.limit(lastEventsCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the difference between the starting and lastEventCount ?
BTW Do we need this bottom limit, since the test are single threaded and we already have a starting point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a request from @kokosing to make it immutable. Thanks to this it is, as every method will always return same events. We are not using this fact at this moment in our tests, but it's still better to have it as such and don't be forced to think about it in the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, should limit be the difference between the startingEvent and lastEvents or should we re-order the streams ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I believe that this is correct, as skip doesn't truncate the stream, it just skips first few and limit limits all values, but to be honest, I would prefer to have a test for that to be 100% sure, and we don't have that here. Due to that, I've decided to rewrite this part quickly, to not confuse anyone and to have immutability down to the values identity.

@s2lomon s2lomon force-pushed the comment_query_appenders branch from a86828e to 52dc6f9 Compare November 21, 2022 12:17
@s2lomon
Copy link
Member Author

s2lomon commented Nov 21, 2022

I've added also a suite of tests checking for correct default behaviour - meaning that nothing should be logged at all when it's disabled. I've noticed that I have only partially tested that in guice setup tests.

@s2lomon s2lomon force-pushed the comment_query_appenders branch from 52dc6f9 to 91dc517 Compare November 21, 2022 21:59
@Praveen2112
Copy link
Member

CI is red. Can you please take a look.

@s2lomon s2lomon force-pushed the comment_query_appenders branch 2 times, most recently from 8303ff2 to 9ed7df5 Compare November 23, 2022 10:02
This change allows for sending additional logs, that are
send in comments to external systems (jdbc only atm). It allows
to pass additional security and observability information to these
systems, that wouldn't be possible to send otherwie.

This implementation covers SELECT, DDL and DML queries, although it
does not cover any metadata queries - as these are made by the drivers
itself and we can't easily intercept these.
@s2lomon s2lomon force-pushed the comment_query_appenders branch from 9ed7df5 to e2771a8 Compare November 23, 2022 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants