-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory leak when query fails with permission error #12270
Conversation
edfc5c1
to
ade4c5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, can you add a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@picrinite Looks good, thanks for figuring this out.
@elonazoulay Yeah, i think we should add a test. I think what we can try to do, is to create a new integration test class, create a distributed query runner, run different scenarios upon that query runner when checking that transaction manager is empty after every scenario.
queryStateMachine.addStateChangeListener(newState -> { | ||
QUERY_STATE_LOG.debug("Query %s is %s", queryStateMachine.getQueryId(), newState); | ||
// mark finished or failed transaction as inactive | ||
if (newState.isDone()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@picrinite Did you check that this listener is triggered when the analysis fails (the case with permission error)?
The last time we discussed it, we were thinking about wrapping the analysis into a try-catch, because we thought that the state change listener may not always be triggered. Is that not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arhimondr The first commit is an improvement to make it safer for "setInactive" job. It is not triggered for permission error.
The second commit guarantees "setInactive" job is triggered when there is permission error, but there are some tests failed and I don't know how to reproduce. Help appreciated !
Regarding automatic test, cleanup of transaction manager is an asynchronous job with a delay (current default is 5 minutes), it is not deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An integration test is added, the added test will fail without current fix.
presto-main/src/main/java/com/facebook/presto/execution/SqlQueryExecution.java
Outdated
Show resolved
Hide resolved
3efda78
to
bced22d
Compare
d287858
to
28d7fbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge StateChangeListener for QueryStateMachine
- IIUC this commit just moves the listener registration to
QueryStateMachine
right after the creation of the state machine. If this is correct we can update the commit title asRegister state change listener in QueryStateMachine
or sth similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix memory leak when query fails with permission error
- The commit message talks about permission error only, but the fix is actually a generic fix, so I think we can make the commit title more generic.
@@ -496,6 +489,11 @@ public ThreadPoolExecutorMBean getManagementExecutor() | |||
return queryManagementExecutorMBean; | |||
} | |||
|
|||
public TransactionManager getTransactionManager() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annotate with @VisibleForTesting
.
@@ -56,6 +60,30 @@ public void tearDown() | |||
queryRunner = null; | |||
} | |||
|
|||
@Test(timeOut = 100_000L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not 60s? 100s is a bit on the long side.
QueryInfo queryInfo = sqlQueryManager.getFullQueryInfo(queryId); | ||
assertEquals(queryInfo.getState(), FAILED); | ||
assertNotNull(queryInfo.getFailureInfo()); | ||
TimeUnit.MILLISECONDS.sleep(50); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a sleep call here? Does the correctness of the test depend on this call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a future on that metadata cleanup asyn job and wait until it's done? Sleeping is fragile, this test can fail arbitrarily depending on the interleaving of the async job and the thread running this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nezihyigitbasi After further investigation with @arhimondr, we confirmed sleep() is not necessary since cleanup is synchronous. I have removed it from test.
28d7fbf
to
ebfb3d1
Compare
Register transaction cleanup job immediatelly after QueryStateMachine object is created to avoid failure.
ebfb3d1
to
75f7343
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % comment
presto-tests/src/test/java/com/facebook/presto/tests/TestQueryManager.java
Outdated
Show resolved
Hide resolved
presto-tests/src/test/java/com/facebook/presto/tests/TestTransactionManagerIntegration.java
Show resolved
Hide resolved
75f7343
to
2d01d12
Compare
presto-tests/src/test/java/com/facebook/presto/tests/TestTransactionManagerIntegration.java
Show resolved
Hide resolved
When semantic error (including permission error) is thrown during creation of SqlQueryExecution object, transaction metadata is not cleaned up resulting in memory leak. Current handling of error via method TransactionMetadata::fail is not working if there is no existing transaction id in session object.
2d01d12
to
b858007
Compare
Summary: Pull Request resolved: facebookincubator/velox#12270 Differential Revision: D69208299
When permission error is thrown inside creation of QueryExecution object,
transaction metadata is not marked as inactive resulting in memory leak.
Existing handling of error via method TransactionMetadata::fail is not
working when transaction id is empty in session object.
A heapdump is taken to test that previously leaked objects become
unreachable from GC roots after this fix.