-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make PyObject cleanup thread-safe in free-threaded Python and reduce contention #176
fix: make PyObject cleanup thread-safe in free-threaded Python and reduce contention #176
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we could be corrupting buffer from multiple threads without the synchronization; and I expect the GIL was "saving" us before and is not now - but want to make sure that understanding is accurate.
Also, I want to understand the intention behind the cleanup_on_init change. Perhaps we should add some Javadoc to those flags indicating why they should be set one way or anohter?
Essentially, there needs to be something / somebody responsible for invoking Currently, there are 3 strategies. The original intentions were:
There is this comment in the method // We really really really want to make sure we *already* have the GIL lock at this point in
// time. Otherwise, we block here until the GIL is available for us, and stall all cleanup
// related to our PyObjects. Digging deeper into the existing code, while the code is still "safe", we are breaking the intention of the comment above due to change in #48 (essentially, this PR drops the GIL when c code calls into Java and re-acquires when the java method returns). That is, our assumption that when PyObject is called from JNI it will have the GIL is no longer true, and it's possible we are blocking during the creation of We also have some comments about "This should only be invoked through the proxy" which is no longer true given the ensureGil; the original intention was that the proxy flowed through python to ensure it would be invoked with the GIL, but again, with the dropping of the GIL PR, this is no longer true. As such, I think we may want to re-work how cleanup is done; I think turning CLEANUP_ON_INIT to false or deleting the logic entirely is warranted. That said, we do have some deephaven core testing code that does not work with CLEANUP_ON_THREAD (deephaven/deephaven-core#651), and I suspect we may need to revisit those cases if we change how cleanup works. |
If we remove the CLEANUP_ON_INIT option, does it even make sense to make the thread cleanup optional? The system will leak resources without some cleanup. And the "manual" cleanup does seem to be a dangerous thing to expect people to do. Devin pointed out that some tests don't work with CLEANUP_ON_THREAD, why not? |
Here are some of the comments found in the sources of Deephaven and JPY. return project.tasks.create (taskName.toString(), Test) {
Test t ->
t.group = 'python'
t.description = "Run java tests with jpy setup using python environment $venvFullName at file:$dir"
t.onlyIf { TestTools.shouldRunTests(project) }
// Cleaning up on a dedicated thread has some issues when there is frequent starting
// and stopping of the python virtual environment. We'll rely on cleaning up inline
// when necessary.
t.systemProperty"PyObject.cleanup_on_thread", "false"
// t.environment("JAVA_HOME", Jvm.current().javaHome)
t.dependsOn dependOn
taskTestPython(project).dependsOn t
applyProperties(t)
if (classpath) {
t.classpath = classpath.runtimeClasspath
t.testClassesDirs = classpath.output.classesDirs
}
return
} def test_suite():
suite = unittest.TestSuite()
def test_python_with_java_runtime(self):
assert 0 == test_python_java_rt()
def test_python_with_java_classes(self):
assert 0 == test_python_java_classes()
def test_java(self):
assert test_maven()
suite.addTest(test_python_with_java_runtime)
suite.addTest(test_python_with_java_classes)
# comment out because the asynchronous nature of the PyObject GC in Java makes stopPython/startPython flakey.
# suite.addTest(test_java)
return suite To summarize, the flag is needed to support testing. It is not documented and is true by default so there isn't much danger of user meddling with it to cause memory leaks. |
This reverts commit 5de1fe2.
This is discovered when testing parallelization of https://github.com/deephaven/deephaven-core in a FT Python environment.