-
-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make outputs go to correct cell when generated in threads/asyncio #1186
Make outputs go to correct cell when generated in threads/asyncio #1186
Conversation
5dae96f
to
10c149c
Compare
1ef474c
to
9e9c40e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impressive PR @krassowski, that's great.
Just to confirm, this wouldn't work with threads that are not created from Python, right? For instance in a C extension.
Yes, writing from threads which were created outside of Python will still write to the most recently executed cell, as will writing directly to OS-level file descriptor corresponding to stdout/stderr. |
ipykernel/ipkernel.py
Outdated
if isinstance(stream, OutStream): | ||
stream._thread_parents[self.ident] = parent_header | ||
|
||
threading.Thread.start = start_closure # type:ignore[method-assign] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if there could be a way to not monkey-patch threading.Thread.start
.
In akernel it's the print
function that is monkey-patched, so this PR goes further because it works for lower-level writes to stdout
/stderr
, but with monkey-patching there is always a risk that another library changes the same object, and we cannot know that our change will be the last applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding as per https://bugs.python.org/issue14073 is that there is no other way, but we could mention our use case as another situation motivating introduction of start/exit hooks/callbacks to threads (which is something Python committers have considered in the past but I presume it was not a sufficiently high priority until now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We crossed 'comments', see my other thread on this exact line for (what I think is) a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
ipykernel/ipkernel.py
Outdated
if isinstance(stream, OutStream): | ||
stream._thread_parents[self.ident] = parent_header | ||
|
||
threading.Thread.start = start_closure # type:ignore[method-assign] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it thread-safe. If a thread starts a new thread, that new thread will start outputting in the last executed cell.
I've taken a different approach in Solara, where I patch the ctor and run from the start:
https://github.com/widgetti/solara/blob/0ac4767a8c5f8c8b221cafe41fa9ac36270adbe8/solara/server/patch.py#L336
I think this approach might work better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize the strategy, the ctor (__init__
) would take a reference to the parent_header, and in run, that parent header is then set in a thread_local storage (in your case _thread_parents
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a thread starts a new thread, that new thread will start outputting in the last executed cell.
Correct (there is no such a problem with asyncio side of things).
I think this approach might work better.
Thank you for the link, I will take a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a closer look at solara, current_context
takes the role of _thread_parents
, right? This does not seem to differ that much different. Your approach of patching earlier on initialization rather than startup has the advantage that it will work with:
from threading import Thread
from time import sleep
def child_target():
for i in range(iterations):
print(i, end='', flush=True)
sleep(interval)
def parent_target():
thread = Thread(target=child_target)
sleep(interval)
thread.start()
Thread(target=parent_target).start()
but still not with:
def parent_target():
sleep(interval)
Thread(target=child_target).start()
Thread(target=parent_target).start()
do I see this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end my implementation converged to overriding the same methods as yours after all in e1258de :)
ipykernel/iostream.py
Outdated
self._parent_header: contextvars.ContextVar[Dict[str, Any]] = contextvars.ContextVar( | ||
"parent_header" | ||
) | ||
self._parent_header.set({}) | ||
self._thread_parents = {} | ||
self._parent_header_global = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why the ContextVar and the _thread_parents dict? Is a threading.local not more standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because contextvar works well for asyncio edge cases. There is some more explanation in the PEP 567, but the gist is:
Thread-local variables are insufficient for asynchronous tasks that execute concurrently in the same OS thread. Any context manager that saves and restores a context value using
threading.local()
will have its context values bleed to other code unexpectedly when used in async/await code.
That said, I will take another look at using threading.local
instead of _thread_parents
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we need ContextVar now, because also when we create new tasks we want to output to the right output cell, didn't think of that!
I think if we use ContextVar, we do not need the thread local storage, it's a superset of threading.local. In combination with overriding the Thread ctor, we can drop _thread_parents.
34867ac
to
e1258de
Compare
As of now this PR also handles nested threads (started after execution of other cells) as well. |
Hi folks, just bumping this PR as there was no further review/comments in the past three weeks. Any suggestions, or is it good to merge? |
I'll merge tomorrow if there are no further comments. |
References
Code changes
contextvars
to retrieve the parent header of emitting asyncio taskUser facing changes
Backward-incompatible changes
None