Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Successfully executed cells get stuck in busy state until notebook finishes its run #4334

Closed
stas00 opened this issue Jan 10, 2019 · 14 comments · Fixed by ipython/ipykernel#390

Comments

@stas00
Copy link

stas00 commented Jan 10, 2019

I have been experiencing a strange problem. With Run All the cells run and clear the busy status just fine and then at some cell down the road successfully executed cells start showing as busy [*]. It seems to appear in random places in different notebooks, usually in the exact same code cell of a given notebook, regardless of the content. Here is a snapshot from 'Run All':

screenshot_8

I succeeded at creating a simple test case (had to rename it to .txt to be able to upload, please rename it to .ipynb to try).
stuck.txt

You can reproduce this problem by running it live in one click here:
https://mybinder.org/v2/gh/stas00/fastai-misc/master?filepath=debug%2Fjupyter%2Fstuck.ipynb

If I save the notebook while it's in this state, execution_count for those cells appears as null. If I let the notebook finish and then save them, the execution_count becomes set to non-null.

The same problem happens in chrome and firefox. I tried different python versions, downgrading all the different jupyter components, disabling extensions, config and what not, it is still there.

Here is the debug output run at which the snapshot was taken.

JUPYTER_NO_CONFIG=1 jupyter notebook --log-level=DEBUG

[D 16:57:37.514 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.515 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:37.516 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.518 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.518 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:37.557 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.557 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.558 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:37.596 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: stream
[D 16:57:37.598 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.600 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.601 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:37.637 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: stream
[D 16:57:37.638 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.639 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.640 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:37.678 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: stream
[D 16:57:37.679 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.680 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.681 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:37.724 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: stream
[D 16:57:37.733 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.737 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:37.741 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: execute_input
[D 16:57:38.779 NotebookApp] 304 GET /static/base/images/favicon-busy-1.ico (127.0.0.1) 2.35ms
[D 16:57:57.778 NotebookApp] activity on fe2c21b6-7fc8-43f2-923d-a7307bf7eded: status
[D 16:57:57.793 NotebookApp] 304 GET /static/base/images/favicon-notebook.ico (127.0.0.1) 2.56ms

Perhaps it has something to do with this switch to busy state as can be seen from the debug log? The initial cells clear out fast enough, so it doesn't switch to busy. Yet, once a slow code cell is encountered it looks like it stops updating the status of the cells, even the ones above it, that it didn't have a chance to clear yet. It remains like this for the rest of the notebook, regardless of its length.

Any other things I can try to find out the culprit?

I asked a friend and he said he also has encountered this issue.

Versions:
Ubuntu-18.04
notebook server is: 5.7.4
Python 3.7.1
IPython 7.2.0

@stas00
Copy link
Author

stas00 commented Jan 16, 2019

Is it possible to receive some guidance at how to debug this problem?

I surely understand you can't fix it if you can't reproduce it, but perhaps you have some insights at where to look for the cause.

This problem makes the notebook environment very problematic to use as it screws up all the navigation in the notebook.

Thank you.

@stas00
Copy link
Author

stas00 commented Jan 18, 2019

I gave it another go and went even further back in time until I found a working scenario and then proceeded forward until I hit the problem, and here is what I found:

Last working (no busy cells): python-3.5.6

conda create -y python=3.5.6 --name work35
conda activate work35
conda install -y ipython==6.5.0 ipykernel=4.10.0 notebook

First not working (busy cells): python-3.6.0

conda create -y python=3.6.0 --name work36
conda activate work36
conda install -y ipython==6.5.0 ipykernel=4.10.0 notebook

I compared identical setups, hence the restrictions on version numbers, since py35 has no conda packages for latest versions. The resulting environments have slightly varying packages as not all dependencies had the same versions built for both python versions.

I also tried to build the whole env with pip instead of conda with the same results.

So, if I did everything correctly it indicates that some change in python-3.6.0 introduced this problem. There is a huge list of changes, I won't know what to look for.

Do note that I first tested with python27 and had no problems there either.

So I'm happy I found that some earlier python versions did work just fine and that the problem doesn't have anything to do with extensions, custom config or browser versions.

Unfortunately, I need at least python-3.6 to work. So the working 3.5.6 setup isn't helpful :(

Any suggestions on how to proceed?

@stas00
Copy link
Author

stas00 commented Jan 20, 2019

Oh great, I found I can share a notebook online and you can run it and see the problem for yourself!

Please run:
https://mybinder.org/v2/gh/stas00/fastai-misc/master?filepath=debug%2Fjupyter%2Fstuck.ipynb

Luckily the problem appears there too.

So now you can clearly see that it's a bug in the notebook project and it's unrelated to the user's setup.

I updated the first post to include the link to the live notebook.

@stas00
Copy link
Author

stas00 commented Feb 4, 2019

I further debugged this by instrumenting notebook/static/services/kernels/kernel.js, with:

  Kernel.prototype._finish_ws_message = function (msg) {
    console.log("_finish_ws_message", msg)

and basically what happens is that execute_reply "shell" messages stop arriving as soon as some forward running slow cell starts executing, so the cells that missed their opportunity to receive execute_reply will not get that opportunity until all other cells complete. Which means that no cells from that moment on will receive execute_reply until the notebook is complete, and then they all get this message in a row.

Most likely the flow needs to be changed to await execute_reply from the currently running cell, before starting a new one - if at all possible.

FWIW, "iopub" channel receives execution_state: "idle" in the right timing, but it's the _handle_execute_reply in codecell.js that clears the busy state:

this.set_input_prompt(msg.content.execution_count);

and it's configured to run upon execute_reply msg on the "shell" channel.

@ginward
Copy link

ginward commented Feb 4, 2019

Same problem here. Only occurs after upgrading to python3 (no such problem in python 2).


pyzmq                     17.1.2
ipykernel                 5.1.0 
notebook                  5.7.4 
jupyter                   1.0.0                    
jupyter_client            5.2.4                   
jupyter_console           6.0.0                 
jupyter_core              4.4.0                    
jupyterlab                0.35.3                     
jupyterlab_server         0.2.0                    
python                    3.7.1
pandas                    0.23.4

It seems to happen when I use pandas to read large csv files.

@noklam
Copy link

noklam commented Feb 17, 2019

Oh great, I found I can share a notebook online and you can run it and see the problem for yourself!

Please run:
https://mybinder.org/v2/gh/stas00/fastai-misc/master?filepath=debug%2Fjupyter%2Fstuck.ipynb

Luckily the problem appears there too.

So now you can clearly see that it's a bug in the notebook project and it's unrelated to the user's setup.

I updated the first post to include the link to the live notebook.

Thanks @stas00, I am able to reproduce the same behavior, it almost always stuck in the same cell, although 1 out of 20~30 trial it behave normally.

@pkampofo
Copy link

Any fix found yet? I'm encountering the same issue.

@noklam
Copy link

noklam commented Feb 17, 2019

@stas00 found the issue appears after python 3.6 is quite interesting, I would expect the issue is coming from update of jupyter.

I have done some more testing on a Window 10 Machine and find that the issue is only exist in python 3.6.
I have test the same notebook with

  • python 3.5.5
  • python 3.6.0
  • python 3.7.0
# py35
conda create -n py35
conda install python==3.5.5 jupyter
# py36
conda create -n py36
conda install python==3.6.0 jupyter
# py37
conda create -n py37
conda install python==3.7.0 jupyter
Environment Result
py35 Normal
py36 Stuck
py37 Normal

@noklam
Copy link

noklam commented Feb 17, 2019

Unfortunately after I mistakenly install a different version of python in my base (was python3.7.0), I start to get the same bug........ but I am pretty sure I had a python 3.7.0 running smoothly before. At first I was thinking this maybe a Window vs Linux issue, but now I couldn't reproduce an working environment in 3.7.0

@noklam
Copy link

noklam commented Feb 17, 2019

@stas00 Ok, so I have made some progress, this confirm my initial finding that python>3.6.0 can works too. Here is what I have done to create an working environment with python 3.6.0.

Here is my exact steps to create such an environment

  1. conda create -n py35
  2. conda install jupyter python==3.5.5
  3. pip install jupyter jupyter_client ipython --upgrade
  4. conda install python==3.6.0

I am guessing step 3 is not that related, as I do step 1-3 and still getting the same bug, only after I do step 4 I get a working environment. I then do conda intsall python==3.7.0 in the same environment and it works as well! @pkampofo not sure if it counts as a solution, the root cause is unknown.



# packages in environment at C:\Users\lrcno\AppData\Local\Continuum\anaconda3\envs\py35:
#
# Name                    Version                   Build  Channel
backcall                  0.1.0                    py36_0
bleach                    3.1.0                    py36_0
ca-certificates           2019.1.23                     0
certifi                   2018.11.29               py36_0
colorama                  0.4.1                    py36_0
decorator                 4.3.2                    py36_0
entrypoints               0.3                      py36_0
html5lib                  1.0.1                    py36_0
icu                       58.2                 ha66f8fd_1
ipykernel                 4.10.0                   py35_0
ipython                   7.2.0                    pypi_0    pypi
ipython_genutils          0.2.0            py36h3c5d0ee_0
ipywidgets                7.4.1                    py35_0
jedi                      0.13.2                   py36_0
jinja2                    2.10                     py36_0
jpeg                      9b                   hb83a4c4_2
jsonschema                2.6.0            py36h7636477_0
jupyter                   1.0.0                    py35_7
jupyter-client            5.2.4                    pypi_0    pypi
jupyter_console           5.2.0                    py35_1
jupyter_core              4.4.0                    py36_0
libpng                    1.6.34               h79bbb47_0
libsodium                 1.0.16               h9d3ae62_0
m2w64-gcc-libgfortran     5.3.0                         6
m2w64-gcc-libs            5.3.0                         7
m2w64-gcc-libs-core       5.3.0                         7
m2w64-gmp                 6.1.0                         2
m2w64-libwinpthread-git   5.0.0.4634.697f757               2
markupsafe                1.0              py36hfa6e2cd_1
mistune                   0.8.3            py36hfa6e2cd_1
msys2-conda-epoch         20160418                      1
nbconvert                 5.3.1                    py35_0
nbformat                  4.4.0            py36h3a5bc1b_0
notebook                  5.5.0                    py35_0
openssl                   1.0.2p               hfa6e2cd_0
pandoc                    2.2.3.2                       0
pandocfilters             1.4.2                    py36_1
parso                     0.3.2                    py36_0
pickleshare               0.7.5                    py36_0
pip                       19.0.1                   py36_0
prompt-toolkit            2.0.8                    pypi_0    pypi
pygments                  2.3.1                    py36_0
pyqt                      5.9.2            py36h6538335_2
python                    3.6.0                         0
python-dateutil           2.7.5                    py36_0
pywinpty                  0.5.5                 py36_1000
pyzmq                     17.1.2           py36hfa6e2cd_0
qt                        5.9.6            vc14h62aca36_0
qtconsole                 4.4.1                    py35_0
send2trash                1.5.0                    py36_0
setuptools                40.7.3                   py36_0
simplegeneric             0.8.1                    py36_2
sip                       4.19.8           py36h6538335_0
six                       1.12.0                   py36_0
sqlite                    3.25.2               hfa6e2cd_0
terminado                 0.8.1                    py36_1
testpath                  0.4.2                    py36_0
tornado                   5.1.1            py36hfa6e2cd_0
traitlets                 4.3.2            py36h096827d_0
vc                        14                   h0510ff6_3
vs2015_runtime            14.15.26706          h3a45250_0
wcwidth                   0.1.7            py36h3d5aa90_0
webencodings              0.5.1                    py36_1
wheel                     0.32.3                   py36_0
widgetsnbextension        3.4.1                    py35_0
win_unicode_console       0.5              py36hcdbd4b5_0
wincertstore              0.2              py36h7fe50ca_0
winpty                    0.4.3                         4
zeromq                    4.2.5                he025d50_1
zlib                      1.2.11               h8395fce_2

@stas00
Copy link
Author

stas00 commented Feb 17, 2019

yay, thank you for digging more into it, @noklam . Following your steps I did some more bisecting and now found the real culprit. It's ipykernel=5.0.0 that breaks it. Can you please double check?

this works:

conda create -y python=3.7.1 --name work37
conda activate work37
conda install -y notebook ipykernel=4.10.0

this breaks:

conda create -y python=3.7.1 --name work37-maybe
conda activate work37-maybe
conda install -y notebook ipykernel=5.0.0

Difference between envs:

conda-env-compare.pl work37 work37-maybe
Comparing installed packages in environments: work37 and work37-maybe
environment       work37 work37-maybe                  work37            work37-maybe
package name      version      version                  source                  source
-------------------------------------------------------------------------------------------
ipykernel       4.10.0        5.0.0         py37_0/anaconda py37h39e3cac_0/anaconda

I also updated my normal environment to force downgrade to ipykernel=4.10.0 and it no longer gets stuck - yay!

But, of course, ipykernel still needs to be fixed.

@stas00
Copy link
Author

stas00 commented Feb 17, 2019

further bisected into ipykernel and found the commit that broke it: ipython/ipykernel@3b1f632

3b1f632528e3577539e96340f9d751723bf5c7cc is the first bad commit
commit 3b1f632528e3577539e96340f9d751723bf5c7cc
Author: Min RK <benjaminrk@gmail.com>
Date:   Wed Sep 5 13:53:27 2018 +0200

    use PriorityQueue for messages

    ensures control messages maintain priority

:040000 040000 28a56cc56df07170b7a939c5fad721f116c06af0 3f56c2b17276897e406107919ae57d7d9b3f7a3f M      ipykernel

Opened an Issue ipython/ipykernel#389
Since there is so much debug info here, I will keep this issue open for now and will close it once we have a resolution at ipykernel.

Thank you all who helped to find the culprit and your support!

@noklam
Copy link

noklam commented Feb 18, 2019

@stas00 I can confirm when I switch from ipykernel==4.10.0 -> 5.0.0 the bug starts to appear. Except that I don't have a nice conda-tools like you and I don't know a good way to show diff file in Window (Normally I just check them in VS code, I cannot find a way to output the diff).

So I turn into a web diff tools instead:
https://www.diffchecker.com/AxgRYVqv

@lmsanch
Copy link

lmsanch commented May 15, 2019

I am having a similar issue, and keep having the problem with different versions of python 3.6, 3.7 and ipykernel. I don't think just downgrading to ipykernel=4.10.0 solves the issue.
The latest env I tried is python 3.7 and ipkernel 4.10.0 (downgrade from 5.1.0.) Does not work.

Screen Shot 2019-05-15 at 11 29 21 AM

The behavior is bizarre: The jupyter notebook does all the calculations and displays them very fast, but the kernel shows busy, even after doing all the calculations and displaying them correctly. Then, if I do a new cell, or update a value in a previously calculated cell and recalc, the * (kernel busy) stays there until:

  1. waiting for several seconds OR
  2. pressing stop and recalc.

If I do "stop" and recalc, the calculation is fast and displayed promptly as usual, but the kernel remains "busy" for a while. So it looks to me that there is a buffer of "junk" somewhere that is not allowing calculations on new cells until the "junk" is processed.

The video below (zippped and youtube link) better illustrates the issue.

I have not tried a python-3.5.6 with ipkernel 4.10.0 as @noklam and @stas00 suggested.

demo_error.mp4.zip
https://youtu.be/nRHP3QDVGdE

update
installed python 3.5 with ipkernel 4.10.0; my jupyter notebooks crashes when selecting the 3.5 kernel.
Screen Shot 2019-05-15 at 12 32 24 PM

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants