JupyterHub terminates with 500 Internal Server error #433
Perhaps a problem with the Nvidia LD search paths on your GPU box? What does …
It was working fine yesterday. Nothing really changed.
This is a consistent failure. I'm not sure what has changed upstream, but the attached log shows jupyterhub-sing[75273] trap invalid opcode ip:7f1493653040 near the bottom. It successfully mounts the Ceph PV I created and then crashes.
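One possible reading of the "trap invalid opcode" line above (an assumption, not confirmed in this thread): the process executed a CPU instruction the host does not support. Prebuilt TensorFlow wheels from 1.6 onward are compiled with AVX, so on an older CPU an `import tensorflow` can crash the notebook server exactly this way. A minimal check, assuming a Linux node:

```shell
# List the SIMD capability flags of the host CPU. If neither avx nor
# avx2 appears, prebuilt TensorFlow >= 1.6 binaries will fault with
# "invalid opcode" on import. The "|| echo" keeps the pipeline from
# failing outright on machines without these flags.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
  | grep -E '^(sse4_2|avx|avx2)$' \
  || echo "no AVX/SSE4.2 flags found on this CPU"
```

If AVX is missing, a TensorFlow build compiled for the host CPU (or an older image built without AVX) would be the workaround.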
I also posted jupyterhub/jupyterhub#1727, assuming it's a JupyterHub issue.
I used the jupyter/tensorflow-notebook image and everything worked fine.
Can we close this issue? Can you clarify whether it was JupyterHub or the Jupyter notebook server that was crashing? The title says it's JupyterHub that's crashing, but the details in the bug indicate problems with PV creation/mounting. I think it's the notebook server, not the hub, that mounts the PV.
My apologies for the divergent streams of thought while I have been troubleshooting this issue. I have mostly resolved my issues with the default storage class, and JupyterHub now spins up a PVC which successfully mounts a PV. The 500 server error is being caused by the images gcr.io/kubeflow-images-staging/tensorflow-notebook-cpu and gcr.io/kubeflow-images-staging/tensorflow-notebook-gpu. I thought that storage may have been the culprit, since I was able to get those images to start a notebook only if I manually created a PV. The problem seems to be intermittent, and sometimes I can get the images to start a notebook, but in all cases the notebooks are unusable since I can't run any code that imports TF or Matplotlib. I used the jupyter/tensorflow-notebook image instead; it starts every time and I can run TF code effortlessly. I'm still not sure, however, whether the root cause is the images or JupyterHub, or whether there are underlying storage issues that manifest themselves when using the gcr.io images.
I'm going to close this since I've confirmed that the images noted above do not work properly. I am now working with an older notebook image.
I'm running a MAAS bare metal deployment. I have K8s deployed with Ceph as my persistent storage.
I've been dealing with JupyterHub issues where PVs are not created by the PVC. As of yesterday I was able to manually create a PV and spawn a Jupyter session successfully.
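For the "PVs are not created by the PVC" symptom: dynamic provisioning only happens when the cluster has a StorageClass marked as default (or the PVC names one explicitly), and on a bare-metal Ceph deployment that default is not set automatically. A sketch of a default RBD StorageClass using the in-tree `kubernetes.io/rbd` provisioner; the monitor address, pool, and secret names below are placeholders for illustration:

```yaml
# Hypothetical StorageClass making Ceph RBD the cluster-wide default,
# so JupyterHub's PVCs provision PVs without manual PV creation.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.1:6789          # Ceph monitor endpoint (placeholder)
  pool: kube                       # RBD pool used for volumes (placeholder)
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  userId: kube
  userSecretName: ceph-secret-user
```

With a default class in place, `kubectl get storageclass` should show it flagged `(default)`, and `kubectl describe pvc <name>` will show provisioning events instead of a claim stuck in Pending.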
As of this morning it no longer works. It successfully mounts the rbd volume and then dies shortly thereafter. I've attached a tail of the syslog.
syslog.txt