Authservice pod "Failed to save state in store: error trying to save session: input/output error" #112

Open
psheorangithub opened this issue Mar 15, 2023 · 4 comments

Comments

@psheorangithub

We have integrated Kubeflow with an OIDC flow (Heracles + LDAP), but we are unable to log in to the Kubeflow UI. The browser shows the error below.

Access to kubeflow.aiwb-enc-data-cpu1.uscentral-prd-az3.k8s.int was denied
You don't have authorization to view this page.
HTTP ERROR 403

While checking the authservice pod logs, I see the errors below. This happens every couple of days.

2023/03/15 14:08:40 boltstore: remove expired sessions error: input/output error
time="2023-03-15T14:06:29Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip= request=/

The issue resolves after restarting the authservice pod, but it reappears every 10-15 days. We have checked the status of the underlying PVC, and it looks healthy.

Can someone look into it and suggest what could be the cause?

@edwardzjl
Contributor

Are you deploying Kubeflow with kubeflow-manifests? Have you checked the disk usage of the volume? The default volume is 10Gi. Without further information I can only guess that the disk might be full after 10-15 days.

@psheorangithub
Author

Yes, I deployed using the Kubeflow manifests, specifically https://github.com/kubeflow/manifests/tree/v1.6.0.
Yes, I have already checked the disk usage. The PVC used by authservice is 10G, and the only file in it is "data-db", which was only 5 MB at the time of the issue. Overall volume usage also looks good.
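A volume that reports as healthy and mostly empty can still reject writes, since "input/output error" is raised by the storage layer rather than by a full disk. As a rough way to test this, the sketch below tries to create, flush, read back, and remove a small file in a directory. The function name `probe_write` is hypothetical and not part of oidc-authservice, and a successful probe does not prove the volume is sound; it only shows that a write worked at that moment.

```shell
# Hypothetical helper, not part of oidc-authservice: create, flush, read back,
# and remove a small file to check that a directory accepts writes.
probe_write() {
    dir="$1"
    probe="$dir/.write-probe.$$"
    # The braces keep a failed redirection (for example an I/O error or a
    # missing directory) from printing a shell error; the exit status still
    # reflects the failure.
    if { printf 'probe\n' > "$probe"; } 2>/dev/null \
        && sync \
        && [ "$(cat "$probe" 2>/dev/null)" = "probe" ]; then
        rm -f "$probe"
        echo "write ok: $dir"
        return 0
    fi
    rm -f "$probe" 2>/dev/null
    echo "write FAILED: $dir"
    return 1
}
```

It would be run against the directory that backs the authservice PVC, from whichever host or container can see that directory. Whether the authservice image itself ships a shell is not something this thread confirms, so running it from the storage host or a debug pod that mounts the same claim may be necessary.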

@MiraiChino

We are experiencing the same issue in our environment. The "Failed to save state in store: error trying to save session: input/output error" message keeps appearing in the authservice pod logs, even though all other components seem to be running fine. After launching the Kubeflow environment and adding five users, we see a recurring 403 error on the Kubeflow Dex login page, even when no users are logged in.

Environment:

Pod Information:

  • Pod Name: authservice-0
  • Namespace: istio-system
  • Container Image: gcr.io/arrikto/kubeflow/oidc-authservice:28c59ef

Issue Details:

  • The authservice-0 pod within the istio-system namespace shows no anomalies in resource usage. CPU and memory consumption appear to be normal.
# kubectl top pod authservice-0 -n istio-system
NAME            CPU(cores)   MEMORY(bytes)   
authservice-0   1m           3Mi      
  • The associated persistent volume claim (PVC) has the expected data in the NFS storage, and the data.db file appears to be intact.
# ls -lh /export/kubernetes/istio-system-authservice-pvc-pvc-3e8dd897-4478-40c5-a007-e1d1aa55f734
total 24K
-rw-r--r-- 1 systemd-network tss 32K Jul 24 05:25 data.db

Error Logs:

# kubectl logs authservice-0 -n istio-system
time="2023-07-24T05:25:20Z" level=info msg="Starting readiness probe at 8081"
time="2023-07-24T05:25:20Z" level=info msg="No  USERID_TOKEN_HEADER  specified, using 'kubeflow-userid-token' as default."
time="2023-07-24T05:25:20Z" level=info msg="No  SERVER_HOSTNAME  specified, using '' as default."
time="2023-07-24T05:25:20Z" level=info msg="No  SERVER_PORT  specified, using '8080' as default."
time="2023-07-24T05:25:20Z" level=info msg="No  SESSION_MAX_AGE  specified, using '86400' as default."
time="2023-07-24T05:25:20Z" level=info msg="Starting web server at :8080"
time="2023-07-24T05:47:51Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:48:29Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:50:10Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:50:25Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:55:03Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/
time="2023-07-24T05:55:04Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.145 request=/

@MiraiChino

Additional information:
After setting the log level of oidc-authservice to DEBUG, I rechecked the logs when the error occurred again. The error appears to originate in boltstore/reaper, the component that removes expired sessions, rather than in the code path that uses BoltDB for session management.

2023/08/01 03:22:10 boltstore: remove expired sessions error: input/output error
time="2023-08-01T03:22:57Z" level=warning msg="Request doesn't have a valid session." ip=192.168.200.15 request=/logout
time="2023-08-01T03:22:57Z" level=error msg="Failed to save state in store: error trying to save session: input/output error" ip=192.168.200.15 request=/
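To see whether the reaper error precedes the save failures over a longer period, it can help to count both message types in the pod log. The sketch below is a minimal filter that reads log text on standard input and matches the two messages exactly as they appear in this thread. The function name `summarize_authservice_errors` is hypothetical, and the patterns assume the log format shown above does not change between versions.

```shell
# Hypothetical helper: count the two input/output error messages seen in the
# authservice logs quoted in this issue. Reads log lines from stdin.
summarize_authservice_errors() {
    awk '
        # Emitted by the boltstore reaper when removing expired sessions fails.
        /boltstore: remove expired sessions error: input\/output error/ { reaper++ }
        # Emitted by authservice when it cannot persist session state.
        /Failed to save state in store: .*input\/output error/ { save++ }
        END { printf "reaper_io_errors=%d save_io_errors=%d\n", reaper, save }
    '
}
```

A typical invocation would pipe the pod log into it, for example `kubectl logs authservice-0 -n istio-system | summarize_authservice_errors`, using the pod and namespace names reported earlier in this thread.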
