panic: runtime error: slice bounds out of range [:-6] after upgrade v1.0 #752
Comments
This sounds like a corrupt wal. Please remove It's possible it's just a one-off corruption or due to a bug in Tempo. We are not seeing this, but I will leave this open in case it persists (or others have similar problems).
Thanks @joe-elliott, it fixed the issue.
I got the same issue this morning. We stop the Tempo pod overnight and start it early in the morning. Maybe there is a cleanup issue during stop?
How are you stopping the pod? If sending a SIGTERM and letting it close on its own it should not do this. If SIGKILL or maybe SIGTERM and then SIGKILL after a timeout then it makes sense you'd be seeing something like this.
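To illustrate the difference between the two shutdown paths, here is a minimal Go sketch of a process that reacts to SIGTERM by flushing within a deadline. It is not Tempo's shutdown code: `flushWithDeadline`, the no-op flush function, and the 25-second budget (chosen to sit under Kubernetes' default 30-second grace period) are assumptions for illustration. If the flush overruns the grace period, the SIGKILL that follows cannot be caught, and whatever was being written is left half-finished.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// flushWithDeadline runs flush in a goroutine and waits at most grace for it.
// Hypothetical helper: it only reports the overrun, it cannot stop the flush.
func flushWithDeadline(flush func() error, grace time.Duration) error {
	done := make(chan error, 1)
	go func() { done <- flush() }()
	select {
	case err := <-done:
		return err
	case <-time.After(grace):
		return errors.New("flush did not finish within the grace period")
	}
}

func main() {
	// Cancelled when the process receives SIGTERM or Ctrl-C.
	// SIGKILL cannot be intercepted, so it never reaches this code.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	// Demo only: also stop waiting after 2s so the example exits without a signal.
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()
	<-ctx.Done()

	// Assumed no-op flush; a real service would flush and close its files here.
	err := flushWithDeadline(func() error { return nil }, 25*time.Second)
	fmt.Println("flush result:", err)
}
```

Keeping the flush budget below the pod's grace period is the design point: the process should finish or give up on its own before the orchestrator escalates to SIGKILL.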
The pod is stopped because we reduce the pool size at night, so the node running Tempo is drained (SchedulingDisabled, then removed). I guess in that case Kubernetes sends SIGTERM because of the node drain. Maybe termination takes too long and Kubernetes then sends SIGKILL.
I don't know for certain, but I believe drains follow all of the standard rules for shutting down a pod (including terminationGracePeriodSeconds). We regularly see ingesters get terminated and have not seen this issue. Things you can try:
|
I'm having the same issue running Tempo on Docker for Windows after restarting the PC.
@alexander-klang @guyguy333 Is the number always |
Also, if anyone could share the shutdown logs of the ingester before this issue is seen, that would be helpful.
We have confirmation from @Whyeasy that I will submit a PR with guard code that prevents the panic and replays the wal up to the point of corruption and we can cut a release with that. |
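The guard idea mentioned above (stop at the point of corruption instead of panicking, and keep everything read so far) can be sketched roughly as follows. This is not the code from the PR and not Tempo's wal format: the 4-byte big-endian length prefix, the names `replay` and `replayBytes`, and the `maxLen` sanity limit are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// replay reads length-prefixed records (assumed format: 4-byte big-endian
// length, then payload) and returns every record read before the first sign
// of corruption, together with an error describing where it stopped.
func replay(r io.Reader, maxLen uint32) ([][]byte, error) {
	var records [][]byte
	for {
		var hdr [4]byte
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			if errors.Is(err, io.EOF) {
				return records, nil // clean end of log
			}
			return records, fmt.Errorf("truncated header after %d records: %w", len(records), err)
		}
		n := binary.BigEndian.Uint32(hdr[:])
		// Guard: validate the length before using it to size or slice a buffer.
		if n > maxLen {
			return records, fmt.Errorf("implausible record length %d after %d records", n, len(records))
		}
		buf := make([]byte, n)
		if _, err := io.ReadFull(r, buf); err != nil {
			return records, fmt.Errorf("truncated payload after %d records: %w", len(records), err)
		}
		records = append(records, buf)
	}
}

// replayBytes is a convenience wrapper for in-memory data.
func replayBytes(data []byte, maxLen uint32) ([][]byte, error) {
	return replay(bytes.NewReader(data), maxLen)
}

func main() {
	// Two intact records ("hi", "x") followed by a record cut off mid-write.
	data := []byte{0, 0, 0, 2, 'h', 'i', 0, 0, 0, 1, 'x', 0, 0, 0, 9, 'a'}
	recs, err := replayBytes(data, 1024)
	fmt.Printf("recovered %d records, stopped with: %v\n", len(recs), err)
}
```

The caller decides what to do with the partial result; returning the records alongside the error is what lets startup continue with the recoverable part of the log.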
@joe-elliott I confirm I got -6 each time I hit the issue (I haven't been able to reproduce it since).
This has been released as 1.0.1: |
Describe the bug
I upgraded Tempo to v1.0 (thanks for the release, amazing product 👍). The first boot after the upgrade was fine. Now I'm getting this error on restart and I can no longer start Tempo:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No crash
Environment:
Additional Context
I increased
grpc_server_max_recv_msg_size: 41943040
on server config, not sure it's related.