Tenant Index Failures for deleted tenants #2878
There's a dangerous race condition here where we could be creating a valid block which should not be deleted at the moment we decide to delete the tenant index.
Won't this be fixed by the polling changes you are currently working on, where the "list" command starts listing meta.json and meta.compacted.json files directly?
One of my hopes for the … What the polling PR doesn't solve is the case where a tenant should actually be deleted. Leftover blocks would still cause the tenant path to show up, and the poller would still try to poll that tenant. There may be a small change we can make to the tenant listing to help here, but perhaps outside of the (growing) polling PR.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
Any update on this issue?
@logamanig Try disabling
The tenant index deletion was originally put in as a TCO win, but it did not have the desired effect and surfaced other issues in the system. Related: grafana#2678, grafana#2754, grafana#2781, grafana#2878, grafana#3115, grafana#3223. Given the number of issues here, and the considerable noise it causes on the pager, perhaps the right thing to do is to back out the tenant deletion. Raising it here for discussion.
Describe the bug
A change in #2781 was an attempt to fix this, but it is incomplete. Also related to #2754.
Currently, when Tempo attempts to write a tenant index with 0 blocks and 0 compacted blocks, the tenant index is deleted, and since #2781 errors from deleting an already-deleted index are handled. However, a deleted tenant may still show up in the tenant list if any objects are left under the tenant object path. The poller then attempts to read the tenant index, fails (because the index does not exist), increments the error counter, and falls back to a full poll, which results in incorrect alerts.
Expected behavior
I viewed the change in #2781 as a stopgap, but since it didn't achieve the goal, I think this needs discussion. My preferred approach would be to delete all leftover tenant objects when we delete the tenant index.
1. WriteTenantIndex() is called for a tenant with zero blocks and zero compacted blocks.
2. RawWriter.Delete() is called for the tenant index.
3. List() is called at the tenant path to find all remaining objects, and each of them is deleted.
This will result in a full tenant deletion, instead of only deleting the index. Note that even if we fix the incorrect metric counting for a deleted tenant, we would still be making the calls to the backend, so my preference is to delete all the objects from the backend.
Environment:
Additional Context
#2754