GC fails with "invalid checksum digest format" from registry when deleting manifests #15970
Comments
@wy65701436 I came across this issue (distribution/distribution#3018) and PR (distribution/distribution#3019) in the upstream Docker Distribution that seem to have identified the same problem that is causing the errors we're seeing. AFAICT the PR was abandoned and the issue is still open with no resolution. GitLab seems to have forked distribution and started using their own version due to an inability to push changes. With distribution still (?) using the Google SDK from 2015, it's really worrying, and it looks like it's contributing to a lot of the issues we're seeing, including this one?
Thanks @dkulchinsky, I'll update the Google SDK in upstream distribution. Even so, Harbor still cannot make use of it until we get a new distribution release.
Thanks @wy65701436! Can we consider using GitLab's fork of distribution? It seems to be in much better shape in terms of reliability, performance, and overall maintenance; Docker's distribution had its last release in January 2019 😱 Is there anything you can suggest for my situation? We already have over 10,000 artifacts waiting for GC (and the number grows daily), and the GC job keeps failing either due to this issue or the other GC issues I've reported (mostly #15822). What can we do? I'm getting really desperate with this 😞
We have no plan to use a forked distribution. We (the distribution maintainers) are working on issuing a new release of upstream distribution, but I cannot give a date.
Thanks again for replying @wy65701436! I appreciate it 👍🏼
Looking forward to it 👍🏼
I think allowing GC to skip blobs/manifests that fail to be removed due to persistent errors such as 404 and 500 would mitigate this issue considerably, or at least allow us to GC the majority of artifacts. Perhaps this behaviour could be an optional configuration, so that it is opt-in and won't be considered a breaking change.
Yes, allowing GC to skip failures could be an option. BTW, for the performance issue, we may make some enhancements on the distribution side.
That would be great @wy65701436 🤝 This will be greatly appreciated as it would really help our current situation. I hope this can be implemented sooner rather than later 🙏🏼
👏🏼
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
this is still being tracked I believe?
This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.
still relevant
This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.
Looks like I missed the stale notice. This is definitely still an issue and should be tracked. @wy65701436 can you please re-open?
If you are reporting a problem, please make sure the following information are provided:
Expected behavior and actual behavior:
GC should delete all manifests and blobs marked for removal. If an error is encountered, it should skip the offending manifest/blob, log the issue, and continue with the rest. Instead, GC fails to delete some manifests because the registry returns 500 and the following error message:
and the whole GC job fails and stops.
Steps to reproduce the problem:
Don't know what causes it, so not sure how to reproduce.
Versions:
Please specify the versions of following systems.
Additional context:
Registry logs
jobservice log: