Registry cache warmer stops logging #1940
Comments
I have a theory that ACR might (automatically) null route you after repeatedly hitting their You restarting the pod probably meant Flux received a new outbound IP address with no null route attached to it. |
Nothing, not even "refreshing image" logs? If that's the case I doubt it's the lack of timeout logs, because we only omit the "refreshing image" logs when we don't find images to update. My hunch is that, somehow, |
Upon further reading, this seems highly unlikely |
Maybe related? #1808 |
@brantb have you seen this again recently? |
I've just seen that the image repository client we use:
So, I think that as soon as a request hangs, the warmer will get stuck. I believe that's what's happening both here and in #1808. I bet it only happens with ACR because other registry providers don't leave the connection open. It may very well be a combination with rate limiting. There are no new library updates ( |
Not that we do anything with the context, other than set it to |
@2opremio I maybe have a similar issue but not due to rate limiting. At least I can't see anything related to that in the logs. At some point warmer just disappears from logs. After that new images aren't refreshed/pulled. Last lines of logs before warmer went dark. It was in the middle of the night...
|
Another update.
|
I think I may have fixed the problem in #1970 @alexanderbuhler / @brantb Would you be able to confirm by running |
@2opremio great work, thanks. I applied your image and will check back after some monitoring. |
Perfect, thanks a lot! |
Reopening until we get confirmation of #1970 fixing the problem |
@2opremio I already saw a new Flux release containing this fix, and I can confirm the warmer still works with our setup after more than 24 hours. I think we can close this. Thanks again! |
Great! Just to be sure, did it always take less than 24 hours for the problem to happen? |
@2opremio Not always, but mostly under 24 hours. Now after almost two days warmer's still working. |
Great, closing. Let me know if it happens again. |
We observed a scenario where the registry cache warmer stopped functioning and Flux would no longer automatically detect or apply updated tags.
We seem to be hitting Azure Container Registry's rate limit fairly hard, with a lot of rate limit errors being logged. However, at some point, the warmer component stopped logging entirely.
The relevant part of the log output is below:
A discussion in #flux revealed that the memcached cache is full due to the large number of tags being scanned, which might be contributing to the issue. (Possibly related: #1939)