Download loop when resume_downloads and missing bundle #158

Open
ashlin4010 opened this issue Jun 29, 2023 · 1 comment

@ashlin4010

If for some reason the bundle is no longer available from Hawkbit (e.g. the download returns 404), rauc-hawkbit-updater will try to re-download the file forever. Forcing the rollout to stop via Hawkbit does not stop this behavior; the only way out is to stop and restart the rauc-hawkbit-updater service.

I believe that resume_downloads should only resume if there is any hope of a successful download. On HTTP status codes 4xx and 5xx, the download should be aborted and a failure reported back to Hawkbit.

I believe that force-stopping an update should also stop resume_downloads. The addition of a limit may also be wise.
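To illustrate the kind of limit meant here, a minimal sketch of a bounded retry loop follows. It is not taken from rauc-hawkbit-updater; `download_attempt_fn`, `download_with_limit` and `fail_n_times` are hypothetical names made up for this example, and a real implementation would also wait between attempts.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns true once the download succeeded. */
typedef bool (*download_attempt_fn)(void *user_data);

/* Tries the download at most max_attempts times. Returns true on success,
 * false once the budget is used up, so the caller can report a failure
 * instead of looping forever. */
static bool download_with_limit(download_attempt_fn attempt, void *user_data,
                                unsigned int max_attempts)
{
    for (unsigned int i = 0; i < max_attempts; i++) {
        if (attempt(user_data))
            return true;
        /* a real implementation would sleep here before retrying */
    }
    return false;
}

/* Demo attempt function: fails until the counter behind user_data hits 0. */
static bool fail_n_times(void *user_data)
{
    unsigned int *remaining = user_data;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

With a counter of 2 and a budget of 5 attempts the call succeeds on the third try; with a counter of 10 and a budget of 3 it gives up and returns false.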

Our hope was that resume_downloads would allow us to reduce data usage in the event of a data outage mid-download. However, the risk of getting stuck in an endless download loop is too high, and such a loop could use an unlimited amount of data.
There is also the risk of a self-inflicted DDoS attack: a fleet of systems continually trying to download a nonexistent file, with no way to stop them, would not be ideal.

I encountered this problem while setting up a test production environment: some reverse proxies were misconfigured and Hawkbit was doing dumb things. However, an infrastructure outage may have the same effect. For example, a Hawkbit container may fail while a load balancer keeps responding with 503 error codes.

@Bastian-Krause
Member

Bastian-Krause commented Jul 3, 2023

> If for some reason the bundle is no longer available from Hawkbit (e.g. the download returns 404), rauc-hawkbit-updater will try to re-download the file forever. Forcing the rollout to stop via Hawkbit does not stop this behavior; the only way out is to stop and restart the rauc-hawkbit-updater service.
>
> I believe that resume_downloads should only resume if there is any hope of a successful download. On HTTP status codes 4xx and 5xx, the download should be aborted and a failure reported back to Hawkbit.

I agree. I guess this should be easy to implement: after checking for resumable_codes, set resumable to FALSE if the error domain equals RHU_HAWKBIT_CLIENT_HTTP_ERROR and 400 <= error->code < 600. PR welcome.
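A rough sketch of that check, for anyone picking this up. It is written without GLib so it compiles on its own: `SketchError` stands in for GError, `SKETCH_HTTP_ERROR_DOMAIN` for RHU_HAWKBIT_CLIENT_HTTP_ERROR, and `still_resumable` is a hypothetical helper name. The real code would compare `error->domain` against the quark instead of using strcmp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for GLib's GError with only the fields this sketch needs.
 * Assumption: the real code uses GError with a GQuark domain. */
typedef struct {
    const char *domain;
    int code;
} SketchError;

/* Hypothetical string standing in for RHU_HAWKBIT_CLIENT_HTTP_ERROR. */
#define SKETCH_HTTP_ERROR_DOMAIN "rhu-hawkbit-client-http-error"

/* Takes the decision made so far (e.g. by the resumable_codes check) and
 * turns it into false when the error is an HTTP 4xx/5xx response, since
 * retrying such a download has little chance of succeeding. */
static bool still_resumable(bool resumable, const SketchError *error)
{
    if (!resumable || error == NULL)
        return resumable;

    if (strcmp(error->domain, SKETCH_HTTP_ERROR_DOMAIN) == 0 &&
        error->code >= 400 && error->code < 600)
        return false;

    return true;
}
```

A 404 or 503 in the HTTP error domain then ends the resume loop, while errors from other domains (for example a dropped connection) keep whatever decision was made before.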

> I believe that force-stopping an update should also stop resume_downloads.

That would require polling some REST endpoint during download. How is this propagated to the DDI API?

> The addition of a limit may also be wise.

This should already be covered by low_speed_time and low_speed_rate, right?

> Our hope was that resume_downloads would allow us to reduce data usage in the event of a data outage mid-download. However, the risk of getting stuck in an endless download loop is too high, and such a loop could use an unlimited amount of data. There is also the risk of a self-inflicted DDoS attack: a fleet of systems continually trying to download a nonexistent file, with no way to stop them, would not be ideal.
>
> I encountered this problem while setting up a test production environment: some reverse proxies were misconfigured and Hawkbit was doing dumb things. However, an infrastructure outage may have the same effect. For example, a Hawkbit container may fail while a load balancer keeps responding with 503 error codes.

Right.
