Estimates become extremely large if progress updates are infrequent #556
Comments
@afontenot would be great if you have any ideas how to avoid this.
Sure, this was something that came up in the development of the new algorithm. I had initially planned to make the behavior around this configurable in two ways (which I'll describe below), but we ended up deciding to leave it out in favor of having good defaults. The issue here is that, given the assumptions made by the algorithm, a very large ETA is entirely reasonable if no progress has occurred in e.g. 2 minutes. The weighting of the exponential function is such that the most recent 15 seconds provide most (but not all) of the data in the average. The reason for this is that it's designed to be reactive on time scales that matter to a person continually watching progress, for example on a file transfer. It's not tuned for generating good estimates for long, intermittent activities. On a technical level, this is the result of two decisions:
Of these two, I'd say the first is most directly implicated here. Even if you implemented the second feature, you'd see annoying jumps in the estimate with a progress stall of 10 minutes. The exponential smoothing that the algorithm is designed to provide would have basically no effect because the time scale is much too small. I think it would not be unreasonable to try to make this configurable. Everything should just work if you set the value to 20 minutes or even higher. (With very high settings, there's not much down-weighting of older data, so you get behavior approximating a linear average since the beginning of progress, which is often appropriate for these "predictable intermittent stall" cases.)
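To make the effect of the weighting window concrete, here is a minimal sketch of a time-decayed average. It is not indicatif's actual implementation: `DecayingAverage`, `weighting_secs`, and `secs_per_step_after_stall` are hypothetical names, and the decay rule (multiply old data by `exp(-elapsed / weighting_secs)` on each update) is a simplifying assumption chosen only to illustrate the trade-off described above.

```rust
/// Simplified, hypothetical rate estimator; NOT indicatif's actual code.
/// Old data is down-weighted by exp(-elapsed / weighting_secs) on each update.
struct DecayingAverage {
    weighting_secs: f64,
    weighted_secs: f64,
    weighted_steps: f64,
}

impl DecayingAverage {
    fn new(weighting_secs: f64) -> Self {
        Self { weighting_secs, weighted_secs: 0.0, weighted_steps: 0.0 }
    }

    /// Record that `steps` steps completed `elapsed_secs` after the previous update.
    fn record(&mut self, steps: u64, elapsed_secs: f64) {
        let decay = (-elapsed_secs / self.weighting_secs).exp();
        self.weighted_secs = self.weighted_secs * decay + elapsed_secs;
        self.weighted_steps = self.weighted_steps * decay + steps as f64;
    }

    /// Weighted seconds per step, or None before any step has been recorded.
    fn secs_per_step(&self) -> Option<f64> {
        if self.weighted_steps > 0.0 {
            Some(self.weighted_secs / self.weighted_steps)
        } else {
            None
        }
    }
}

/// Ten updates 10 s apart, then a single update after a 600 s stall.
fn secs_per_step_after_stall(weighting_secs: f64) -> f64 {
    let mut avg = DecayingAverage::new(weighting_secs);
    for _ in 0..10 {
        avg.record(1, 10.0);
    }
    avg.record(1, 600.0);
    avg.secs_per_step().unwrap()
}
```

In this sketch, a 15-second window lets the single stall dominate (roughly 600 s per step), while a very large window approaches the plain linear average of 700 s over 11 steps (about 64 s per step).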
Thanks, that would be helpful for us. We expect progress every 75 seconds for one of our progress bars, and every 10 seconds - 3 minutes for the other.
Requiring configuration for this kind of thing seems like an anti-pattern to me: it makes users give us information that they then have to benchmark and keep up to date, when it feels like there is some algorithm we could use to avoid the current edge case behavior. Can we, for example, define some boundary where we switch to different tuning parameters?
I agree.
Can we dynamically change the weighting based on the average/median time between the most recent N progress updates? If needed, we could exclude the last 1-2 updates, because they might represent a disconnection or other instability. (A median would do this automatically.) This would work for us, because each of our progress bars has two different modes:
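A rough sketch of the median idea, in case it helps the discussion: keep the gaps between the last N updates and derive the weighting window from their median. Everything here is hypothetical (`IntervalTracker`, `floor_secs`, `multiplier`); nothing like this exists in indicatif today, and the "multiplier times median, with a floor" rule is just one assumption about how the window could be chosen.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: remembers the gaps between the most recent updates.
struct IntervalTracker {
    capacity: usize,
    intervals_secs: VecDeque<f64>,
}

impl IntervalTracker {
    fn new(capacity: usize) -> Self {
        // Keep at least one interval so the buffer stays bounded.
        let capacity = capacity.max(1);
        Self { capacity, intervals_secs: VecDeque::with_capacity(capacity) }
    }

    /// Record the time between the previous update and this one.
    fn record(&mut self, gap_secs: f64) {
        if self.intervals_secs.len() == self.capacity {
            self.intervals_secs.pop_front();
        }
        self.intervals_secs.push_back(gap_secs);
    }

    /// Median gap; a single outlier (e.g. a disconnection) does not move it much.
    fn median_secs(&self) -> Option<f64> {
        if self.intervals_secs.is_empty() {
            return None;
        }
        let mut sorted: Vec<f64> = self.intervals_secs.iter().copied().collect();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let mid = sorted.len() / 2;
        Some(if sorted.len() % 2 == 0 {
            (sorted[mid - 1] + sorted[mid]) / 2.0
        } else {
            sorted[mid]
        })
    }

    /// Assumed rule: window = multiplier * median gap, never below `floor_secs`.
    fn weighting_secs(&self, floor_secs: f64, multiplier: f64) -> f64 {
        match self.median_secs() {
            Some(median) => (median * multiplier).max(floor_secs),
            None => floor_secs,
        }
    }
}
```

With a floor of 15 seconds, fast-updating bars would keep the current behavior, and bars that update every 75 seconds would get a proportionally longer window without any user configuration.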
@afontenot would you be able to spend more time on this? If not, that's fine too, I can dig into it more.
I would highly appreciate an option to turn off the exponential weighting entirely. I guess if the decay rate were configurable, one could set it to a very high value as you have mentioned, but I fear that I would have to set it high enough that putting the number of seconds into an exponential could cause problems. I have programs that run for up to a few days, with steps sometimes taking hours. The steps are very consistent in length, so the exponential weighting provides no benefit at all. Also, there is no way to further subdivide the steps, as most of the time is spent in one call to LAPACK. Without the steady tick, the elapsed time does not get updated often enough; for example, there is no way to see how long the program has been running before the first step is completed.
We're using `indicatif` via `howudoin`, to display events that update every few minutes. Sometimes there can be delays of up to 10 minutes.

We're seeing extremely large estimates when there aren't any events for a few minutes.

This is the underlying cause of the panics in #554 in our application. There aren't any updates for a few minutes, so the estimate becomes billions of years. Eventually, it is outside the range of `Duration`, which panics.

Is it possible to make `EXPONENTIAL_WEIGHTING_SECONDS` configurable, or use an algorithm that doesn't have this exponentially increasing behaviour when there aren't any updates?

(I have read the discussion in #394 and related tickets.)
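Independently of the weighting question, the out-of-range case can be handled without panicking. The sketch below assumes the estimate is held as a floating-point number of seconds before being turned into a `Duration`; that is an assumption about where the panic in #554 originates, and `eta_from_secs` is a hypothetical helper, not an indicatif function. It relies on the standard library's `Duration::try_from_secs_f64` (stable since Rust 1.66), which returns an error for negative, non-finite, or too-large values instead of panicking.

```rust
use std::time::Duration;

/// Hypothetical helper: convert an estimate in seconds to a `Duration`,
/// returning `None` when the value is negative, NaN, infinite, or too large.
fn eta_from_secs(secs: f64) -> Option<Duration> {
    Duration::try_from_secs_f64(secs).ok()
}
```

A caller could then display a placeholder for `None` rather than an estimate of billions of years.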
Here's an example of the beginning of an exponential increase: