This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Feedback: dep ensure must not fail catastrophically if operations time out #66

Closed
peterbourgon opened this issue Dec 27, 2016 · 2 comments

@peterbourgon
Contributor

This is feedback from my first run with the tool. Very often, github.com starts timing out HTTPS operations if I make a bunch of them in a row, and dep ensure makes a bunch of them in a row. When the timeouts start happening, they basically corrupt and kill the entire ensure run, which can invalidate many seconds or minutes of work. That is very frustrating.

Just throwing some ideas out: can ensure have some concept of partial, resumable progress, so that if it fails I can pick it up again from that point, rather than having to start from scratch? Failing that, can it make some attempt to throttle and retry failed network operations?

@sdboyer
Member

sdboyer commented Dec 29, 2016

Ugh. This one's perturbing.

> can ensure have some concept of partial, resumable progress, so that if it fails I can pick it up again from that point, rather than having to start from scratch?

Solving itself can't really pick up partway through. To the extent it can, that's basically what passing in a lock file does - it provides hints that minimize the space explored.

However, the expensive and error-prone parts here are all cacheable operations, as detailed in #67. So, while solving can't pick up partway through, we should be able to get it to where it mostly doesn't matter.

> Failing that, can it make some attempt to throttle and retry failed network operations?

Retrying is very much an option. So is throttling - there's already a case (sdboyer/gps#28) for throttling it down to one at a time, anyway. Another case to defend against is inactivity - sdboyer/gps#84.

The only reason I haven't pursued these is that I wasn't encountering such issues myself with any regularity, so I didn't have a good testbed against which to check possible solutions. Well...that, and I haven't yet taken the time to research strategies that might generically work well in these cases 😄

> it basically corrupts and kills the entire ensure run

Just a note - this touches a larger class of problem. When sources straight up fail or don't exist, the solver needs to bail out more directly. I've not made progress on this yet because it ends up being part of the "make errors better" problem. The general ticket for that is sdboyer/gps#20, and indirectly #22.

@sdboyer
Member

sdboyer commented Apr 22, 2017

We've addressed this upstream with default timeout handling now, I think.

sdboyer closed this as completed on Apr 22, 2017