Parallelize (maybe?) gps.WriteDepTree #895
Comments
Hey, is someone working on this issue, does it need to be claimed? |
@tonto nobody's working on it, so it's all yours if you want it! 🎉 🎉 |
@sdboyer cool, I'll have a go then, fingers crossed :) |
great! please don't hesitate to ask questions, whether here or in slack. |
@sdboyer Without making a PR I pasted the code for `WriteDepTree` here:

```go
func WriteDepTree(basedir string, l Lock, sm SourceManager, sv bool) error {
	if l == nil {
		return fmt.Errorf("must provide non-nil Lock to WriteDepTree")
	}

	err := os.MkdirAll(basedir, 0777)
	if err != nil {
		return err
	}

	wg := sync.WaitGroup{}
	// Related to Q1
	var (
		mu sync.Mutex // guards e, which several goroutines may write
		e  error
	)
	for _, p := range l.Projects() {
		// Add must run before the goroutine starts, or wg.Wait may return early.
		wg.Add(1)
		go func(p LockedProject) {
			defer wg.Done()
			to := filepath.FromSlash(filepath.Join(basedir, string(p.Ident().ProjectRoot)))
			// Goroutine-local err: assigning to the outer err would be a data race.
			err := sm.ExportProject(p.Ident(), p.Version(), to)
			if err != nil {
				// Q1 - Since any goroutine can error out, what do we do with err's
				// (e here is temporary just to catch any error)
				// Since the function previously returned immediately after an error and called `removeAll`
				// does this mean that we should stop all other goroutines
				// or just record an error like this (or use `pkg/errors` to wrap them) and wait out
				// the rest of the goroutines
				wrapped := fmt.Errorf("error while exporting %s: %s", p.Ident().ProjectRoot, err)
				mu.Lock()
				e = wrapped
				mu.Unlock()
				log.Println("project error: ", to, wrapped)
				return
			}
			if sv {
				filepath.Walk(to, stripVendor)
			}
			// TODO(sdboyer) dump version metadata file
		}(p)
	}
	wg.Wait()

	// Related to Q1
	if e != nil {
		removeAll(basedir)
		return e // return the recorded export error, not the outer err (nil here)
	}
	return nil
}
```

I know that this was just supposed to be a proof of concept benchmark and that maybe the question above is misplaced, so you don't have to answer it right now, but the real problem is that I can't really run the benchmark for
Probably complaining because of conflicting work between goroutines... Sorry if I got things wrong, your feedback would be helpful. |
@tonto no worries! like i said, questions are totally fine 😄 that you'd be getting an error like that is surprising - it would seem to suggest that there's a possible scenario in which more generally, though...yep! this is the kind of challenge you have here - a i might instead explore something more like a dispatcher/controller goroutine plus a configurable number of worker goroutines. does that make sense? |
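A minimal sketch of the dispatcher-plus-workers shape suggested above, not the actual gps implementation. The names `runWorkers`, `exportFunc`, and `errDemo` are hypothetical; `exportFunc` stands in for whatever per-project export call is used. It also shows one possible answer to Q1: record the first error, stop handing out new work, and let in-flight work finish.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDemo is a hypothetical failure used only by the example in main.
var errDemo = errors.New("demo failure")

// exportFunc is a hypothetical stand-in for the per-project export call.
type exportFunc func(project string) error

// runWorkers feeds projects to a fixed number of worker goroutines and
// returns the first error recorded. After an error, the dispatcher stops
// sending new work on a best-effort basis; work in flight still finishes.
func runWorkers(projects []string, workers int, export exportFunc) error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	done := make(chan struct{}) // closed when the first error is recorded
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)

	for i := 0; i < workers; i++ {
		wg.Add(1) // register before the goroutine starts
		go func() {
			defer wg.Done()
			for p := range jobs {
				if err := export(p); err != nil {
					once.Do(func() {
						firstErr = fmt.Errorf("error while exporting %s: %w", p, err)
						close(done)
					})
				}
			}
		}()
	}

	// The calling goroutine acts as the dispatcher.
dispatch:
	for _, p := range projects {
		select {
		case jobs <- p:
		case <-done:
			break dispatch
		}
	}
	close(jobs)
	wg.Wait() // after this, reading firstErr is safe
	return firstErr
}

func main() {
	projects := []string{"github.com/a/a", "github.com/b/b", "github.com/c/c"}
	err := runWorkers(projects, 2, func(p string) error {
		if p == "github.com/b/b" {
			return errDemo
		}
		return nil
	})
	fmt.Println(err)
}
```

The worker count is a parameter so that different levels of parallelism can be compared in a benchmark. Cleanup of a partially written tree (the `removeAll` call in the pasted code) would happen in the caller once `runWorkers` returns a non-nil error.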
@sdboyer Yes, was thinking along the same lines concerning controller - worker goroutines, so that's clear, thanks :) Regarding the error I'm getting, not really, but I will investigate further
|
great! also, a key note about the domain model here... there's an assumption that projects cannot be logical descendants of one another - only siblings, or entirely unrelated. that is, we assume that i'm raising it now just to say that, for the purposes of this issue, you should ignore it. we'll work on it in a follow-up 😄 |
done now |
`gps.WriteDepTree()` is the global entry point that actually creates a `vendor/` tree. Right now, it operates entirely in serial, writing out dependency directories one at a time. I don't know exactly how much we'd be able to get out of parallelizing this, as most of the bottleneck here is likely to be I/O. However, it seems worth at least trying to run these operations concurrently; my naive expectation would be that 2x parallelization would at least be able to earn some speedups; the kernel ought to be able to schedule the vcs subprocesses sanely so that read/CPU work segments in one process alternate with writing in the other process, helping to ensure we're always saturating disk writes.

If someone could do an experimental implementation of this, then work up some benchmarks to validate the hypothesis, that'd be great.
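One rough way to start testing the hypothesis is a standalone benchmark that varies the parallelism level. The sketch below is an assumption-heavy stand-in: `writeDirs`, `runOnce`, and `payload` are hypothetical, and it only writes a 1 MiB file per directory rather than invoking vcs subprocesses, so it measures plain disk writes, not the real `WriteDepTree` workload.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"testing"
)

// payload is a hypothetical stand-in for one exported dependency's contents.
var payload = make([]byte, 1<<20)

// writeDirs creates one directory per name under base and writes a file into
// each, with at most `parallel` writes running at once. It returns how many
// directories were written and the first error recorded.
func writeDirs(base string, names []string, parallel int) (int, error) {
	if parallel < 1 {
		parallel = 1
	}
	sem := make(chan struct{}, parallel) // bounds concurrent writers
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex // guards n and firstErr
		n        int
		firstErr error
	)
	for _, name := range names {
		name := name
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallel` writers are active
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			dir := filepath.Join(base, name)
			err := os.MkdirAll(dir, 0o777)
			if err == nil {
				err = os.WriteFile(filepath.Join(dir, "data"), payload, 0o666)
			}
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			n++
		}()
	}
	wg.Wait()
	return n, firstErr
}

// runOnce writes eight directories into a fresh temporary directory and
// removes it afterwards.
func runOnce(parallel int) (int, error) {
	base, err := os.MkdirTemp("", "deptree-bench")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(base)
	names := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	return writeDirs(base, names, parallel)
}

func main() {
	for _, parallel := range []int{1, 2, 4} {
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				if _, err := runOnce(parallel); err != nil {
					b.Fatal(err)
				}
			}
		})
		fmt.Printf("parallel=%d: %s\n", parallel, res)
	}
}
```

Results from a sketch like this depend heavily on the disk, filesystem, and page cache, so they would only hint at whether a real benchmark against `WriteDepTree` is worth building.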