fix(migration): improve job progress logging in MigrateStateTree #329

Merged
merged 7 commits into from
Dec 3, 2024
7 changes: 4 additions & 3 deletions builtin/v9/migration/top.go
@@ -397,11 +397,12 @@ func MigrateStateTree(ctx context.Context, store cbor.IpldStore, newManifestCID
 		case <-time.After(cfg.ProgressLogPeriod):
 			jobsNow := jobCount // Snapshot values to avoid incorrect-looking arithmetic if they change.
 			doneNow := doneCount
-			pendingNow := jobsNow - doneNow
 			elapsed := time.Since(startTime)
 			rate := float64(doneNow) / elapsed.Seconds()
-			log.Log(rt.INFO, "%d jobs created, %d done, %d pending after %v (%.0f/s)",
-				jobsNow, doneNow, pendingNow, elapsed, rate)
+			percentComplete := float64(doneNow) / float64(jobsNow) * 100
+
+			log.Log(rt.INFO, "Performing migration: %d of %d jobs complete (%.1f%%, %.0f/s)",
+				doneNow, jobsNow, percentComplete, rate)
Member

When you watch a migration run with lotus-shed, you may find that percentComplete is not helpful, and I think we need to remove it. I had hoped we could report a percentage, and I think my original issue said we should, but having watched a migration progress myself, I can see that jobs are added at the same time as they are completed, so we never know what "complete" is. I don't think we have an easy way to pre-calculate the number of jobs. So, unfortunately, the main benefit of this logging is simply letting people know that the migration is happening and that their node isn't stuck.
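To illustrate the concern, here is a minimal sketch of how a percentage computed against a growing job total can move backwards between log lines. The `snapshot` type, the `regresses` helper, and the `uint32` counter type are assumptions made for this example; they are not taken from the PR. The `percentComplete` function only mirrors the arithmetic shown in the diff.

```go
package main

// snapshot is a hypothetical pair of counters sampled at one log tick.
type snapshot struct{ done, jobs uint32 }

// percentComplete mirrors the arithmetic in the diff. If jobs is 0 the
// floating-point division yields NaN (or +Inf) rather than panicking.
func percentComplete(done, jobs uint32) float64 {
	return float64(done) / float64(jobs) * 100
}

// regresses reports whether any snapshot shows a lower percentage than
// the one before it, which happens when jobs grows faster than done.
func regresses(snaps []snapshot) bool {
	for i := 1; i < len(snaps); i++ {
		prev := percentComplete(snaps[i-1].done, snaps[i-1].jobs)
		cur := percentComplete(snaps[i].done, snaps[i].jobs)
		if cur < prev {
			return true
		}
	}
	return false
}
```

For example, going from 50 of 100 jobs to 100 of 400 jobs doubles the completed work but halves the reported percentage, which is why the figure cannot be read as progress toward a known total.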

Member Author

Would something like this work?

log.Log(rt.INFO, "Performing migration: %s of %s jobs processed (%s/s) [%v elapsed]", doneStr, jobsStr, rateStr, elapsed.Round(time.Second))

Even though the done and total job counts will keep changing, it will still show that the node isn't stuck and is working fine.

 		case <-workersFinished:
 			return
 		case <-ctx.Done():