add warning when running out of disk. #873

Merged
merged 1 commit into from
Jan 6, 2021

Conversation

TingluoHuang
Member

Based on telemetry, many workflow runs fail because the runner machine runs out of disk space.

When the machine runs out of disk space, the runner may crash because it can't write to its log files, and the OS may also freeze up.

The workflow run then appears to hang, doing nothing, and is eventually abandoned by the service.

This PR adds a warning annotation to the job when less than 100 MB is left on the disk, to give customers a hint about why their job failed.

@TingluoHuang TingluoHuang requested a review from a team as a code owner December 19, 2020 02:58
Contributor

@hross hross left a comment

How many customers have this issue, do we think? I'm wondering if there are runs that will be impacted and get this notation all the time and it will annoy those customers (maybe not).

Do we already have some telemetry in place that can tell us how frequently this happens so when we release this we can stop it from rolling further if it has unintended noise consequences?

@@ -325,6 +328,11 @@ public async Task<List<IStep>> InitializeJob(IExecutionContext jobContext, Pipel
}
}

if (jobContext.Global.Variables.GetBoolean("__DISK.CHECK") ?? true)
Contributor

Is this convention? Do we have other variables like this?

I think we should save this off as a constant somewhere and explain where it comes from in the code.

var workDirRoot = Directory.GetDirectoryRoot(HostContext.GetDirectory(WellKnownDirectory.Work));
var driveInfo = new DriveInfo(workDirRoot);
var freeSpaceInMB = driveInfo.AvailableFreeSpace / 1024 / 1024;
if (freeSpaceInMB < 100) // Add warning when disk is lower than 100 MB
Contributor

Should we make this configurable? Maybe instead of a boolean free-disk check we make it an int, and if the int exists and is non-zero we check against that number of MB/KB?
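
A minimal sketch of what an integer threshold could look like, for illustration only. It is not the code in this PR. The class name LowDiskCheck, its method names, and the warning text are all hypothetical. It also assumes that an unset or unparsable value falls back to the PR's 100 MB default (so the check stays on by default) and that 0 or a negative value disables the check.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the runner code base.
public static class LowDiskCheck
{
    // Assumed default, matching the 100 MB threshold used in this PR.
    public const long DefaultThresholdMB = 100;

    // Resolves the threshold from a raw setting value.
    // Unset or unparsable -> default; 0 or negative -> null (check disabled).
    public static long? ResolveThresholdMB(string rawValue)
    {
        if (string.IsNullOrWhiteSpace(rawValue) || !long.TryParse(rawValue, out var value))
        {
            return DefaultThresholdMB;
        }

        return value > 0 ? value : (long?)null;
    }

    // True when the free space, truncated to whole MB, is below the threshold.
    public static bool IsLow(long availableBytes, long thresholdMB)
    {
        return availableBytes / 1024 / 1024 < thresholdMB;
    }

    // Checks the drive that holds workDirectory and produces a warning message if it is low.
    public static bool TryGetWarning(string workDirectory, string rawThreshold, out string message)
    {
        message = string.Empty;

        var thresholdMB = ResolveThresholdMB(rawThreshold);
        if (thresholdMB == null)
        {
            return false; // check disabled
        }

        var driveInfo = new DriveInfo(Directory.GetDirectoryRoot(workDirectory));
        if (!IsLow(driveInfo.AvailableFreeSpace, thresholdMB.Value))
        {
            return false;
        }

        message = $"Less than {thresholdMB.Value} MB of free disk space left on '{driveInfo.Name}'.";
        return true;
    }
}
```

With this shape, a caller would pass the raw variable value straight through, and the existing boolean switch becomes the special case where the value is 0.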

Member

Is there any way we can add telemetry so we can discern what portion of abandons are due to this?

Member Author

issue.Data[Constants.Runner.InternalTelemetryIssueDataKey] = Constants.Runner.LowDiskSpace; will make this issue show up in Kusto.

@hross hross self-requested a review January 4, 2021 10:49
Contributor

@hross hross left a comment

I approved this, but you should probably update it based on my suggestions before merging.

@TingluoHuang TingluoHuang force-pushed the users/tihuang/lowdiskwarning branch 2 times, most recently from 86f6f87 to a550df4 Compare January 5, 2021 20:24
@TingluoHuang TingluoHuang merged commit e808190 into main Jan 6, 2021
@TingluoHuang TingluoHuang deleted the users/tihuang/lowdiskwarning branch January 6, 2021 02:49
This pull request was closed.