Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add driver.docker counter metric for OOM Killer events #4185

Merged
27 changes: 26 additions & 1 deletion client/driver/docker.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,13 @@ import (
"time"

"github.com/armon/circbuf"
docker "github.com/fsouza/go-dockerclient"
"github.com/fsouza/go-dockerclient"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goimports suggested this change. Please let me know if you want me to rollback this line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its because the imported package in that folder is already called docker :)


"github.com/docker/docker/cli/config/configfile"
"github.com/docker/docker/reference"
"github.com/docker/docker/registry"

"github.com/armon/go-metrics"
"github.com/hashicorp/go-multierror"
"github.com/hashicorp/go-plugin"
"github.com/hashicorp/nomad/client/allocdir"
Expand Down Expand Up @@ -478,6 +479,9 @@ type DockerHandle struct {
client *docker.Client
waitClient *docker.Client
logger *log.Logger
jobName string
taskGroupName string
taskName string
Image string
ImageID string
containerID string
Expand Down Expand Up @@ -898,6 +902,9 @@ func (d *DockerDriver) Start(ctx *ExecContext, task *structs.Task) (*StartRespon
executor: exec,
pluginClient: pluginClient,
logger: d.logger,
jobName: d.DriverContext.jobName,
taskGroupName: d.DriverContext.taskGroupName,
taskName: d.DriverContext.taskName,
Image: d.driverConfig.ImageName,
ImageID: d.imageID,
containerID: container.ID,
Expand Down Expand Up @@ -1771,6 +1778,9 @@ func (d *DockerDriver) Open(ctx *ExecContext, handleID string) (DriverHandle, er
executor: exec,
pluginClient: pluginClient,
logger: d.logger,
jobName: d.DriverContext.jobName,
taskGroupName: d.DriverContext.taskGroupName,
taskName: d.DriverContext.taskName,
Image: pid.Image,
ImageID: pid.ImageID,
containerID: pid.ContainerID,
Expand Down Expand Up @@ -1924,6 +1934,21 @@ func (h *DockerHandle) run() {
h.logger.Printf("[ERR] driver.docker: failed to inspect container %s: %v", h.containerID, ierr)
} else if container.State.OOMKilled {
werr = fmt.Errorf("OOM Killed")
labels := []metrics.Label{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me the interesting labels would be job, group and task :)

Copy link
Contributor Author

@jesusvazquez jesusvazquez Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for me too, but i don't know how to access them from here. I'm happy to implement a solution but I'd need some help.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesusvazquez you could add those fields to the DockerHandle struct, and then populate them from the DriverContext (I'm preparing a PR so you can have those available).

h := &DockerHandle{
client: client,
waitClient: waitClient,
executor: exec,
pluginClient: pluginClient,
logger: d.logger,
Image: d.driverConfig.ImageName,
ImageID: d.imageID,
containerID: container.ID,
version: d.config.Version.VersionNumber(),
killTimeout: GetKillTimeout(task.KillTimeout, maxKill),
maxKillTimeout: maxKill,
doneCh: make(chan bool),
waitCh: make(chan *dstructs.WaitResult, 1),
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: #4196

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh really nice @jvrplmlmn Ill update this PR as soon as I get back from my vacations.

{
Name: "job",
Value: h.jobName,
},
{
Name: "task_group",
Value: h.taskGroupName,
},
{
Name: "task",
Value: h.taskName,
},
}
metrics.IncrCounterWithLabels([]string{"driver", "docker", "oom"}, 1, labels)
}

close(h.doneCh)
Expand Down