Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic for nil-pointer during exec DestroyTask #6760

Closed
tgross opened this issue Nov 22, 2019 · 2 comments · Fixed by #6763
Closed

panic for nil-pointer during exec DestroyTask #6760

tgross opened this issue Nov 22, 2019 · 2 comments · Fixed by #6763

Comments

@tgross
Copy link
Member

tgross commented Nov 22, 2019

During pre-RC testing of 0.10.2 I've found a bug in the exec driver's DestroyTask. It looks like a fairly tight race between handle.IsRunning() and handle.exec.Shutdown() at https://github.com/hashicorp/nomad/blob/v0.10.1/drivers/exec/driver.go#L462

$ /opt/gopath/bin/nomad --version
Nomad v0.10.2-dev (d2177b2f0d20051d371e9dba51475f313b815c93+CHANGES)
    2019-11-22T13:59:33.988Z [DEBUG] http: request complete: method=DELETE path=/v1/job/countdash?purge=false duration=4.303763ms
    2019-11-22T13:59:33.988Z [DEBUG] worker: dequeued evaluation: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be
    2019-11-22T13:59:33.988Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be job_id=countdash namespace=default results="Total changes: (place 0) (destructive 0) (inplace 0) (stop 2)
Desired Changes for "dashboard": (place 0) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)
Desired Changes for "api": (place 0) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)"
    2019-11-22T13:59:33.989Z [DEBUG] http: request complete: method=GET path=/v1/evaluation/d84193ff-ff3c-6c74-9c93-dd215c3170be duration=103.832µs
    2019-11-22T13:59:33.990Z [DEBUG] http: request complete: method=GET path=/v1/evaluation/d84193ff-ff3c-6c74-9c93-dd215c3170be/allocations duration=113.013µs
    2019-11-22T13:59:33.992Z [DEBUG] worker: submitted plan for evaluation: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be
    2019-11-22T13:59:33.992Z [DEBUG] worker.service_sched: setting eval status: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be job_id=countdash namespace=default status=complete
    2019-11-22T13:59:33.993Z [DEBUG] worker: updated evaluation: eval="<Eval "d84193ff-ff3c-6c74-9c93-dd215c3170be" JobID: "countdash" Namespace: "default">"
    2019-11-22T13:59:33.993Z [DEBUG] worker: ack evaluation: eval_id=d84193ff-ff3c-6c74-9c93-dd215c3170be
    2019-11-22T13:59:33.993Z [DEBUG] client: updated allocations: index=32 total=2 pulled=2 filtered=0
    2019-11-22T13:59:33.993Z [DEBUG] client: allocation updates: added=0 removed=0 updated=2 ignored=0
    2019-11-22T13:59:33.996Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=2 ignored=0 errors=0
    2019-11-22T13:59:34.136Z [DEBUG] client: updated allocations: index=34 total=2 pulled=0 filtered=2
    2019-11-22T13:59:34.136Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=2
    2019-11-22T13:59:34.136Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=2 errors=0
    2019-11-22T13:59:34.158Z [ERROR] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task with unrecoverable error: alloc_id=0316f581-f098-22ed-cf5b-0b93a025fd0d task=connect-proxy-count-dashboard error="container stopped"
    2019-11-22T13:59:34.160Z [ERROR] client.alloc_runner.task_runner.task_hook.stats_hook: failed to start stats collection for task with unrecoverable error: alloc_id=96c3b0da-1355-2c46-a37a-b69b8d45fdf1 task=connect-proxy-count-api error="container stopped"
    2019-11-22T13:59:34.168Z [WARN]  client.driver_mgr.docker: RemoveImage on non-referenced counted image id: driver=docker image_id=sha256:72e91d8680d853b874d9aedda3a4b61048630d2043dd490ff36f5b0929f69874
    2019-11-22T13:59:34.170Z [WARN]  client.driver_mgr.docker: RemoveImage on non-referenced counted image id: driver=docker image_id=sha256:72e91d8680d853b874d9aedda3a4b61048630d2043dd490ff36f5b0929f69874
    2019-11-22T13:59:34.215Z [ERROR] client.driver_mgr.exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=96c3b0da-1355-2c46-a37a-b69b8d45fdf1 driver=exec task_name=web error="rpc error: code = Unavailable desc = transport is closing"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x135125f]
goroutine 489 [running]:
github.com/hashicorp/nomad/drivers/exec.(*Driver).DestroyTask(0xc0007a23c0, 0xc0004d3cc0, 0x31, 0x1, 0x1, 0x0)
	/opt/gopath/src/github.com/hashicorp/nomad/drivers/exec/driver.go:462 +0x17f
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*TaskRunner).clearDriverHandle(0xc000428a00)
	/opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/task_runner_getters.go:85 +0xd8
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*TaskRunner).Run(0xc000428a00)
	/opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/taskrunner/task_runner.go:533 +0x1035
created by github.com/hashicorp/nomad/client/allocrunner.(*allocRunner).runTasks
	/opt/gopath/src/github.com/hashicorp/nomad/client/allocrunner/alloc_runner.go:312 +0x98

Jobspec below, but I've run this particular job plenty of times during testing and never hit this previously:

job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "exec"

      env {
        PORT = "9001"
      }

      config {
        command = "/usr/local/bin/counting-service"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "exec"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
        PORT                 = "9002"
      }

      config {
        command = "/usr/local/bin/dashboard-service"
      }
    }
  }
}
@tgross
Copy link
Member Author

tgross commented Nov 22, 2019

at https://github.com/hashicorp/nomad/blob/v0.10.1/drivers/exec/driver.go#L462

Turns out this code path was changed after 0.10.1 and I was looking at the old tag and not the current branch.

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants