nil pointer dereference in deployment monitor #15235

Closed
siennathesane opened this issue Nov 14, 2022 · 2 comments · Fixed by #16011
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/cli type/bug

Comments

@siennathesane

Nomad version

Nomad v1.4.2

Operating system and Environment details

image

Issue

Running nomad job run panics the CLI with a nil pointer dereference in the deployment monitor:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=0x1042cf598]

goroutine 1 [running]:
github.com/hashicorp/nomad/command.(*DeploymentStatusCommand).ttyMonitor(0x14000a9f928, 0x1400083ec40, {0x140009fdad0, 0x24}, 0x0, 0x0)
	github.com/hashicorp/nomad/command/deployment_status.go:329 +0x1e48
github.com/hashicorp/nomad/command.(*DeploymentStatusCommand).monitor(0x1400083ee00?, 0x14000c26040?, {0x140009fdad0, 0x24}, 0x2?, 0x0?)
	github.com/hashicorp/nomad/command/deployment_status.go:185 +0x84
github.com/hashicorp/nomad/command.(*monitor).monitor(0x14000bc2690, {0x140009fda10, 0x24})
	github.com/hashicorp/nomad/command/monitor.go:302 +0xbc8
github.com/hashicorp/nomad/command.(*JobRunCommand).Run(0x1400052aa00, {0x1400004e070, 0x1, 0x1})
	github.com/hashicorp/nomad/command/job_run.go:376 +0x1074
github.com/mitchellh/cli.(*CLI).Run(0x140000e3540)
	github.com/mitchellh/cli@v1.1.4/cli.go:262 +0x4a8
main.RunCustom({0x1400004e050?, 0x3, 0x3})
	github.com/hashicorp/nomad/main.go:117 +0x350
main.Run(...)
	github.com/hashicorp/nomad/main.go:87
main.main()
	github.com/hashicorp/nomad/main.go:83 +0x50
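The small fault address in the trace (addr=0x8) is consistent with reading a field through a nil pointer. As a rough illustration of that failure class and the usual guard, here is a minimal sketch. It is not the code at deployment_status.go:329; the Deployment type, the lookup helper, and errNoDeployment are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Deployment is a hypothetical stand-in for a record returned by an API client.
type Deployment struct {
	ID     string
	Status string
}

// errNoDeployment is a hypothetical sentinel error for a missing record.
var errNoDeployment = errors.New("deployment not found")

// lookup is a hypothetical client call. Like many Go lookups, it can return
// (nil, nil) when no record exists for the given ID.
func lookup(store map[string]*Deployment, id string) (*Deployment, error) {
	return store[id], nil
}

// statusLine formats a status line and checks for a nil deployment
// before touching any of its fields.
func statusLine(store map[string]*Deployment, id string) (string, error) {
	d, err := lookup(store, id)
	if err != nil {
		return "", err
	}
	if d == nil {
		// Without this check, d.Status below would panic with
		// "invalid memory address or nil pointer dereference".
		return "", errNoDeployment
	}
	return fmt.Sprintf("%s: %s", d.ID, d.Status), nil
}

func main() {
	store := map[string]*Deployment{"abc": {ID: "abc", Status: "running"}}

	line, err := statusLine(store, "abc")
	fmt.Println(line, err)

	line, err = statusLine(store, "missing")
	fmt.Println(line, err)
}
```

With a guard like this, a missing record surfaces as an ordinary error the CLI can print, rather than a crash.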

Reproduction steps

Here is my waypoint.nomad job, it's using a GCP CSI volume on the backend.

job "waypoint" {
  region      = "americas"
  datacenters = ["us-central1"]
  type        = "service"

  group "server" {
    count = 1

    update {
      max_parallel     = 1
      canary           = 1
      min_healthy_time = "10s"
      healthy_deadline = "3m"
      auto_revert      = true
      auto_promote     = true
    }

    network {
      mode = "bridge"
      port "http" {}
      port "grpc" {}
    }

    service {
      name     = "waypoint"
      provider = "consul"
      tags = [
        "traefik.http.services.waypoint.loadBalancer.server.port=${NOMAD_PORT_http}",
        "traefik.http.routers.waypoint.rule=Host(\"waypoint.domain.io\")",
        "traefik.http.routers.waypoint.entrypoints=websecure",
        "traefik.http.routers.waypoint.service=waypoint@consulcatalog",
        "traefik.http.routers.waypoint.tls=true"
      ]
    }

    service {
      name     = "waypoint-api"
      provider = "consul"

      tags = [
        "traefik.http.services.waypoint-api.loadBalancer.server.port=${NOMAD_PORT_grpc}",
        "traefik.http.routers.waypoint-api.rule=Host(\"waypoint.domain.io\")",
        "traefik.http.routers.waypoint-api.entrypoints=websecure",
        "traefik.http.routers.waypoint-api.service=waypoint-api@consulcatalog",
        "traefik.http.routers.waypoint-api.tls=true"
      ]
    }

    volume "waypoint" {
      type            = "csi"
      source          = "waypoint"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"
    }

    task "disk-check" {
      driver = "docker"
      config {
        image   = "busybox:latest"
        command = "sh"
        args = [
          "-c",
          "chown -R 100:1000 /data/"
        ]
      }
      resources {
        cpu        = 100
        memory     = 100
        memory_max = 150
      }
      restart {
        attempts = 2
        interval = "3s"
        mode     = "fail"
      }
      lifecycle {
        hook    = "prestart"
        sidecar = false
      }
      volume_mount {
        volume      = "waypoint"
        destination = "/data"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/waypoint:0.10.3"

        args = [
          "server",
          "run",
          "-accept-tos",
          "-db=/data/data.db",
          "-tls-cert-file=/home/waypoint/tls.crt",
          "-tls-key-file=/home/waypoint/tls.key",
          "-advertise-addr=https://waypoint.domain.io",
          "-listen-grpc=0.0.0.0:${NOMAD_PORT_grpc}",
          "-listen-http-insecure=0.0.0.0:${NOMAD_PORT_http}"
        ]

        ports = [
          "http",
          "grpc"
        ]

        volumes = [
          "local/tls.crt:/home/waypoint/tls.crt",
          "local/tls.key:/home/waypoint/tls.key"
        ]
      }

      template {
        data        = <<EOF
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
EOF
        destination = "local/tls.crt"
      }

      template {
        data        = <<EOF
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
EOF
        destination = "local/tls.key"
      }

      resources {
        cpu        = 100
        memory_max = 512
      }

      restart {
        attempts = 2
        mode     = "fail"
      }

      volume_mount {
        volume      = "waypoint"
        destination = "/data"
      }
    }
  }
}

And the waypoint.volume.nomad definition:

# volume registration
type      = "csi"
id        = "waypoint"
name      = "waypoint"
plugin_id = "gcepd"

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

topology_request {
  required {
    topology {
      segments { "topology.gke.io/zone" = "us-central1-a" }
    }
  }
}

Expected Result

The CLI should not panic.

Actual Result

Nomad is throwing the same errors as #13450, specifically "volume max claim reached".
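For readers unfamiliar with that message, the sketch below models what a claim limit on a single-writer volume means in general terms. It is an assumption-laden illustration, not Nomad's implementation: the volume type, claimWrite, release, and the allocation IDs are all hypothetical, and only the error text is taken from this report.

```go
package main

import (
	"errors"
	"fmt"
)

// errMaxClaims reuses the wording from the report; everything else in this
// file is a hypothetical model, not Nomad's code.
var errMaxClaims = errors.New("volume max claim reached")

// volume is a hypothetical volume that tracks which allocations hold a write claim.
type volume struct {
	maxWriters int
	writers    map[string]bool // allocation IDs currently holding a write claim
}

// claimWrite grants a write claim unless the writer limit is already reached.
func (v *volume) claimWrite(allocID string) error {
	if v.writers[allocID] {
		return nil // this allocation already holds a claim
	}
	if len(v.writers) >= v.maxWriters {
		return errMaxClaims
	}
	v.writers[allocID] = true
	return nil
}

// release drops the write claim held by allocID, if any.
func (v *volume) release(allocID string) {
	delete(v.writers, allocID)
}

func main() {
	// A single-writer volume, analogous to access_mode = "single-node-writer".
	v := &volume{maxWriters: 1, writers: map[string]bool{}}

	fmt.Println(v.claimWrite("alloc-1")) // first writer is accepted
	fmt.Println(v.claimWrite("alloc-2")) // second writer is rejected
	v.release("alloc-1")
	fmt.Println(v.claimWrite("alloc-2")) // accepted once the claim is released
}
```

In this model a second allocation cannot claim the volume until the first claim is released, which is the general shape of the error described here.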

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@siennathesane
Author

Waypoint was failing to deploy because of a CSI issue, but instead of reporting that failure the Nomad CLI panicked with the error above.

@jrasell
Member

jrasell commented Nov 14, 2022

Hi @mxplusb and thanks for raising this issue. We will take a look into reproducing this and raising a fix.
