Native service discovery fails with multiple raw_exec tasks #14210

Closed
Median555 opened this issue Aug 22, 2022 · 3 comments

@Median555

Nomad version

Nomad v1.3.3 (428b2cd)

Operating system and Environment details

Windows Server 2019 Standard 1809
Also reproduced on Windows 10 Enterprise with dev agent

Issue

Nomad's native service discovery fails to register a service when the service stanza is defined inside the main task and the job also has an init (prestart) task. Both tasks use the raw_exec driver. Registration fails with the error message "task_services: rpc error: rpc error: service registration insert failed: object missing primary index", but the task is not marked as failed.
Moving the service stanza up to the task group works.

This seems similar to #13483 (and the linked issue #13493), but this case involves two raw_exec tasks instead.

Reproduction steps

Run the first job file below. The error appears in either the UI or the server log.

Then run the second job file below, in which the service stanza is moved to the enclosing task group. With that change, registration works.

Expected Result

The service is discoverable:

$ nomad service list
Service Name  Tags
some-service  []

Actual Result

An error is logged and the service isn't discoverable:

$ nomad service list
No service registrations found
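
The manual check above can also be scripted. The following is a minimal sketch, assuming the nomad CLI is on the PATH and that `nomad service list` prints a header row followed by one service per line, as in the output shown in this report. The function name `service_registered` is hypothetical and not part of Nomad.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): succeeds if NAME appears in the
# first column of `nomad service list`, skipping the header row.
service_registered() {
  nomad service list 2>/dev/null |
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Example: report whether the service from this issue was registered.
# if service_registered some-service; then
#   echo "registered"
# else
#   echo "missing"
# fi
```

Because the "No service registrations found" message is a single line, the helper treats it the same as an empty list and returns a non-zero status.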

Job file (if appropriate)

With service stanza in task:

job "test" {
  datacenters = ["dc1"]

  group "service" {
    network {
      port "http" {}
    }

    task "download_artifact" {

      lifecycle {
        hook    = "prestart"
        sidecar = "false"
      }

      driver = "raw_exec"
      config {
        command = "powershell"
        args    = ["whoami"]
      }
    }

    task "api" {
      driver = "raw_exec"

      config {
        command = "powershell"
        args    = ["while($true) { Get-Date; sleep 5; }"]
      }
    
      service {
        name     = "some-api"
        provider = "nomad"
        port     = "http"
      }
    }
  }
}

The actual "work" is just dummy scripts.

With service stanza in task group:

job "test" {
  datacenters = ["dc1"]

  group "service" {
    network {
      port "http" {}
    }

    task "init_task" {

      lifecycle {
        hook    = "prestart"
        sidecar = "false"
      }

      driver = "raw_exec"
      config {
        command = "powershell"
        args    = ["whoami"]
      }
    }

    task "main_task" {
      driver = "raw_exec"

      config {
        command = "powershell"
        args    = ["while($true) { Get-Date; sleep 5; }"]
      }  
    }

    service {
      name     = "some-service"
      provider = "nomad"
      port     = "http"
    }
  }
}

Nomad Server logs (if appropriate)

Reproducing the error on the dev-agent prints the following to the log:

2022-08-22T09:10:55.060+0200 [ERROR] nomad.fsm: UpsertServiceRegistrations failed: error="service registration insert failed: object missing primary index"
2022-08-22T09:10:55.061+0200 [ERROR] client.rpc: error performing RPC to server: error="rpc error: service registration insert failed: object missing primary index" rpc=ServiceRegistration.Upsert server=127.0.0.1:4647
2022-08-22T09:10:55.061+0200 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: service registration insert failed: object missing primary index" rpc=ServiceRegistration.Upsert server=127.0.0.1:4647
2022-08-22T09:10:55.061+0200 [ERROR] client.alloc_runner.task_runner: poststart failed: alloc_id=b546f68d-ed48-29c6-7e6d-47218e6f1d11 task=main_task
  error=
  | 1 error occurred:
  |     * poststart hook "task_services" failed: rpc error: service registration insert failed: object missing primary index
  |

tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Aug 22, 2022
@tgross
Member

tgross commented Aug 22, 2022

Hi @Median555, this was fixed by ecad69c, which was supposed to have shipped in Nomad 1.3.2 but missed the boat. It'll be shipping in Nomad 1.3.4.
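
Since the reply above ties the fix to Nomad 1.3.4, one way to check whether a given agent is affected is to compare its reported version against that release. This is a rough sketch only: it assumes the first line of `nomad version` looks like the "Nomad v1.3.3 (428b2cd)" line quoted in this report, that `sort -V` (version sort) is available, and that versions are plain MAJOR.MINOR.PATCH strings. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nomad).

# Print the bare version from the first line of `nomad version`,
# e.g. "Nomad v1.3.3 (428b2cd)" -> "1.3.3".
nomad_version() {
  nomad version | awk 'NR == 1 { sub(/^v/, "", $2); print $2 }'
}

# Succeed if version $1 is at least version $2 (relies on `sort -V`):
# the smaller of the two versions sorts first, so $1 >= $2 exactly when
# the first sorted line equals $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example:
# if version_at_least "$(nomad_version)" 1.3.4; then
#   echo "fix should be included"
# else
#   echo "upgrade to 1.3.4 or later, or keep the service stanza at group level"
# fi
```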

tgross closed this as completed Aug 22, 2022
Nomad - Community Issues Triage automation moved this from Needs Triage to Done Aug 22, 2022
tgross added this to the 1.3.4 milestone Aug 22, 2022

@Median555
Author

Thanks for the reply @tgross! Sounds great. Sorry for the noise, I should have looked a little closer at the releases.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 22, 2022