
Use of reserved_ports causes job-register evaluations to fail without a reason #1046

Closed
cosmopetrich opened this issue Apr 7, 2016 · 4 comments
@cosmopetrich

Nomad version

Nomad v0.3.1

Operating system and Environment details

  • running in an AWS EC2 VPC in us-west-2
  • CoreOS 899.13.0 using the official AMI
  • Docker version 1.9.1
  • nomad running under systemd directly on the OS (i.e. not inside a container)

I initially hit the error when testing with 3 server nodes + 2 client nodes bootstrapped via consul, but I'm seeing the same symptoms even with a single node running in both client+server mode without consul.

Issue

Including any reserved_ports entry in the client's reserved block causes evaluations to fail with no indication as to why.

==> Monitoring evaluation "96f07a2c"
    Evaluation triggered by job "example"
    Evaluation status changed: "pending" -> "failed"
==> Evaluation "96f07a2c" finished with status "failed"
$ curl -s localhost:4646/v1/evaluations | python -m json.tool
[
    {
        "ID": "01e73507-8a4e-f961-3535-c6e9ca38e9de",
        "Priority": 50,
        "Type": "service",
        "TriggeredBy": "job-register",
        "JobID": "example",
        "JobModifyIndex": 6,
        "NodeID": "",
        "NodeModifyIndex": 0,
        "Status": "blocked",
        "StatusDescription": "",
        "Wait": 0,
        "NextEval": "",
        "PreviousEval": "96f07a2c-11a1-3916-f790-b37ab213794c",
        "ClassEligibility": {
            "v1:6305318303864028080": true
        },
        "EscapedComputedClass": false,
        "CreateIndex": 8,
        "ModifyIndex": 8
    },
    {
        "ID": "96f07a2c-11a1-3916-f790-b37ab213794c",
        "Priority": 50,
        "Type": "service",
        "TriggeredBy": "job-register",
        "JobID": "example",
        "JobModifyIndex": 6,
        "NodeID": "",
        "NodeModifyIndex": 0,
        "Status": "failed",
        "StatusDescription": "maximum attempts reached (5)",
        "Wait": 0,
        "NextEval": "",
        "PreviousEval": "",
        "ClassEligibility": null,
        "EscapedComputedClass": false,
        "CreateIndex": 7,
        "ModifyIndex": 9
    }
]
$ nomad status example

ID          = example
Name        = example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = pending
Periodic    = false


==> Evaluations

ID        Priority  Triggered By  Status
01e73507  50        job-register  blocked
96f07a2c  50        job-register  failed


==> Allocations
ID  Eval ID  Node ID  Task Group  Desired  Status
$ curl -s localhost:4646/v1/allocations
[]

Attempting to run a task that requests too much CPU or memory will work as expected, with Nomad printing a message indicating why it couldn't place the job.
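As an illustration (this jobspec is hypothetical, not from the original report), a task whose resources block over-asks is the kind of request that Nomad rejects with a descriptive placement message rather than a bare "failed" status:

```hcl
# Hypothetical task for illustration only: requesting far more CPU and
# memory than any client node offers yields an explicit placement error.
task "web" {
  driver = "docker"
  config {
    image = "redis:latest"
  }
  resources {
    cpu    = 100000 # MHz, deliberately larger than the node's capacity
    memory = 100000 # MB, likewise
  }
}
```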

There's some further discussion of the issue on the mailing list, where Diptanu and Parveen helped me figure out what was going on.

Reproduction steps

Bring up the nomad agent with a config along the lines of the following, replacing PRIVATE_IP as appropriate.

data_dir = "/var/lib/nomad/data"
log_level = "DEBUG"

bind_addr = "0.0.0.0"
advertise {
    rpc = "PRIVATE_IP:4647"
}

server {
    enabled = true
    bootstrap_expect = 1
}

client {
    enabled = true

    servers = ["PRIVATE_IP"]

    reserved {
        cpu = 500
        memory = 512
        disk = 10000
        reserved_ports = "22"
    }
}

Run nomad with nomad agent -config /path/to/config.hcl, wait for the client to become ready in nomad node-status, then run:

nomad init
nomad run example.nomad

The evaluation should fail. Removing the reserved_ports line and restarting nomad, then re-running nomad run example.nomad will cause the evaluation to succeed and a redis:latest container to come up in the local docker instance.
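For reference, per the Nomad client configuration documentation, reserved_ports takes a string of comma-separated port numbers and ranges, so the single "22" used above is a valid value. A fuller sketch of the reserved block (the range below is illustrative):

```hcl
client {
  reserved {
    # Both individual ports and ranges are accepted, e.g. SSH plus
    # a block of ports kept free for services outside Nomad's control.
    reserved_ports = "22,80,8500-8600"
  }
}
```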

Nomad logs

I've uploaded the debug logs for a 3 server + 1 client cluster to this gist. They cover the period from when nomad started till after the evaluation failed.

The logs from the single-server config above are in this gist.

@dadgar dadgar added the type/bug label Apr 7, 2016

dadgar commented Apr 7, 2016

Fixed by 36a7505. Thanks for the great reproduction steps!

@dadgar dadgar closed this as completed Apr 7, 2016
@cosmopetrich

Glad I could help. Thanks for the quick fix!


dennybaa commented Mar 1, 2017

@dadgar @cosmopetrich As I understand it, the fix should be present in v0.4.1? However, I see the same behavior as described.

# nomad status -evals -verbose test/periodic-1488392940
ID          = test/periodic-1488392940
Name        = test/periodic-1488392940
Type        = batch
Priority    = 50
Datacenters = dc1
Status      = pending
Periodic    = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
pio-train   1       0         0        0       0         0

Evaluations
ID                                    Priority  Triggered By       Status   Placement Failures
5d41f30f-b5b2-0f59-123f-1a5a538c4aa6  50        max-plan-attempts  blocked  N/A - In Progress
b3096e46-e0e7-453c-22d3-74dec64ec493  50        periodic-job       failed   false

Allocations
No allocations placed
{
    "ID": "b3096e46-e0e7-453c-22d3-74dec64ec493",
    "Priority": 50,
    "Type": "batch",
    "TriggeredBy": "periodic-job",
    "JobID": "test/periodic-1488392940",
    "JobModifyIndex": 439,
    "NodeID": "",
    "NodeModifyIndex": 0,
    "Status": "failed",
    "StatusDescription": "maximum attempts reached (2)",
    "Wait": 0,
    "NextEval": "",
    "PreviousEval": "",
    "BlockedEval": "5d41f30f-b5b2-0f59-123f-1a5a538c4aa6",
    "FailedTGAllocs": null,
    "ClassEligibility": null,
    "EscapedComputedClass": false,
    "AnnotatePlan": false,
    "SnapshotIndex": 440,
    "QueuedAllocations": {
        "pio-train": 1
    },
    "CreateIndex": 440,
    "ModifyIndex": 442
}

There's still no reason given anywhere for the failure. Removing reserved_ports works around the problem and unblocks me, thank you @cosmopetrich!
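As a side note, newer evaluation payloads (such as the JSON above) carry a FailedTGAllocs field alongside Status and StatusDescription. A small Python helper, purely illustrative and not part of Nomad, can condense an evaluation object from GET /v1/evaluation/:id into one line; the field names below match the JSON shown in this thread:

```python
import json

def summarize_eval(ev):
    """Return a one-line summary of a Nomad evaluation dict
    (shape as returned by GET /v1/evaluation/:id)."""
    parts = [ev.get("ID", "?")[:8],
             ev.get("TriggeredBy", "?"),
             ev.get("Status", "?")]
    desc = ev.get("StatusDescription")
    if desc:
        parts.append(desc)
    # FailedTGAllocs maps task-group names to placement failure details;
    # it is null when the scheduler recorded no per-group failures.
    if ev.get("FailedTGAllocs"):
        parts.append("placement failures in: " +
                     ", ".join(sorted(ev["FailedTGAllocs"])))
    return " | ".join(parts)

# Sample trimmed from the evaluation JSON quoted in this comment.
sample = json.loads("""{
    "ID": "b3096e46-e0e7-453c-22d3-74dec64ec493",
    "TriggeredBy": "periodic-job",
    "Status": "failed",
    "StatusDescription": "maximum attempts reached (2)",
    "FailedTGAllocs": null
}""")
print(summarize_eval(sample))
# prints: b3096e46 | periodic-job | failed | maximum attempts reached (2)
```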

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 15, 2022