tasks are killed periodically with Killing task: vault: failed to derive token: failed to unwrap the token for task #4097

Closed
dansteen opened this issue Apr 3, 2018 · 4 comments · Fixed by #4100

dansteen commented Apr 3, 2018

Nomad version

Nomad v0.7.1 (0b295d3)

Operating system and Environment details

Debian 8

Issue

We have found that Nomad periodically kills our jobs with the following message:

04/03/18 08:37:59 EDT Killing Killing task: vault: failed to derive token: failed to unwrap the token for task "core-api-java"

Job file (if appropriate)

{
    "Job": {
        "AllAtOnce": false,
        "Constraints": [
            {
                "LTarget": "${meta.role}",
                "Operand": "=",
                "RTarget": "core"
            },
            {
                "LTarget": "${meta.env}",
                "Operand": "=",
                "RTarget": "stag"
            }
        ],
        "CreateIndex": 24476,
        "Datacenters": [
            "awse"
        ],
        "ID": "core-api-java-stag",
        "JobModifyIndex": 825299,
        "Meta": null,
        "ModifyIndex": 825334,
        "Name": "core-api-java-stag",
        "Namespace": "default",
        "ParameterizedJob": null,
        "ParentID": "",
        "Payload": null,
        "Periodic": null,
        "Priority": 50,
        "Region": "global",
        "Stable": true,
        "Status": "running",
        "StatusDescription": "",
        "Stop": false,
        "SubmitTime": 1522770727391021305,
        "TaskGroups": [
            {
                "Constraints": [
                    {
                        "LTarget": "${attr.vault.version}",
                        "Operand": "version",
                        "RTarget": ">= 0.6.1"
                    }
                ],
                "Count": 2,
                "EphemeralDisk": {
                    "Migrate": false,
                    "SizeMB": 300,
                    "Sticky": false
                },
                "Meta": null,
                "Name": "app",
                "RestartPolicy": {
                    "Attempts": 2,
                    "Delay": 15000000000,
                    "Interval": 60000000000,
                    "Mode": "delay"
                },
                "Tasks": [
                    {
                        "Artifacts": [
                            {
                                "GetterMode": "any",
                                "GetterOptions": null,
                                "GetterSource": "https://jenkins-archive.traitify.com/core-api-java/core-api-java-config-3a42de03767d24917bc137b6ab011c26075256c3.yml.tmpl",
                                "RelativeDest": "local/"
                            },
                            {
                                "GetterMode": "any",
                                "GetterOptions": null,
                                "GetterSource": "https://jenkins-archive.traitify.com/core-api-java/core-api-java-3a42de03767d24917bc137b6ab011c26075256c3.jar",
                                "RelativeDest": "local/"
                            }
                        ],
                        "Config": {
                            "args": [
                                "server",
                                "local/core-api-java-config.yml"
                            ],
                            "jar_path": "local/core-api-java-3a42de03767d24917bc137b6ab011c26075256c3.jar"
                        },
                        "Constraints": null,
                        "DispatchPayload": null,
                        "Driver": "java",
                        "Env": {
                            "APP_NAME": "core-api-java",
                            "CHEF_ENV": "${meta.env}",
                            "GIT_HASH": "3a42de03767d24917bc137b6ab011c26075256c3",
                            "JAVA_TOOL_OPTIONS": "        -Dcom.sun.management.jmxremote\n        -Dcom.sun.management.jmxremote.port=${NOMAD_PORT_jmx}\n        -Dcom.sun.management.jmxremote.local.only=true\n        -Dcom.sun.management.jmxremote.authenticate=false\n        -Dcom.sun.management.jmxremote.ssl=false\n        -Djava.rmi.server.hostname=localhost\n        -Dnetworkaddress.cache.ttl=60\n        -Xms5120M\n        -XX:-UseConcMarkSweepGC\n        -Xmx5120M\n"
                        },
                        "KillSignal": "",
                        "KillTimeout": 5000000000,
                        "Leader": true,
                        "LogConfig": {
                            "MaxFileSizeMB": 10,
                            "MaxFiles": 10
                        },
                        "Meta": null,
                        "Name": "core-api-java",
                        "Resources": {
                            "CPU": 1900,
                            "DiskMB": 0,
                            "IOPS": 0,
                            "MemoryMB": 7120,
                            "Networks": [
                                {
                                    "CIDR": "",
                                    "Device": "",
                                    "DynamicPorts": [
                                        {
                                            "Label": "app",
                                            "Value": 0
                                        },
                                        {
                                            "Label": "admin",
                                            "Value": 0
                                        },
                                        {
                                            "Label": "jmx",
                                            "Value": 0
                                        }
                                    ],
                                    "IP": "",
                                    "MBits": 10,
                                    "ReservedPorts": null
                                }
                            ]
                        },
                        "Services": [
                            {
                                "AddressMode": "auto",
                                "CheckRestart": null,
                                "Checks": [
                                    {
                                        "AddressMode": "",
                                        "Args": null,
                                        "CheckRestart": null,
                                        "Command": "",
                                        "Header": null,
                                        "Id": "",
                                        "InitialStatus": "critical",
                                        "Interval": 10000000000,
                                        "Method": "",
                                        "Name": "app",
                                        "Path": "monitor/ping",
                                        "PortLabel": "app",
                                        "Protocol": "http",
                                        "TLSSkipVerify": false,
                                        "Timeout": 2000000000,
                                        "Type": "http"
                                    }
                                ],
                                "Id": "",
                                "Name": "core-api-java",
                                "PortLabel": "app",
                                "Tags": [
                                    "${node.unique.name}",
                                    "host__${node.unique.name}",
                                    "3a42de03767d24917bc137b6ab011c26075256c3",
                                    "version__3a42de03767d24917bc137b6ab011c26075256c3",
                                    "${meta.env}",
                                    "env__${meta.env}",
                                    "${meta.env}-core-api-java-prefix-/",
                                    "consuldogConfig:core-api-java-http_check.yaml.tmpl:http_check"
                                ]
                            },
                            {
                                "AddressMode": "auto",
                                "CheckRestart": null,
                                "Checks": [
                                    {
                                        "AddressMode": "",
                                        "Args": null,
                                        "CheckRestart": null,
                                        "Command": "",
                                        "Header": null,
                                        "Id": "",
                                        "InitialStatus": "critical",
                                        "Interval": 10000000000,
                                        "Method": "",
                                        "Name": "jmx",
                                        "Path": "",
                                        "PortLabel": "app",
                                        "Protocol": "",
                                        "TLSSkipVerify": false,
                                        "Timeout": 2000000000,
                                        "Type": "tcp"
                                    }
                                ],
                                "Id": "",
                                "Name": "core-api-java-jmx",
                                "PortLabel": "jmx",
                                "Tags": [
                                    "${node.unique.name}",
                                    "host__${node.unique.name}",
                                    "3a42de03767d24917bc137b6ab011c26075256c3",
                                    "version__3a42de03767d24917bc137b6ab011c26075256c3",
                                    "${meta.env}",
                                    "env__${meta.env}",
                                    "consuldogConfig:core-api-java-jmx.yaml.tmpl:jmx"
                                ]
                            }
                        ],
                        "ShutdownDelay": 0,
                        "Templates": [
                            {
                                "ChangeMode": "noop",
                                "ChangeSignal": "",
                                "DestPath": "local/core-api-java-config.yml",
                                "EmbeddedTmpl": "",
                                "Envvars": false,
                                "LeftDelim": "{{",
                                "Perms": "664",
                                "RightDelim": "}}",
                                "SourcePath": "local/core-api-java-config-3a42de03767d24917bc137b6ab011c26075256c3.yml.tmpl",
                                "Splay": 5000000000,
                                "VaultGrace": 15000000000
                            }
                        ],
                        "User": "",
                        "Vault": {
                            "ChangeMode": "noop",
                            "ChangeSignal": "SIGHUP",
                            "Env": true,
                            "Policies": [
                                "stag_sp"
                            ]
                        }
                    },
                    {
                        "Artifacts": [
                            {
                                "GetterMode": "any",
                                "GetterOptions": null,
                                "GetterSource": "https://jenkins-archive.traitify.com/core-api-java/core-api-java-remote-syslog2-3a42de03767d24917bc137b6ab011c26075256c3.yml.tmpl",
                                "RelativeDest": "local/"
                            }
                        ],
                        "Config": {
                            "args": [
                                "-c",
                                "/local/remote-syslog2.yml",
                                "-D"
                            ],
                            "command": "/usr/local/bin/remote_syslog"
                        },
                        "Constraints": null,
                        "DispatchPayload": null,
                        "Driver": "exec",
                        "Env": {
                            "GIT_HASH": "3a42de03767d24917bc137b6ab011c26075256c3",
                            "LOCAL_HOSTNAME": "${node.unique.name}",
                            "LOG_TASK_NAME": "core-api-java",
                            "APP_NAME": "core-api-java",
                            "CHEF_ENV": "${meta.env}"
                        },
                        "KillSignal": "",
                        "KillTimeout": 5000000000,
                        "Leader": false,
                        "LogConfig": {
                            "MaxFileSizeMB": 10,
                            "MaxFiles": 10
                        },
                        "Meta": null,
                        "Name": "log-shipper",
                        "Resources": {
                            "CPU": 100,
                            "DiskMB": 0,
                            "IOPS": 0,
                            "MemoryMB": 100,
                            "Networks": null
                        },
                        "Services": null,
                        "ShutdownDelay": 0,
                        "Templates": [
                            {
                                "ChangeMode": "noop",
                                "ChangeSignal": "",
                                "DestPath": "local/remote-syslog2.yml",
                                "EmbeddedTmpl": "",
                                "Envvars": false,
                                "LeftDelim": "{{",
                                "Perms": "664",
                                "RightDelim": "}}",
                                "SourcePath": "local/core-api-java-remote-syslog2-3a42de03767d24917bc137b6ab011c26075256c3.yml.tmpl",
                                "Splay": 5000000000,
                                "VaultGrace": 15000000000
                            }
                        ],
                        "User": "",
                        "Vault": null
                    }
                ],
                "Update": {
                    "AutoRevert": true,
                    "Canary": 0,
                    "HealthCheck": "checks",
                    "HealthyDeadline": 240000000000,
                    "MaxParallel": 1,
                    "MinHealthyTime": 30000000000,
                    "Stagger": 30000000000
                }
            }
        ],
        "Type": "service",
        "Update": {
            "AutoRevert": false,
            "Canary": 0,
            "HealthCheck": "",
            "HealthyDeadline": 0,
            "MaxParallel": 1,
            "MinHealthyTime": 0,
            "Stagger": 30000000000
        },
        "VaultToken": "",
        "Version": 13
    }
}
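
A note for readers of the job file above: the duration fields in Nomad's JSON job format (for example "KillTimeout", "Splay", and "VaultGrace") are integer nanoseconds, so "VaultGrace": 15000000000 is 15 seconds. The sketch below is only an illustration of that conversion. The helper name durations_in_seconds and the DURATION_KEYS set are assumptions made for this example, limited to keys that appear in this job file, and are not part of Nomad.

```python
import json

NANOS_PER_SECOND = 1_000_000_000

# Duration-valued keys seen in the job file above (hypothetical list for this sketch).
DURATION_KEYS = {
    "Delay", "Interval", "KillTimeout", "Timeout", "Splay", "VaultGrace",
    "HealthyDeadline", "MinHealthyTime", "Stagger", "ShutdownDelay",
}


def durations_in_seconds(node, path=""):
    """Walk a decoded job structure and return {path: seconds} for duration keys."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if key in DURATION_KEYS and is_int:
                found[child] = value / NANOS_PER_SECOND
            else:
                found.update(durations_in_seconds(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.update(durations_in_seconds(item, f"{path}[{index}]"))
    return found


# A small excerpt shaped like the job file above.
sample = json.loads(
    '{"RestartPolicy": {"Delay": 15000000000, "Interval": 60000000000},'
    ' "Templates": [{"Splay": 5000000000, "VaultGrace": 15000000000}]}'
)
result = durations_in_seconds(sample)
for field, seconds in sorted(result.items()):
    print(f"{field}: {seconds:g}s")
```

Running the same function over the full job file would list every timeout and grace period in seconds, which makes the values easier to compare against timestamps in the task events.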

dadgar commented Apr 3, 2018

Hey @dansteen, what version of Vault are you running, and what is the Vault configuration (single server, performance replicated, etc.)? Also, can you share the Vault logs around that incident? I would also look through your Vault audit log and share the requests/responses that include this allocation. You can see how the metadata is formatted here (#2475 (comment)).

dadgar commented Apr 3, 2018

@dansteen I think I have a fix that will go in with 0.8.

dansteen commented Apr 4, 2018

wow @dadgar even without full information. Nice!

In case it helps:

Vault v0.9.0 ('bdac1854478538052ba5b7ec9a9ec688d35a3335')

We are using Vault in HA mode with an AWS ELB routing traffic to the currently active Vault server.

Here are the Vault logs around this time:

<30>1 2018-04-03T08:38:59.572376-04:00 consul-01c.prod.awse vault 11628 - -  2018/04/03 08:38:59.572273 [INFO ] expiration: revoked lease: lease_id=auth/token/create/nomad-cluster-stag/6a02b3311ed7cf04e036dd0a8c5b159bb89cf116
<30>1 2018-04-03T08:39:03.172982-04:00 consul-01c.prod.awse vault 11628 - -  2018/04/03 08:39:03.172872 [INFO ] expiration: revoked lease: lease_id=auth/token/create/nomad-cluster-stag/05f145bd80bb0ecb96e34dbefaf997f884eb6e97
<30>1 2018-04-03T08:39:03.190550-04:00 consul-01c.prod.awse vault 11628 - -  2018/04/03 08:39:03.190496 [INFO ] expiration: revoked lease: lease_id=auth/token/create/nomad-cluster-stag/db411971047bb10ccb0d3be5feb618cea9874046
<30>1 2018-04-03T08:39:03.288071-04:00 consul-01c.prod.awse vault 11628 - -  2018/04/03 08:39:03.287950 [INFO ] expiration: revoked lease: lease_id=auth/token/create/nomad-cluster-stag/9991e4318834462b873a8e1048f90eca697b4f84
<30>1 2018-04-03T08:39:03.465263-04:00 consul-01c.prod.awse vault 11628 - -  2018/04/03 08:39:03.465147 [INFO ] expiration: revoked lease: lease_id=auth/token/create/nomad-cluster-stag/66122ec61e0ce23f9c06c49de7079b6e80281d57
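
For anyone who wants to line these entries up against Nomad's task events, here is a minimal sketch that pulls the timestamp and lease ID out of "revoked lease" lines. It assumes the exact line layout shown in the excerpt above; the regular expression and the function name revoked_leases are written for this example only and are not a Vault or Nomad API.

```python
import re
from datetime import datetime

# Matches the Vault-side timestamp and lease ID in lines shaped like the excerpt above.
REVOKED = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \[INFO \] "
    r"expiration: revoked lease: lease_id=(?P<lease>\S+)"
)


def revoked_leases(lines):
    """Return (timestamp, lease_id) pairs for every 'revoked lease' line."""
    pairs = []
    for line in lines:
        match = REVOKED.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
            pairs.append((ts, match.group("lease")))
    return pairs


sample = [
    "<30>1 2018-04-03T08:38:59.572376-04:00 consul-01c.prod.awse vault 11628 - -  "
    "2018/04/03 08:38:59.572273 [INFO ] expiration: revoked lease: "
    "lease_id=auth/token/create/nomad-cluster-stag/6a02b3311ed7cf04e036dd0a8c5b159bb89cf116",
    "a line that is not a lease revocation",
]
leases = revoked_leases(sample)
for ts, lease in leases:
    print(ts.isoformat(), lease)
```

The parsed timestamps are naive local times taken from the Vault portion of each line, so they should be compared only against events logged in the same time zone.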

Unfortunately, I didn't have audit logs enabled yet (though I have enabled them now).

github-actions bot commented Dec 1, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 1, 2022