Nomad 0.9 - system job failed to start after nomad restart #5611

Closed
jozef-slezak opened this issue Apr 25, 2019 · 2 comments

Comments

@jozef-slezak

jozef-slezak commented Apr 25, 2019

Nomad version

Nomad v0.9.0 (18dd590)

Operating system and Environment details

Centos 7

Issue

Reproduction steps

Repeat nomad restarts until system job does not start

Job file (if appropriate)

addresses = {
  http = "0.0.0.0"
}
advertise = {
  http = "192.168.56.25"
  rpc = "192.168.56.25"
  serf = "192.168.56.25"
}
bind_addr = "192.168.56.25"
client = {
  enabled = true
  network_interface = "eth1"
  options = {
    driver.raw_exec.enable = 1
    driver.raw_exec.no_cgroups = 1
    fingerprint.network.disallow_link_local = true
  }
}
data_dir = "/var/lib/nomad"
datacenter = "dc1"
disable_update_check = true
log_level = "INFO"
server = {
  bootstrap_expect = 1
  enabled = true
  encrypt = "<redacted>"
}

Nomad Client logs (if appropriate)

Apr 25 12:07:55 abis-perftest.ba.innovatrics.net systemd[1]: Started HashiCorp Nomad.
Apr 25 12:07:55 abis-perftest.ba.innovatrics.net nomad[3092]: WARNING: keyring exists but -encrypt given, using keyring
Apr 25 12:07:55 abis-perftest.ba.innovatrics.net nomad[3092]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
Apr 25 12:07:55 abis-perftest.ba.innovatrics.net nomad[3092]: ==> Loaded configuration from /etc/nomad.d/nomad.hcl
Apr 25 12:07:55 abis-perftest.ba.innovatrics.net nomad[3092]: ==> Starting Nomad agent...
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: ==> Nomad agent configuration:
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Advertise Addrs: HTTP: 192.168.56.25:4646; RPC: 192.168.56.25:4647; Serf: 192.168.56.25:4648
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 192.168.56.25:4647; Serf: 192.168.56.25:4648
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Client: true
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Log Level: INFO
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Region: global (DC: dc1)
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Server: true
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: Version: 0.9.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: ==> Nomad agent started! Log data will stream in below:
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.607Z [WARN ] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/var/lib/nomad/plugins
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=java type=driver plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=rkt type=driver plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.619Z [INFO ] agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.759Z [INFO ] nomad: raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.56.25:4647 Address:192.168.56.25:4647}]
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.759Z [INFO ] nomad: raft: Node at 192.168.56.25:4647 [Follower] entering Follower state (Leader: "")
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.765Z [INFO ] nomad: serf: EventMemberJoin: abis-perftest.ba.innovatrics.net.global 192.168.56.25
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.765Z [INFO ] nomad: starting scheduling worker(s): num_workers=2 schedulers="[service batch system _core]"
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.770Z [WARN ] nomad: serf: Failed to re-join any previously known node
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.770Z [INFO ] nomad: adding server: server="abis-perftest.ba.innovatrics.net.global (Addr: 192.168.56.25:4647) (DC: dc1)"
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.770Z [INFO ] client: using state directory: state_dir=/var/lib/nomad/client
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.772Z [INFO ] client: using alloc directory: alloc_dir=/var/lib/nomad/alloc
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.775Z [ERROR] nomad: error looking up Nomad servers in Consul: error="server.nomad: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.777Z [INFO ] client.fingerprint_mgr.cgroup: cgroups are available
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:55.785Z [INFO ] client.fingerprint_mgr.consul: consul agent is available
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:57.744Z [WARN ] nomad: raft: Heartbeat timeout from "" reached, starting election
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:57.744Z [INFO ] nomad: raft: Node at 192.168.56.25:4647 [Candidate] entering Candidate state in term 5
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:57.760Z [INFO ] nomad: raft: Election won. Tally: 1
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:57.760Z [INFO ] nomad: raft: Node at 192.168.56.25:4647 [Leader] entering Leader state
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:57.760Z [INFO ] nomad: cluster leadership acquired
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:59.801Z [INFO ] client.plugin: starting plugin manager: plugin-type=driver
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:59.801Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:59.802Z [INFO ] client: started client: node_id=cfecb47e-c6b5-c4bd-7c17-90e6084abd93
Apr 25 12:07:59 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:07:59.818Z [INFO ] client: node registration complete
Apr 25 12:08:07 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:08:07.781Z [INFO ] client: node registration complete
Apr 25 12:15:03 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:15:03.648Z [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent path=/var/lib/nomad/alloc/c77d112d-4f15-f30b-f33b-faa64eff68cf/alloc/logs/.jaeger-agent.stdout.fifo @module=logmon timestamp=2019-04-25T12:15:03.648Z
Apr 25 12:15:03 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:15:03.648Z [INFO ] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent @module=logmon path=/var/lib/nomad/alloc/c77d112d-4f15-f30b-f33b-faa64eff68cf/alloc/logs/.jaeger-agent.stderr.fifo timestamp=2019-04-25T12:15:03.648Z
Apr 25 12:15:03 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:15:03.653Z [INFO ] client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:/bin/bash Args:[-c exec &> >(tee -i -a /var/log/jaeger/jaeger-agent-192.168.56.25-6831.log); exec /usr/bin/jaeger-agent --reporter.tchannel.host-port=jaeger-collector.service.consul:14267]}"
Apr 25 12:18:00 abis-perftest.ba.innovatrics.net systemd[1]: Stopping HashiCorp Nomad...
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: ==> Caught signal: terminated
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:00.990Z [INFO ] agent: requesting shutdown
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:00.990Z [INFO ] client: shutting down
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.014Z [INFO ] client.plugin: shutting down plugin manager: plugin-type=device
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.020Z [INFO ] client.plugin: plugin manager finished: plugin-type=device
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.020Z [INFO ] client.plugin: shutting down plugin manager: plugin-type=driver
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.117Z [INFO ] client.plugin: plugin manager finished: plugin-type=driver
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.120Z [INFO ] nomad: shutting down server
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.120Z [WARN ] nomad: serf: Shutdown without a Leave
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.124Z [WARN ] consul.sync: failed to update services in Consul: error="error querying Consul services: Get http://127.0.0.1:8500/v1/agent/services: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-3l5lje6cw7be7qidooinb4k4zms2obsy error="Put http://127.0.0.1:8500/v1/agent/service/deregister/_nomad-server-3l5lje6cw7be7qidooinb4k4zms2obsy: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-client-cpo2gq4nkox6yn5jmrhov6jji7afzrzr error="Put http://127.0.0.1:8500/v1/agent/service/deregister/_nomad-client-cpo2gq4nkox6yn5jmrhov6jji7afzrzr: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-dbmrlnjpeme7qmpeode77ohw2xachrkr error="Put http://127.0.0.1:8500/v1/agent/service/deregister/_nomad-server-dbmrlnjpeme7qmpeode77ohw2xachrkr: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent service: service_id=_nomad-server-josrbuw6bojgiauqouwvsrpr7dxqwj5s error="Put http://127.0.0.1:8500/v1/agent/service/deregister/_nomad-server-josrbuw6bojgiauqouwvsrpr7dxqwj5s: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent check: check_id=d706eaa344c08a8c9e3ce51207103ac62e54a17f error="Put http://127.0.0.1:8500/v1/agent/check/deregister/d706eaa344c08a8c9e3ce51207103ac62e54a17f: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent check: check_id=25c101eca4a54693ee3cd2288831b4e3b068b499 error="Put http://127.0.0.1:8500/v1/agent/check/deregister/25c101eca4a54693ee3cd2288831b4e3b068b499: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent check: check_id=435becdcdb27050ba00749e7f8c9f50beb843cfa error="Put http://127.0.0.1:8500/v1/agent/check/deregister/435becdcdb27050ba00749e7f8c9f50beb843cfa: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [ERROR] consul.sync: failed deregistering agent check: check_id=c17f247e09a9dc1c2515144553f5d7fef2a88e65 error="Put http://127.0.0.1:8500/v1/agent/check/deregister/c17f247e09a9dc1c2515144553f5d7fef2a88e65: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net nomad[3092]: 2019-04-25T12:18:01.125Z [INFO ] agent: shutdown complete
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net systemd[1]: nomad.service: main process exited, code=exited, status=1/FAILURE
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net systemd[1]: Stopped HashiCorp Nomad.
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net systemd[1]: Unit nomad.service entered failed state.
Apr 25 12:18:01 abis-perftest.ba.innovatrics.net systemd[1]: nomad.service failed.
-- Reboot --
Apr 25 12:18:18 abis-perftest.ba.innovatrics.net systemd[1]: Started HashiCorp Nomad.
Apr 25 12:18:18 abis-perftest.ba.innovatrics.net nomad[3086]: WARNING: keyring exists but -encrypt given, using keyring
Apr 25 12:18:18 abis-perftest.ba.innovatrics.net nomad[3086]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
Apr 25 12:18:18 abis-perftest.ba.innovatrics.net nomad[3086]: ==> Loaded configuration from /etc/nomad.d/nomad.hcl
Apr 25 12:18:18 abis-perftest.ba.innovatrics.net nomad[3086]: ==> Starting Nomad agent...
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: ==> Nomad agent configuration:
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Advertise Addrs: HTTP: 192.168.56.25:4646; RPC: 192.168.56.25:4647; Serf: 192.168.56.25:4648
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 192.168.56.25:4647; Serf: 192.168.56.25:4648
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Client: true
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Log Level: INFO
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Region: global (DC: dc1)
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Server: true
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: Version: 0.9.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: ==> Nomad agent started! Log data will stream in below:
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.624Z [WARN ] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/var/lib/nomad/plugins
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=java type=driver plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=rkt type=driver plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.633Z [INFO ] agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.787Z [INFO ] nomad: raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.56.25:4647 Address:192.168.56.25:4647}]
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.787Z [INFO ] nomad: raft: Node at 192.168.56.25:4647 [Follower] entering Follower state (Leader: "")
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.793Z [INFO ] nomad: serf: EventMemberJoin: abis-perftest.ba.innovatrics.net.global 192.168.56.25
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.793Z [INFO ] nomad: starting scheduling worker(s): num_workers=2 schedulers="[service batch system _core]"
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.795Z [WARN ] nomad: serf: Failed to re-join any previously known node
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.796Z [INFO ] client: using state directory: state_dir=/var/lib/nomad/client
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.799Z [INFO ] client: using alloc directory: alloc_dir=/var/lib/nomad/alloc
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.802Z [INFO ] nomad: adding server: server="abis-perftest.ba.innovatrics.net.global (Addr: 192.168.56.25:4647) (DC: dc1)"
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.802Z [ERROR] nomad: error looking up Nomad servers in Consul: error="server.nomad: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused"
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:18.804Z [INFO ] client.fingerprint_mgr.cgroup: cgroups are available
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:20.028Z [WARN ] nomad: raft: Heartbeat timeout from "" reached, starting election
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:20.028Z [INFO ] nomad: raft: Node at 192.168.56.25:4647 [Candidate] entering Candidate state in term 6
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:20.031Z [INFO ] nomad: raft: Election won. Tally: 1
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:20.031Z [INFO ] nomad: raft: Node at 192.168.56.25:4647 [Leader] entering Leader state
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:20.031Z [INFO ] nomad: cluster leadership acquired
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.462Z [INFO ] client.plugin: starting plugin manager: plugin-type=driver
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.463Z [INFO ] client.plugin: starting plugin manager: plugin-type=device
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.481Z [ERROR] client.driver_mgr.raw_exec: failed to reattach to executor: driver=raw_exec error="error creating rpc client for executor plugin: Reattachment process not found" task_id=c77d112d-4f15-f30b-f33b-faa64eff68cf/jaeger-agent/17ef5455
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.481Z [ERROR] client.alloc_runner.task_runner: error recovering task; cleaning up: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="failed to reattach to executor: error creating rpc client for executor plugin: Reattachment process not found" task_id=c77d112d-4f15-f30b-f33b-faa64eff68cf/jaeger-agent/17ef5455
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.481Z [INFO ] client: started client: node_id=cfecb47e-c6b5-c4bd-7c17-90e6084abd93
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.485Z [ERROR] client.alloc_runner.task_runner.task_hook: failed to launch logmon process: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="Reattachment process not found"
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.495Z [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="prestart hook "logmon" failed: Reattachment process not found"
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.495Z [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent reason="Restart within policy" delay=15.124781637s
Apr 25 12:18:24 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:24.507Z [INFO ] client: node registration complete
Apr 25 12:18:26 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:26.282Z [WARN ] consul.sync: failed to update services in Consul: error="Put http://127.0.0.1:8500/v1/agent/service/deregister/_nomad-task-tndslfv5j2qh2wr5inepk6k7o254av2f: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Apr 25 12:18:28 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:28.317Z [INFO ] consul.sync: successfully updated services in Consul
Apr 25 12:18:32 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:32.739Z [INFO ] client: node registration complete
Apr 25 12:18:35 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:35.455Z [INFO ] client.fingerprint_mgr.consul: consul agent is available
Apr 25 12:18:39 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:39.650Z [ERROR] client.alloc_runner.task_runner.task_hook: failed to launch logmon process: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="Reattachment process not found"
Apr 25 12:18:39 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:39.659Z [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="prestart hook "logmon" failed: Reattachment process not found"
Apr 25 12:18:39 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:39.659Z [INFO ] client.alloc_runner.task_runner: restarting task: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent reason="Restart within policy" delay=15.506007501s
Apr 25 12:18:40 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:40.588Z [INFO ] client: node registration complete
Apr 25 12:18:55 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:55.169Z [ERROR] client.alloc_runner.task_runner.task_hook: failed to launch logmon process: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="Reattachment process not found"
Apr 25 12:18:55 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:55.173Z [ERROR] client.alloc_runner.task_runner: prestart failed: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent error="prestart hook "logmon" failed: Reattachment process not found"
Apr 25 12:18:55 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:55.173Z [INFO ] client.alloc_runner.task_runner: not restarting task: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf task=jaeger-agent reason="Exceeded allowed attempts 2 in interval 30m0s and mode is "fail""
Apr 25 12:18:55 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:18:55.177Z [INFO ] client.gc: marking allocation for GC: alloc_id=c77d112d-4f15-f30b-f33b-faa64eff68cf
Apr 25 12:19:43 abis-perftest.ba.innovatrics.net nomad[3086]: 2019-04-25T12:19:43.903Z [ERROR] http: request failed: method=GET path=/v1/namespaces error="Nomad Enterprise only endpoint" code=501

Nomad Server logs (if appropriate)

@endocrimes
Contributor

Hey @jozef-slezak, sorry about this. It was caused by systemd terminating the nomad logmon process, which we've fixed recovery of in nomad 0.9.1-rc1 as part of #5577.

You might also want to update your nomad service definition to use KillMode=process to avoid terminating monitoring processes and potentially losing logs etc.
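
As a minimal sketch of that suggestion, a systemd drop-in could set the kill mode without editing the main unit file. The drop-in path below is an assumption (the logs above only show that the unit is named nomad.service), so adjust it to match how the unit is installed on your host.

```ini
# /etc/systemd/system/nomad.service.d/override.conf  (assumed drop-in location)
[Service]
# On stop, signal only the main Nomad agent process. Child processes such as
# logmon and the task executors are left running so the agent can try to
# reattach to them when it starts again.
KillMode=process
```

After adding the file, running `systemctl daemon-reload` and then restarting the service should pick up the change.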
