Spurious error messages when tasks exit successfully #10814

Closed
schmichael opened this issue Jun 25, 2021 · 4 comments · Fixed by #11273

Comments

@schmichael (Member)

When running a batch job that exits successfully, RPC error messages are still emitted in the client agent logs:

[ERROR] client.driver_mgr.exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=eb3a3678-9053-198a-118e-0b4fbf61256a driver=exec task_name=echo error="rpc error: code = Unavailable desc = transport is closing"

Nomad Version

Most recently 1.1.2 and 0.12.11, but similar reports go back to 0.10 and 0.9.

References

Jobspec

Should be able to reproduce with any batch job that exits successfully:

job "echo" {
	datacenters = ["dc1"]
	type        = "batch"

	group "cache" {
		task "echo" {
			driver = "exec"
			config {
				command = "/bin/sleep"
				args    = ["1"]
			}
		}
	}
}

@anastazya

I can confirm the same messages when running lots of RawExec jobs.

@jseba

jseba commented Sep 1, 2021

We're hitting this as well with our plugin, using Nomad version 1.0.4. From our logs, we see this message over 7000 times in the last 12 hours (as of this message), all of them with the same error code, "task not found for given id" (ErrTaskNotFound). However, not all of the jobs are type = "batch"; some of them are service jobs.

@T0tt1

T0tt1 commented Sep 17, 2021

Is there any progress here? I can report the same behaviour and am searching for a solution. Nomad v1.0.4


/bin/bash mkdir <my_desired_folder>


@message | {"@level":"error","@message":"error receiving stream from Stats executor RPC, closing stream","@module":"client.driver_mgr.raw_exec","@timestamp":"2021-09-17T04:06:37.101555Z","alloc_id":"#######-####-####-####-############","driver":"raw_exec","error":"rpc error: code = Unavailable desc = transport is closing","task_name":"qaz-wsx-edc"}

notnoop pushed a commit that referenced this issue Oct 6, 2021
Suppress stats streaming error log messages when task finishes.
Streaming errors are expected when a task finishes and they aren't
actionable to users.

Also, note that the task runner Stats hook retries collecting stats
after a delay. If the connection terminates prematurely, it will be
retried, and closing the stats stream is not very disruptive.

Ideally, executor terminates cleanly when task exits, but that's a more
substantial change that may require changing the executor/drivers interface.

Fixes #10814
lgfa29 pushed two commits with the same commit message that referenced this issue Nov 15, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 15, 2022