channel close should not throw a RuntimeException #9362

Closed
BrunoChevalier opened this issue Sep 10, 2019 · 7 comments
Labels
P1 I'll work on this now. (Assignee required) team-Remote-Exec Issues and PRs for the Execution (Remote) team

Comments

BrunoChevalier commented Sep 10, 2019

Description of the problem / feature request:

We are seeing the following error:

Internal error thrown during build. Printing stack trace: java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-sources BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=0}' (requested by nodes 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:l2forwarding_logic-cpp BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=1}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=1}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=2}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=3}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=4}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=5}', ...)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:528)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Self-suppression not permitted
	at java.base/java.lang.Throwable.addSuppressed(Unknown Source)
	at com.google.devtools.build.lib.remote.AbstractRemoteActionCache.download(AbstractRemoteActionCache.java:237)
	at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:171)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:118)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:88)
	at com.google.devtools.build.lib.actions.SpawnActionContext.beginExecution(SpawnActionContext.java:41)
	at com.google.devtools.build.lib.exec.ProxySpawnActionContext.beginExecution(ProxySpawnActionContext.java:60)
	at com.google.devtools.build.lib.actions.SpawnContinuation$1.execute(SpawnContinuation.java:80)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction$SpawnActionContinuation.execute(SpawnAction.java:1344)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:314)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:123)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:851)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:985)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:957)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:116)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:77)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:577)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:760)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:275)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:451)
	... 4 more
Caused by: io.netty.channel.ExtendedClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

 (05:08:26) INFO: Elapsed time: 44.246s, Critical Path: 21.72s
 (05:08:26) INFO: 2822 processes: 2818 remote cache hit, 4 processwrapper-sandbox.
 (05:08:26) FAILED: Build did NOT complete successfully
Internal error thrown during build. Printing stack trace: java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-sources BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=0}' (requested by nodes 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:l2forwarding_logic-cpp BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=1}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=1}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=2}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=3}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=4}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=5}', ...)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:528)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Self-suppression not permitted
	at java.base/java.lang.Throwable.addSuppressed(Unknown Source)
	at com.google.devtools.build.lib.remote.AbstractRemoteActionCache.download(AbstractRemoteActionCache.java:237)
	at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:171)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:118)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:88)
	at com.google.devtools.build.lib.actions.SpawnActionContext.beginExecution(SpawnActionContext.java:41)
	at com.google.devtools.build.lib.exec.ProxySpawnActionContext.beginExecution(ProxySpawnActionContext.java:60)
	at com.google.devtools.build.lib.actions.SpawnContinuation$1.execute(SpawnContinuation.java:80)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction$SpawnActionContinuation.execute(SpawnAction.java:1344)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:314)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:123)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:851)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:985)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:957)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:116)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:77)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:577)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:760)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:275)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:451)
	... 4 more
Caused by: io.netty.channel.ExtendedClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-sources BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=0}' (requested by nodes 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:l2forwarding_logic-cpp BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=1}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=1}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=2}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=3}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=4}', 'ActionLookupData{actionLookupKey=//vobs/dsl/sw/y/src/l2forwarding/logic:yang-headers BuildConfigurationValue.Key[6255154e24bd3a5b9ea3531bc8419613] false, actionIndex=5}', ...)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:528)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Self-suppression not permitted
	at java.base/java.lang.Throwable.addSuppressed(Unknown Source)
	at com.google.devtools.build.lib.remote.AbstractRemoteActionCache.download(AbstractRemoteActionCache.java:237)
	at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:171)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:118)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:88)
	at com.google.devtools.build.lib.actions.SpawnActionContext.beginExecution(SpawnActionContext.java:41)
	at com.google.devtools.build.lib.exec.ProxySpawnActionContext.beginExecution(ProxySpawnActionContext.java:60)
	at com.google.devtools.build.lib.actions.SpawnContinuation$1.execute(SpawnContinuation.java:80)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction$SpawnActionContinuation.execute(SpawnAction.java:1344)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:314)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:123)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$4.execute(SkyframeActionExecutor.java:851)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:985)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:957)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:116)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:77)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:577)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:760)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:275)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:451)
	... 4 more
Caused by: io.netty.channel.ExtendedClosedChannelException
 (05:08:26) INFO: Build Event Protocol files produced successfully.
 (05:08:26) FAILED: Build did NOT complete successfully
Failed: ***

The connection to our remote cache has been closed and we are getting this RuntimeException.

Feature requests: what underlying problem are you trying to solve with this feature?

Reliable remote cache

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

We can't reliably reproduce this bug at this point in time.
I will check if killing the connection to the remote cache triggers the bug.

What operating system are you running Bazel on?

CentOS 7
Linux 847cea82af63 3.10.0-957.12.1.el7.x86_64 #1 SMP Mon Apr 29 14:59:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

What's the output of bazel info release?

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
release 0.29.0

Have you found anything relevant by searching the web?

https://groups.google.com/forum/#!topic/bazel-discuss/_IpJjDm5a2w

@irengrig irengrig added untriaged team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Sep 11, 2019
@irengrig
Contributor

/cc @buchgr P1?

@woutervermeiren

From the client side we are seeing this error message:

(09:26:59) WARNING: Reading from Remote Cache:
Download of '/cache/cas/44750cbb1cbb9f3679e0c93e02906080beb984751b45e504cd88df6fa1518dc0' timed out. Received 129776 of 170628 bytes.

However, on the server side the request seems to be handled correctly. Our remote cache server uses the nginx + WebDAV approach.

"access": {
        "user_agent": {
          "name": "Other",
          "os_name": "Other",
          "device": "Other",
          "os": "Other",
          "build": ""
        },
        "time_local": "11/Sep/2019:07:26:54 +0000",
        "referrer": "-",
        "request_time": "0.001",
        "upstream_cache_status": "-",
        "method": "GET",
        "url": "/cache/cas/44750cbb1cbb9f3679e0c93e02906080beb984751b45e504cd88df6fa1518dc0",
        "http_version": "1.1",
        "body_sent": {
          "bytes": "170628"
        },
        "remote_user": "-",
        "response_code": "200"
      }

werkt pushed a commit to werkt/bazel that referenced this issue Sep 12, 2019
Exceptions that are reused between download futures must not be added to
the suppression of themselves.

Fixes bazelbuild#9362
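
To illustrate the commit message above: `Throwable.addSuppressed` throws `IllegalArgumentException: Self-suppression not permitted` when an exception is passed to itself, which can happen when several download futures fail with the same exception instance. The following is a minimal sketch of that failure mode and one way to guard against it. The class `SuppressionSketch` and the helpers `addSuppressedSafely` and `combine` are hypothetical names used for illustration only; this is not the code from the referenced commit.

```java
import java.io.IOException;
import java.util.List;

public class SuppressionSketch {

    /** Attaches {@code other} to {@code primary} as suppressed, unless both are the same instance. */
    static void addSuppressedSafely(Throwable primary, Throwable other) {
        if (other != primary) {
            primary.addSuppressed(other);
        }
    }

    /** Returns the first failure, with every later failure that is a different instance attached as suppressed. */
    static IOException combine(List<IOException> failures) {
        IOException first = null;
        for (IOException e : failures) {
            if (first == null) {
                first = e;
            } else {
                addSuppressedSafely(first, e);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // Assumption: one closed channel fails two downloads with the same exception object.
        IOException shared = new IOException("channel closed");

        try {
            shared.addSuppressed(shared); // the unguarded call
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Self-suppression not permitted"
        }

        IOException combined =
            combine(List.of(shared, shared, new IOException("timeout")));
        System.out.println(combined.getSuppressed().length); // prints 1
    }
}
```

The guard compares by identity rather than `equals`, since the JDK check that raises the error is also an identity check.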
@keith
Member

keith commented Oct 10, 2019

We're seeing this issue with bazel 1.0 as well.

@keith
Member

keith commented Oct 10, 2019

Does anyone know which version introduced this bug? We were updating from 0.23.1 and need to downgrade to a version from before it was introduced.

@mmorearty
Contributor

It was introduced in Bazel 0.29.0.

@dslomov
Contributor

dslomov commented Oct 11, 2019

@buchgr please comment on severity of this. Patch-release worthy?

@buchgr
Contributor

buchgr commented Oct 11, 2019

Chatted offline. Agreed to do a patch release.

@dslomov dslomov added P1 I'll work on this now. (Assignee required) release blocker and removed untriaged labels Oct 11, 2019
dslomov pushed a commit that referenced this issue Oct 11, 2019
Exceptions that are reused between download futures must not be added to
the suppression of themselves.

Fixes #9362

Closes #9376.

PiperOrigin-RevId: 270658205
dslomov pushed a commit that referenced this issue Oct 14, 2019
Exceptions that are reused between download futures must not be added to
the suppression of themselves.

Fixes #9362

Closes #9376.

PiperOrigin-RevId: 270658205

7 participants