[ML] Make inference timeout test more reliable #81094

Merged 2 commits on Nov 29, 2021


davidkyle
Member

#81091 shows that PyTorchModelIT::testEvaluateWithMinimalTimeout is unreliable because the timeout does not always occur.
The test can be made robust by relaxing the assertion: if an error occurs, it must be a timeout error.

This also changes the HTTP status code returned on timeout from too many requests (429) to request timeout (408).

Closes #81091
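
The relaxed assertion described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code of the actual test: `RelaxedTimeoutAssertion`, `ErrorResponse`, and `assertTimeoutIfError` are hypothetical names, and the real test works against the Elasticsearch REST client rather than an `Optional`. The idea is that a successful response is accepted (the inference may finish before the timeout fires), while any error must be a 408 timeout.

```java
import java.util.Optional;

class RelaxedTimeoutAssertion {

    // HTTP 408 Request Timeout, the status the PR switches to.
    static final int REQUEST_TIMEOUT = 408;

    /** Hypothetical stand-in for an error response; not an Elasticsearch type. */
    static final class ErrorResponse {
        final int status;
        final String message;

        ErrorResponse(int status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    /**
     * Accepts "no error" (the request completed in time) and accepts a
     * timeout error. Any other error fails the check.
     */
    static void assertTimeoutIfError(Optional<ErrorResponse> error) {
        if (!error.isPresent()) {
            return; // the timeout did not occur; this is allowed
        }
        ErrorResponse e = error.get();
        if (e.status != REQUEST_TIMEOUT || !e.message.contains("timeout")) {
            throw new AssertionError("expected a timeout error but got " + e.status + ": " + e.message);
        }
    }

    public static void main(String[] args) {
        // Both outcomes of a racy minimal-timeout request pass the relaxed check.
        assertTimeoutIfError(Optional.empty());
        assertTimeoutIfError(Optional.of(new ErrorResponse(408, "timeout [1ms] waiting for inference result")));
        System.out.println("both outcomes accepted");
    }
}
```

The trade-off of this shape is that the test no longer proves a timeout happens on every run; it only proves that a failure, when there is one, is reported as a timeout.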

@davidkyle davidkyle added >test Issues or PRs that are addressing/adding tests :ml Machine learning v8.0.0 auto-backport-and-merge v8.1.0 labels Nov 29, 2021
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Nov 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

LGTM

@@ -297,7 +297,7 @@ void onTimeout() {
     if (notified.compareAndSet(false, true)) {
         processContext.getResultProcessor().ignoreResposeWithoutNotifying(String.valueOf(requestId));
         listener.onFailure(
-            new ElasticsearchStatusException("timeout [{}] waiting for inference result", RestStatus.TOO_MANY_REQUESTS, timeout)
+            new ElasticsearchStatusException("timeout [{}] waiting for inference result", RestStatus.REQUEST_TIMEOUT, timeout)
Member

❤️

Good change. This is left over from before we had queueing back-pressure as the signal for too many requests.
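
For readers unfamiliar with the `compareAndSet` guard in the hunk above, here is a minimal sketch of the pattern, assuming a simplified request object. `NotifyOnceSketch` and `PendingRequest` are hypothetical names and the callbacks are plain `Consumer`s; the real code uses an Elasticsearch listener and `ElasticsearchStatusException`. Whichever of the result or the timeout arrives first flips the flag and notifies the caller, and the later one is ignored. A timeout is reported as 408 here, which leaves 429 free to signal back-pressure.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class NotifyOnceSketch {

    // HTTP 408 Request Timeout.
    static final int REQUEST_TIMEOUT = 408;

    /** Hypothetical pending request; the caller is notified at most once. */
    static final class PendingRequest {
        private final AtomicBoolean notified = new AtomicBoolean(false);
        private final Consumer<String> onSuccess;
        private final Consumer<Integer> onFailure;

        PendingRequest(Consumer<String> onSuccess, Consumer<Integer> onFailure) {
            this.onSuccess = onSuccess;
            this.onFailure = onFailure;
        }

        /** Called when the inference result arrives. */
        void onResult(String result) {
            // Only the first caller to flip the flag may notify.
            if (notified.compareAndSet(false, true)) {
                onSuccess.accept(result);
            }
        }

        /** Called when the timeout fires. */
        void onTimeout() {
            if (notified.compareAndSet(false, true)) {
                onFailure.accept(REQUEST_TIMEOUT);
            }
        }
    }

    public static void main(String[] args) {
        PendingRequest request = new PendingRequest(
            result -> System.out.println("result: " + result),
            status -> System.out.println("failed with status " + status));
        request.onTimeout();      // wins the race and notifies with 408
        request.onResult("late"); // ignored, the caller was already notified
    }
}
```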

@davidkyle davidkyle merged commit 92b6b6f into elastic:master Nov 29, 2021
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Nov 29, 2021
@davidkyle davidkyle deleted the assume-timeout branch November 29, 2021 13:09
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.0

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 30, 2021
* upstream/master: (150 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Nov 30, 2021
* upstream/master: (55 commits)
  Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864)
  Optimize DLS bitset building for matchAll query (elastic#81030)
  URL option for BaseRunAsSuperuserCommand (elastic#81025)
  Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942)
  Fix shadowed vars pt7 (elastic#80996)
  Fail shards early when we can detect a type missmatch (elastic#79869)
  Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096)
  Clarify `unassigned.reason` docs (elastic#81017)
  Strip blocks from settings for reindex targets (elastic#80887)
  Split off the values supplier for ScriptDocValues (elastic#80635)
  [ML] Switch message and detail for model snapshot deprecations (elastic#81108)
  [DOCS] Update xrefs for snapshot restore docs (elastic#81023)
  [ML] Updates visiblity of validate API (elastic#81061)
  Track histogram of transport handling times (elastic#80581)
  [ML] Fix datafeed preview with remote indices (elastic#81099)
  [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060)
  [ML] Add logging for failing PyTorch test (elastic#81044)
  Extending the timeout waiting for snapshot to be ready (elastic#81018)
  [ML] Fix incorrect logging of unexpected model size error (elastic#81089)
  [ML] Make inference timeout test more reliable (elastic#81094)
  ...
Labels
:ml Machine learning Team:ML Meta label for the ML team >test Issues or PRs that are addressing/adding tests v8.0.0 v8.1.0
Development

Successfully merging this pull request may close these issues.

[CI][ML] PyTorchModelIT testEvaluateWithMinimalTimeout failure
5 participants