[ML] Make inference timeout test more reliable #81094

davidkyle · 2021-11-29T12:03:59Z

#81091 shows that PyTorchModelIT::testEvaluateWithMinimalTimeout is not reliable, as the timeout does not always occur.
The test can be made robust by relaxing the assertion: if an error occurs, it must be a timeout error.

This also changes the HTTP status code returned on an inference timeout from Too Many Requests (429) to Request Timeout (408).

Closes #81091
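
The relaxed assertion described above could look roughly like the following sketch. It is not the code from PyTorchModelIT: `InferenceOutcome` and `assertTimeoutIfFailed` are hypothetical names used only for illustration, and the check on the message text assumes the error message contains the word "timeout".

```java
// Illustrative sketch only; these types are hypothetical, not Elasticsearch classes.
final class RelaxedTimeoutAssertion {
    static final int REQUEST_TIMEOUT = 408;

    /** Hypothetical stand-in for the result of a single inference call. */
    static final class InferenceOutcome {
        final Integer errorStatus;   // null when the call succeeded
        final String errorMessage;   // null when the call succeeded

        private InferenceOutcome(Integer errorStatus, String errorMessage) {
            this.errorStatus = errorStatus;
            this.errorMessage = errorMessage;
        }

        static InferenceOutcome success() {
            return new InferenceOutcome(null, null);
        }

        static InferenceOutcome failure(int status, String message) {
            return new InferenceOutcome(status, message);
        }
    }

    /**
     * Passes when the call succeeded (the timeout did not fire in time).
     * If the call failed, the failure must be a 408 timeout error.
     */
    static void assertTimeoutIfFailed(InferenceOutcome outcome) {
        if (outcome.errorStatus == null) {
            return; // no error: a minimal timeout is not guaranteed to trigger
        }
        if (outcome.errorStatus != REQUEST_TIMEOUT) {
            throw new AssertionError("expected status 408 but got " + outcome.errorStatus);
        }
        if (outcome.errorMessage == null || !outcome.errorMessage.contains("timeout")) {
            throw new AssertionError("expected a timeout message but got: " + outcome.errorMessage);
        }
    }
}
```

With this shape the test no longer fails when the inference completes before the timeout, but it still fails if an error other than a timeout is returned.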

elasticmachine · 2021-11-29T12:04:03Z

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou

LGTM

benwtrent · 2021-11-29T12:42:43Z

...ugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java

@@ -297,7 +297,7 @@ void onTimeout() {
            if (notified.compareAndSet(false, true)) {
                processContext.getResultProcessor().ignoreResposeWithoutNotifying(String.valueOf(requestId));
                listener.onFailure(
-                    new ElasticsearchStatusException("timeout [{}] waiting for inference result", RestStatus.TOO_MANY_REQUESTS, timeout)
+                    new ElasticsearchStatusException("timeout [{}] waiting for inference result", RestStatus.REQUEST_TIMEOUT, timeout)


❤️

Good change. This is left over from before we had queueing back-pressure as the signal for too many requests.
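
A minimal sketch of the distinction discussed here, assuming a simplified request object: the `compareAndSet` guard mirrors the hunk above, so only the first of "result" or "timeout" reaches the listener, and a timeout reports 408. The `onQueueRejected` path and the `PendingRequest` class are hypothetical; they only illustrate the comment that back-pressure from the queue is now the signal for 429.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Illustrative sketch only; not the DeploymentManager implementation.
final class NotifyOnceSketch {
    static final int REQUEST_TIMEOUT = 408;
    static final int TOO_MANY_REQUESTS = 429;

    /** Hypothetical pending inference request that notifies its listener at most once. */
    static final class PendingRequest {
        private final AtomicBoolean notified = new AtomicBoolean(false);
        private final Consumer<String> onResponse;
        private final Consumer<Integer> onFailure;

        PendingRequest(Consumer<String> onResponse, Consumer<Integer> onFailure) {
            this.onResponse = onResponse;
            this.onFailure = onFailure;
        }

        /** The inference result arrived. Ignored if the request already timed out. */
        void onResult(String result) {
            if (notified.compareAndSet(false, true)) {
                onResponse.accept(result);
            }
        }

        /** The wait for a result expired: report a request timeout (408). */
        void onTimeout() {
            if (notified.compareAndSet(false, true)) {
                onFailure.accept(REQUEST_TIMEOUT);
            }
        }

        /** Assumed back-pressure path: the queue refused the request (429). */
        void onQueueRejected() {
            if (notified.compareAndSet(false, true)) {
                onFailure.accept(TOO_MANY_REQUESTS);
            }
        }
    }
}
```

Keeping the two status codes on separate paths lets a client tell "the node is overloaded, retry later" apart from "this request took too long".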

elasticsearchmachine · 2021-11-29T13:09:05Z

💚 Backport successful

Status	Branch	Result
✅	8.0

* upstream/master: (150 commits) Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864) Optimize DLS bitset building for matchAll query (elastic#81030) URL option for BaseRunAsSuperuserCommand (elastic#81025) Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942) Fix shadowed vars pt7 (elastic#80996) Fail shards early when we can detect a type missmatch (elastic#79869) Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096) Clarify `unassigned.reason` docs (elastic#81017) Strip blocks from settings for reindex targets (elastic#80887) Split off the values supplier for ScriptDocValues (elastic#80635) [ML] Switch message and detail for model snapshot deprecations (elastic#81108) [DOCS] Update xrefs for snapshot restore docs (elastic#81023) [ML] Updates visiblity of validate API (elastic#81061) Track histogram of transport handling times (elastic#80581) [ML] Fix datafeed preview with remote indices (elastic#81099) [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060) [ML] Add logging for failing PyTorch test (elastic#81044) Extending the timeout waiting for snapshot to be ready (elastic#81018) [ML] Fix incorrect logging of unexpected model size error (elastic#81089) [ML] Make inference timeout test more reliable (elastic#81094) ... # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

* upstream/master: (55 commits) Fix ComposableIndexTemplate equals when composed_of is null (elastic#80864) Optimize DLS bitset building for matchAll query (elastic#81030) URL option for BaseRunAsSuperuserCommand (elastic#81025) Less Verbose Serialization of Snapshot Failure in SLM Metadata (elastic#80942) Fix shadowed vars pt7 (elastic#80996) Fail shards early when we can detect a type missmatch (elastic#79869) Delegate Ref Counting to ByteBuf in Netty Transport (elastic#81096) Clarify `unassigned.reason` docs (elastic#81017) Strip blocks from settings for reindex targets (elastic#80887) Split off the values supplier for ScriptDocValues (elastic#80635) [ML] Switch message and detail for model snapshot deprecations (elastic#81108) [DOCS] Update xrefs for snapshot restore docs (elastic#81023) [ML] Updates visiblity of validate API (elastic#81061) Track histogram of transport handling times (elastic#80581) [ML] Fix datafeed preview with remote indices (elastic#81099) [ML] Fix acceptable model snapshot versions in ML deprecation checker (elastic#81060) [ML] Add logging for failing PyTorch test (elastic#81044) Extending the timeout waiting for snapshot to be ready (elastic#81018) [ML] Fix incorrect logging of unexpected model size error (elastic#81089) [ML] Make inference timeout test more reliable (elastic#81094) ...

Make inference timeout test more reliable

596c623

davidkyle added the >test, :ml, v8.0.0, auto-backport-and-merge, and v8.1.0 labels Nov 29, 2021

elasticmachine added the Team:ML label Nov 29, 2021

Splotless

a8fbe13

dimitris-athanasiou approved these changes Nov 29, 2021

benwtrent approved these changes Nov 29, 2021

davidkyle merged commit 92b6b6f into elastic:master Nov 29, 2021

davidkyle mentioned this pull request Nov 29, 2021

[8.0] [ML] Make inference timeout test more reliable (#81094) #81097

Merged

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Nov 29, 2021

[ML] Make inference timeout test more reliable (elastic#81094)

1a7cbe7

davidkyle deleted the assume-timeout branch November 29, 2021 13:09

elasticsearchmachine pushed a commit that referenced this pull request Nov 29, 2021

[ML] Make inference timeout test more reliable (#81094) (#81097)

c5a3eb4
