Platform specific models #99584

maxhniebergall · 2023-09-14T15:18:16Z

Adding support for platform specific models

elasticsearchmachine · 2023-09-14T15:20:00Z

Hi @maxhniebergall, I've created a changelog YAML for you.

server/src/main/java/org/elasticsearch/TransportVersions.java

maxhniebergall · 2023-09-28T03:20:08Z

The failure in elasticsearch-ci/part-2 seems to be unrelated. I was able to reproduce it on my local machine with the reproduce command, but not without the particular parameters. I will create an issue for this.

REPRODUCE WITH: ./gradlew ':x-pack:plugin:esql:compute:test' --tests "org.elasticsearch.compute.operator.ProjectOperatorTests.testProjection" -Dtests.seed=D7AC53920B72C687 -Dtests.locale=sk -Dtests.timezone=Europe/Samara -Druntime.java=21	

org.elasticsearch.compute.operator.ProjectOperatorTests > testProjection FAILED	
    java.lang.AssertionError: java.lang.IllegalStateException: can't release already released block [IntVectorBlock[vector=ConstantIntVector[positions=5, value=3]]]

maxhniebergall · 2023-09-28T04:01:32Z

https://gradle-enterprise.elastic.co/s/arrn2n6mkbcgc

In the stack trace:
1> java.lang.IllegalStateException: Future got interrupted
org.elasticsearch.xpack.ml.inference.assignment.TrainedModelAssignmentNodeService.loadQueuedModels(TrainedModelAssignmentNodeService.java:210) ~[main/:?]

on that line:

deploymentManager.startDeployment(loadingTask, listener);                
TrainedModelDeploymentTask deployedTask = listener.actionGet();

due to
2> org.elasticsearch.ElasticsearchStatusException: Starting deployment timed out after [30s]
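The failure mode in this trace, a thread that blocks on an asynchronous start and then sees a timeout, can be illustrated with plain JDK classes. This is only a minimal sketch: it uses `java.util.concurrent.CompletableFuture` as a stand-in for the listener in the snippet above, and `BlockingStartSketch`, `startDeployment`, and `waitForStart` are hypothetical names, not the Elasticsearch implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockingStartSketch {
    // Hypothetical stand-in for an asynchronous deployment start that never completes.
    static CompletableFuture<String> startDeployment() {
        return new CompletableFuture<>();
    }

    // Blocks the calling thread until the future completes or the timeout elapses,
    // and reports which of the possible outcomes occurred.
    static String waitForStart(CompletableFuture<String> future, long timeoutMillis) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            // The asynchronous work itself failed; surface the underlying cause.
            return "failed: " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(waitForStart(startDeployment(), 50));
    }
}
```

The point of the sketch is that the blocking caller only sees a secondary symptom (a timeout or an interruption), while the root cause lives in the asynchronous work, which matches the shape of the two exceptions quoted above.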

droberts195

LGTM

Now that this is passing CI, let's get it merged unless somebody else can see something really bad.

Any nits can be resolved in a followup PR.

One such followup should be removing the hack that looks at the model name to determine if it's Linux x86.

maxhniebergall · 2023-09-28T16:49:52Z

I confirmed with manual testing that the header warning shows up on put trained model, and the backend refuses to start platform-specific models.

* Added platform architecture field to TrainedModelMetadata and users of TrainedModelMetadata
* Added TransportVersions guarding for TrainedModelMetadata
* Prevent platform-specific models from being deployed on the wrong architecture
* Added logic to only verify node architectures for models which are platform specific
* Handle null platform architecture
* Added logging for the detection of heterogeneous platform architectures among ML nodes and refactoring to support this
* Added platform architecture field to TrainedModelConfig
* Stop platform-specific model when rebalance occurs and the cluster has a heterogeneous architecture among ML nodes
* Added logic to TransportPutTrainedModelAction to return a warning response header when the model is platform-specific and cannot be deployed on the cluster at that time due to heterogeneous architectures among ML nodes
* Added MlPlatformArchitecturesUtilTests
* Updated Create Trained Models API docs to describe the new platform_architecture optional field.
* Updated/incremented InferenceIndexConstants
* Added special override to make models with linux-x86_64 in the model ID to be platform specific
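The checks listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `PlatformArchitectureSketch`, `isPlatformSpecific`, and `check` are hypothetical names, the warning texts are invented, and the handling of a cluster with no ML nodes is a guess. Only the `platform_architecture` field and the `linux-x86_64` model ID override come from the commit message.

```java
import java.util.Set;

class PlatformArchitectureSketch {
    // Marker from the commit message: model IDs containing it are treated as platform specific.
    static final String LINUX_X86_MARKER = "linux-x86_64";

    // A model is platform specific if it declares an architecture, or (the override)
    // if its ID contains the linux-x86_64 marker.
    static boolean isPlatformSpecific(String modelId, String platformArchitecture) {
        return platformArchitecture != null
            || (modelId != null && modelId.contains(LINUX_X86_MARKER));
    }

    // Returns null when the model may be deployed, otherwise a warning message.
    static String check(String modelId, String platformArchitecture, Set<String> mlNodeArchitectures) {
        // Platform-agnostic models skip the architecture check entirely.
        if (!isPlatformSpecific(modelId, platformArchitecture)) {
            return null;
        }
        // Assumption: with no ML nodes there is nothing to compare against.
        if (mlNodeArchitectures.isEmpty()) {
            return null;
        }
        String required = platformArchitecture != null ? platformArchitecture : LINUX_X86_MARKER;
        if (mlNodeArchitectures.size() > 1) {
            return "ML nodes have heterogeneous architectures " + mlNodeArchitectures
                + "; platform-specific model [" + modelId + "] cannot be deployed";
        }
        if (!mlNodeArchitectures.contains(required)) {
            return "model [" + modelId + "] requires [" + required
                + "] but ML nodes run " + mlNodeArchitectures;
        }
        return null;
    }
}
```

In this sketch a non-null result plays the role of the warning response header on put, and of the refusal to start a deployment; the real code may well separate those two paths.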

## Summary

Adds support for ELSER v2 download from the Trained Models UI.

- Marks an appropriate model version for the current cluster configuration with the recommended flag.
- Updates the state column with better human-readable labels and colour indicators.
- Adds a callout promoting a new version of ELSER

<img width="1686" alt="image" src="https://github.com/elastic/kibana/assets/5236598/0deea53a-6d37-4af6-97bc-9f46e36f113b">

#### Notes for reviews

- We need to wait for elastic/elasticsearch#99584 to get the start deployment validation functionality. At the moment you can successfully start deployment of the wrong model version.

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [x] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=))
- [x] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [x] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

Adds the new platform_architecture field from elastic#99584 to the package config used when downloading Elastic models from GCS.

maxhniebergall added 4 commits September 13, 2023 15:47

Added platform architecture field to TrainedModelMetadata and users of TrainedModelMetadata

492ce0e

Added TransportVersions guarding for TrainedModelMetadata

8f8df07

Updated mutateInstanceForVersion for TrainedModelMetadataTests

5ee24fb

Prevent platform-specific models from being deployed on the wrong architecture

1bfd80f
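The "TransportVersions guarding" named in the commits above usually means writing a new field only when the peer is new enough to understand it. A minimal sketch of that pattern with plain JDK streams follows. Everything here is an assumption for illustration: `VersionGuardSketch`, the integer version constant, and the wire layout are hypothetical and do not reflect Elasticsearch's actual `TransportVersions` constants or stream classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionGuardSketch {
    // Hypothetical wire version at which the optional field was introduced.
    static final int PLATFORM_ARCHITECTURE_ADDED = 2;

    // Serializes the model ID, and the optional architecture only for new-enough peers.
    static byte[] write(String modelId, String platformArchitecture, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(modelId);
            if (peerVersion >= PLATFORM_ARCHITECTURE_ADDED) {
                // A presence flag lets the field stay null-able on the wire.
                out.writeBoolean(platformArchitecture != null);
                if (platformArchitecture != null) {
                    out.writeUTF(platformArchitecture);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the architecture back, or null when the peer version predates the field
    // or the field was absent.
    static String readPlatformArchitecture(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF(); // skip the model ID
            if (peerVersion >= PLATFORM_ARCHITECTURE_ADDED && in.readBoolean()) {
                return in.readUTF();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reader and writer must apply the same version condition, otherwise an older node would misread the extra bytes; that symmetry is the part a serialization round-trip test needs to cover for each version.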

elasticsearchmachine added the v8.11.0 label Sep 14, 2023

maxhniebergall added the cloud-deploy (Publish cloud docker image for Cloud-First-Testing), >enhancement, and :ml (Machine learning) labels Sep 14, 2023

maxhniebergall self-assigned this Sep 14, 2023

maxhniebergall and others added 9 commits September 14, 2023 11:20

Update docs/changelog/99584.yaml

707e50f

Merge branch 'main' into platform-specific-models

322152f

Added logic to only verify node architectures for models which are platform specific

1ae995f

Handle null metadata

ade5d83

Refactored node architecture detection

d43c863

Added logging for the detection of heterogeneous platform architectures among ML nodes and refactoring to support this

ba39bed

Merge branch 'elastic:main' into platform-specific-models

452aaa4

Merge branch 'platform-specific-models' of https://github.com/maxhniebergall/elasticsearch into platform-specific-models

343cb38

Merge branch 'main' of https://github.com/elastic/elasticsearch into platform-specific-models

abe75be

droberts195 reviewed Sep 19, 2023

...ain/java/org/elasticsearch/xpack/ml/inference/deployment/SupportedPlatformArchitectures.java (outdated)

droberts195 reviewed Sep 19, 2023

maxhniebergall and others added 2 commits September 19, 2023 10:53

Apply suggestions from draft code review

8336ec5

A few bug fixes from Dave R Co-authored-by: David Roberts <dave.roberts@elastic.co>

some small code fixes

37f9b6a

droberts195 reviewed Sep 20, 2023

Removed changes to TrainedModelMetadata as that class is merely an output from dataframe analytics

f04a50a

maxhniebergall closed this Sep 20, 2023

maxhniebergall force-pushed the platform-specific-models branch from 8336ec5 to b6747b4 Compare September 20, 2023 14:58

maxhniebergall added 3 commits September 20, 2023 11:00

TransportVersions merge

bb5ba35

Merge branch 'platform-specific-models' of https://github.com/maxhniebergall/elasticsearch into platform-specific-models

d4534a5

Added platform architecture field to TrainedModelConfig

03b18f1

maxhniebergall added 2 commits September 27, 2023 15:02

Merge branch 'platform-specific-models' of https://github.com/maxhniebergall/elasticsearch into platform-specific-models

a956715

Potential fix for rest tests

b65fd43

maxhniebergall added 4 commits September 28, 2023 00:02

Update to deployment manager startDeployment to verify model architecture

7783aa7

spotlessApply

f105059

more improvements for DeploymentManager

57f2236

spotlessApply

a445d1e

droberts195 reviewed Sep 28, 2023

...va/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentClusterService.java

droberts195 reviewed Sep 28, 2023

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/inference_crud.yml (outdated)

Update ml/inference_crud.yml to remove unnecessary checks added previously

ab1a471

Co-authored-by: David Roberts <dave.roberts@elastic.co>

droberts195 reviewed Sep 28, 2023

...ugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java (outdated)

maxhniebergall and others added 3 commits September 28, 2023 11:06

Update MlPlatformArchitecturesUtil verifyMlNodesAndModelArchitectures to take a TrainedModelConfig actionListener (rather than void) that always triggers

f6d4a6a

Merge branch 'elastic:main' into platform-specific-models

ab3b970

Merge branch 'main' into platform-specific-models

723b166

maxhniebergall requested review from droberts195 and jonathan-buttner September 28, 2023 16:34

droberts195 approved these changes Sep 28, 2023

maxhniebergall and others added 3 commits September 28, 2023 12:52

Merge branch 'elastic:main' into platform-specific-models

ad7a7d9

spotlessApply TransportVersions

9f39a04

Merge branch 'main' into platform-specific-models

e5508ac

maxhniebergall merged commit 7c21ce3 into elastic:main Sep 28, 2023

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Oct 3, 2023

[ML] Add platform_architecture to package config

d849398

Adds the new platform_architecture field from elastic#99584 to the package config used when downloading Elastic models from GCS.

droberts195 mentioned this pull request Oct 3, 2023

[ML] Add platform_architecture to package config #100193

Merged

droberts195 added a commit that referenced this pull request Oct 3, 2023

[ML] Add platform_architecture to package config (#100193)

8d6ded3

Adds the new platform_architecture field from #99584 to the package config used when downloading Elastic models from GCS.

maxhniebergall mentioned this pull request Oct 3, 2023

Added platform_architecture field to MlPutTrainedModelRequest to add … elastic/elasticsearch-specification#2302

Merged
