feat(vertexai): add google_vertex_ai_index for Vertex AI Matching Engine (#6728)

* feat: add google_vertex_ai_index for Vertex AI Matching Engine

* fix: increase timeouts to 60 min because 20 wasn't enough for creation

* fix: change code to make name computed instead of an input

* fix: use custom flatten code to ignore_read a nested property's field

* fix: add skip_import_test: true to the auto-gen test

* feat: add a test with a manually updated ImportStateVerifyIgnore

* Apply suggestions from code review [ci skip]

Update descriptions based on the suggestions

Co-authored-by: Stephen Lewis (Burrows) <stephen.r.burrows@gmail.com>

* refactor: use ignore_read_extra instead of a manual test

* fix: use an empty object for bruteForceConfig

* feat: define additional fields to api.yaml

* feat: add an example to increase test coverage

* feat: deal with contentsDeltaUri as an updatable field

* fix: set UNIT_L2_NORM in the example because COSINE_DISTANCE only supports that feature norm type

This is the error message from the endpoint:
"Index with `COSINE_DISTANCE` distanceMeasureType currently only supports `UNIT_L2_NORM` featureNormType."

* feat: remove approximate_neighbors_count from an example with brute_force_config

approximate_neighbors_count is required if the tree-AH algorithm is used.
from https://cloud.google.com/vertex-ai/docs/matching-engine/configuring-indexes#brute-force-config

* test: add a handwritten test for patch

* fix: add update_mask: true to use the mask as a url param

* refactor: put 'input: true' on the fields that patch couldn't update

* feat: use custom pre update code for a nested object

* fix: update the handwritten test accordingly

* feat: add custom flatten code for is_complete_overwrite

Co-authored-by: Stephen Lewis (Burrows) <stephen.r.burrows@gmail.com>
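The commit message cites two constraints: COSINE_DISTANCE currently only works with the UNIT_L2_NORM feature norm type, and approximate_neighbors_count is required when tree-AH is used. These are enforced by the Vertex AI endpoint; the commit adjusts the examples to satisfy them rather than adding validation. As a minimal sketch only, assuming a hypothetical `indexConfig` struct and `validateIndexConfig` helper that are not part of the provider, a client-side pre-check could look like this:

```go
package main

import (
	"errors"
	"fmt"
)

// indexConfig is a HYPOTHETICAL, trimmed-down view of metadata.config.
// It is not the provider's generated struct; field names are illustrative.
type indexConfig struct {
	DistanceMeasureType       string // e.g. "COSINE_DISTANCE"
	FeatureNormType           string // "UNIT_L2_NORM" or "NONE"
	ApproximateNeighborsCount int    // 0 means unset
	UseTreeAH                 bool   // true: tree_ah_config, false: brute_force_config
}

// validateIndexConfig checks the two constraints quoted in the commit message.
func validateIndexConfig(c indexConfig) error {
	if c.DistanceMeasureType == "COSINE_DISTANCE" && c.FeatureNormType != "UNIT_L2_NORM" {
		return errors.New("COSINE_DISTANCE currently only supports the UNIT_L2_NORM featureNormType")
	}
	if c.UseTreeAH && c.ApproximateNeighborsCount <= 0 {
		return errors.New("approximate_neighbors_count is required when the tree-AH algorithm is used")
	}
	return nil
}

func main() {
	// Settings from the streaming example: cosine distance, unit L2 norm, brute force.
	streaming := indexConfig{DistanceMeasureType: "COSINE_DISTANCE", FeatureNormType: "UNIT_L2_NORM"}
	fmt.Println(validateIndexConfig(streaming)) // <nil>

	// Cosine distance with the default NONE norm type is rejected.
	bad := indexConfig{DistanceMeasureType: "COSINE_DISTANCE", FeatureNormType: "NONE"}
	fmt.Println(validateIndexConfig(bad))
}
```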
shotarok and melinath authored Nov 28, 2022
1 parent 3f040a1 commit 5dd9723
Showing 8 changed files with 554 additions and 1 deletion.
192 changes: 191 additions & 1 deletion mmv1/products/vertexai/api.yaml
@@ -806,4 +806,194 @@ objects:
            description: |
              The disk utilization of the MetadataStore in bytes.
            output: true

  # Vertex AI Matching Engine Index
  - !ruby/object:Api::Resource
    name: Index
    base_url: projects/{{project}}/locations/{{region}}/indexes
    create_url: projects/{{project}}/locations/{{region}}/indexes
    self_link: projects/{{project}}/locations/{{region}}/indexes/{{name}}
    update_verb: :PATCH
    update_mask: true
    create_verb: :POST
    references: !ruby/object:Api::Resource::ReferenceLinks
      api: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexes/
    async: !ruby/object:Api::OpAsync
      operation: !ruby/object:Api::OpAsync::Operation
        path: 'name'
        base_url: '{{op_id}}'
        wait_ms: 1000
      result: !ruby/object:Api::OpAsync::Result
        path: 'response'
        resource_inside_response: true
      status: !ruby/object:Api::OpAsync::Status
        path: 'done'
        complete: True
        allowed:
          - True
          - False
      error: !ruby/object:Api::OpAsync::Error
        path: 'error'
        message: 'message'
    description: |-
      A representation of a collection of database items organized in a way that allows for approximate nearest neighbor (ANN) search algorithms.
    parameters:
      - !ruby/object:Api::Type::String
        name: region
        description: The region of the Index, e.g. us-central1.
        url_param_only: true
        input: true
    properties:
      - !ruby/object:Api::Type::String
        name: 'name'
        description: The resource name of the Index.
        output: true
      - !ruby/object:Api::Type::String
        name: 'displayName'
        description: The display name of the Index. The name can be up to 128 characters long and can consist of any UTF-8 characters.
        required: true
      - !ruby/object:Api::Type::String
        name: 'description'
        description: The description of the Index.
      # Please take a look at the following links for the original definition:
      # https://cloud.google.com/vertex-ai/docs/matching-engine/create-manage-index#create_index-drest
      # https://cloud.google.com/vertex-ai/docs/matching-engine/configuring-indexes
      - !ruby/object:Api::Type::NestedObject
        name: 'metadata'
        description: Additional information about the Index.
        properties:
          - !ruby/object:Api::Type::String
            name: 'contentsDeltaUri'
            description: |-
              Allows inserting, updating or deleting the contents of the Matching Engine Index.
              The string must be a valid Cloud Storage directory path. If this
              field is set when calling IndexService.UpdateIndex, then no other
              Index field can also be updated as part of the same call.
              The expected structure and format of the files this URI points to is
              described at https://cloud.google.com/vertex-ai/docs/matching-engine/using-matching-engine#input-data-format
          - !ruby/object:Api::Type::Boolean
            name: 'isCompleteOverwrite'
            description: |-
              If this field is set together with contentsDeltaUri when calling IndexService.UpdateIndex,
              then existing content of the Index will be replaced by the data from the contentsDeltaUri.
            default_value: false
          - !ruby/object:Api::Type::NestedObject
            name: 'config'
            input: true
            description: The configuration of the Matching Engine Index.
            properties:
              - !ruby/object:Api::Type::Integer
                name: 'dimensions'
                description: The number of dimensions of the input vectors.
                required: true
              - !ruby/object:Api::Type::Integer
                name: 'approximateNeighborsCount'
                description: |-
                  The default number of neighbors to find via approximate search before exact reordering is
                  performed. Exact reordering is a procedure where results returned by an
                  approximate search algorithm are reordered via a more expensive distance computation.
                  Required if the tree-AH algorithm is used.
              - !ruby/object:Api::Type::String
                name: 'distanceMeasureType'
                description: |-
                  The distance measure used in nearest neighbor search. The value must be one of the following:
                  * SQUARED_L2_DISTANCE: Euclidean (L_2) Distance
                  * L1_DISTANCE: Manhattan (L_1) Distance
                  * COSINE_DISTANCE: Cosine Distance. Defined as 1 - cosine similarity.
                  * DOT_PRODUCT_DISTANCE: Dot Product Distance. Defined as the negative of the dot product.
                default_value: "DOT_PRODUCT_DISTANCE"
              - !ruby/object:Api::Type::String
                name: 'featureNormType'
                description: |-
                  Type of normalization to be carried out on each vector. The value must be one of the following:
                  * UNIT_L2_NORM: Unit L2 normalization type
                  * NONE: No normalization type is specified.
                default_value: "NONE"
              - !ruby/object:Api::Type::NestedObject
                name: 'algorithmConfig'
                description: The configuration with regard to the algorithms used for efficient search.
                properties:
                  - !ruby/object:Api::Type::NestedObject
                    name: 'treeAhConfig'
                    exactly_one_of:
                      - treeAhConfig
                      - bruteForceConfig
                    description: |-
                      Configuration options for using the tree-AH algorithm (Shallow tree + Asymmetric Hashing).
                      Please refer to this paper for more details: https://arxiv.org/abs/1908.10396
                    properties:
                      - !ruby/object:Api::Type::Integer
                        name: 'leafNodeEmbeddingCount'
                        description: Number of embeddings on each leaf node. The default value is 1000 if not set.
                        default_value: 1000
                      - !ruby/object:Api::Type::Integer
                        name: 'leafNodesToSearchPercent'
                        description: |-
                          The default percentage of leaf nodes that any query may search. Must be in
                          range 1-100, inclusive. The default value is 10 (i.e. 10%) if not set.
                        default_value: 10
                  - !ruby/object:Api::Type::NestedObject
                    name: 'bruteForceConfig'
                    allow_empty_object: true
                    send_empty_value: true
                    properties: []
                    exactly_one_of:
                      - treeAhConfig
                      - bruteForceConfig
                    description: |-
                      Configuration options for using brute force search, which simply implements the
                      standard linear search in the database for each query.
      - !ruby/object:Api::Type::String
        name: 'metadataSchemaUri'
        description: |-
          Points to a YAML file stored on Google Cloud Storage describing additional information about the Index that is specific to it. Unset if the Index does not have any additional information.
        output: true
      - !ruby/object:Api::Type::Array
        name: 'deployedIndexes'
        output: true
        description: The pointers to DeployedIndexes created from this Index. An Index can only be deleted if all its DeployedIndexes have been undeployed first.
        item_type: !ruby/object:Api::Type::NestedObject
          properties:
            - !ruby/object:Api::Type::String
              name: 'indexEndpoint'
              output: true
              description: A resource name of the IndexEndpoint.
            - !ruby/object:Api::Type::String
              name: 'deployedIndexId'
              output: true
              description: The ID of the DeployedIndex in the above IndexEndpoint.
      - !ruby/object:Api::Type::String
        name: 'etag'
        description: Used to perform consistent read-modify-write updates.
        output: true
      - !ruby/object:Api::Type::KeyValuePairs
        name: 'labels'
        description: The labels with user-defined metadata to organize your Indexes.
      - !ruby/object:Api::Type::String
        name: 'createTime'
        output: true
        description: The timestamp of when the Index was created in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits.
      - !ruby/object:Api::Type::String
        name: 'updateTime'
        output: true
        description: The timestamp of when the Index was last updated in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits.
      - !ruby/object:Api::Type::NestedObject
        name: 'indexStats'
        output: true
        description: Stats of the index resource.
        properties:
          - !ruby/object:Api::Type::String
            name: 'vectorsCount'
            output: true
            description: The number of vectors in the Index.
          - !ruby/object:Api::Type::Integer
            name: 'shardsCount'
            output: true
            description: The number of shards in the Index.
      - !ruby/object:Api::Type::String
        name: 'indexUpdateMethod'
        input: true
        default_value: BATCH_UPDATE
        description: |-
          The update method to use with this Index. If not set, BATCH_UPDATE is used by default. The value must be one of the following:
          * BATCH_UPDATE: the user can call indexes.patch with Cloud Storage files of datapoints to update.
          * STREAM_UPDATE: the user can call indexes.upsertDatapoints/DeleteDatapoints to update the Index, and the updates are applied to the corresponding DeployedIndexes in nearly real time.
40 changes: 40 additions & 0 deletions mmv1/products/vertexai/terraform.yaml
@@ -108,6 +108,46 @@ overrides: !ruby/object:Overrides::ResourceOverrides
        name: 'force_destroy'
        description: 'If set to true, any EntityTypes and Features for this Featurestore will also be deleted'
        default_value: false
  Index: !ruby/object:Overrides::Terraform::ResourceOverride
    autogen_async: false
    timeouts: !ruby/object:Api::Timeouts
      insert_minutes: 60
      update_minutes: 60
      delete_minutes: 60
    examples:
      - !ruby/object:Provider::Terraform::Examples
        name: "vertex_ai_index"
        primary_resource_id: "index"
        vars:
          display_name: "test-index"
          bucket_name: "vertex-ai-index-test"
        test_env_vars:
          project: :PROJECT_NAME
        ignore_read_extra:
          - metadata.0.contents_delta_uri
          - metadata.0.is_complete_overwrite
      - !ruby/object:Provider::Terraform::Examples
        name: "vertex_ai_index_streaming"
        primary_resource_id: "index"
        vars:
          display_name: "test-index"
          bucket_name: "vertex-ai-index-test"
        test_env_vars:
          project: :PROJECT_NAME
        ignore_read_extra:
          - metadata.0.contents_delta_uri
          - metadata.0.is_complete_overwrite
    properties:
      name: !ruby/object:Overrides::Terraform::PropertyOverride
        custom_flatten: templates/terraform/custom_flatten/name_from_self_link.erb
      etag: !ruby/object:Overrides::Terraform::PropertyOverride
        ignore_read: true
      metadata.contentsDeltaUri: !ruby/object:Overrides::Terraform::PropertyOverride
        custom_flatten: templates/terraform/custom_flatten/vertex_ai_index_ignore_contents_delta_uri.go.erb
      metadata.isCompleteOverwrite: !ruby/object:Overrides::Terraform::PropertyOverride
        custom_flatten: templates/terraform/custom_flatten/vertex_ai_index_ignore_is_complete_overwrite.go.erb
    custom_code: !ruby/object:Provider::Terraform::CustomCode
      pre_update: templates/terraform/pre_update/vertex_ai_index.go.erb
  FeaturestoreEntitytype: !ruby/object:Overrides::Terraform::ResourceOverride
    import_format: ["{{%featurestore}}/entityTypes/{{name}}"]
    autogen_async: false
@@ -0,0 +1,18 @@
<%# The license inside this block applies to this file.
# Copyright 2022 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-%>
func flatten<%= prefix -%><%= titlelize_property(property) -%>(v interface{}, d *schema.ResourceData, config *Config) interface{} {
	// We want to ignore read on this field, but cannot because it is nested,
	// so return the value already in state instead of the API value v.
	return d.Get("metadata.0.contents_delta_uri")
}
@@ -0,0 +1,18 @@
<%# The license inside this block applies to this file.
# Copyright 2022 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-%>
func flatten<%= prefix -%><%= titlelize_property(property) -%>(v interface{}, d *schema.ResourceData, config *Config) interface{} {
	// We want to ignore read on this field, but cannot because it is nested,
	// so return the value already in state instead of the API value v.
	return d.Get("metadata.0.is_complete_overwrite")
}
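Both flatten templates use the same trick: whatever the API returns for the field is discarded, and the value already held in Terraform state is returned instead, which gives a nested field the effect of `ignore_read`. The sketch below reproduces that behaviour outside the provider. `priorState` is a hypothetical stand-in for `*schema.ResourceData` that implements only `Get`, and `gs://my-bucket/contents` is a made-up value.

```go
package main

import "fmt"

// priorState is a HYPOTHETICAL stand-in for *schema.ResourceData: it maps
// flattened Terraform paths to the values currently held in state.
type priorState map[string]interface{}

// Get mimics the lookup the template performs with d.Get(...).
func (s priorState) Get(key string) interface{} { return s[key] }

// flattenContentsDeltaURI follows the template's pattern: the API value v is
// deliberately ignored and the value already in state is returned instead.
func flattenContentsDeltaURI(v interface{}, d priorState) interface{} {
	return d.Get("metadata.0.contents_delta_uri")
}

func main() {
	state := priorState{"metadata.0.contents_delta_uri": "gs://my-bucket/contents"}
	// Whatever the API reports (here nothing), the configured value survives the read.
	fmt.Println(flattenContentsDeltaURI(nil, state))
}
```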
40 changes: 40 additions & 0 deletions mmv1/templates/terraform/examples/vertex_ai_index.tf.erb
@@ -0,0 +1,40 @@
resource "google_storage_bucket" "bucket" {
  name                        = "<%= ctx[:test_env_vars]['project'] %>-<%= ctx[:vars]['bucket_name'] %>" # Every bucket name must be globally unique
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

# The sample data comes from the following link:
# https://cloud.google.com/vertex-ai/docs/matching-engine/filtering#specify-namespaces-tokens
resource "google_storage_bucket_object" "data" {
  name    = "contents/data.json"
  bucket  = google_storage_bucket.bucket.name
  content = <<EOF
{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]},{"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]},{"namespace": "category", "allow": ["canine"]}]}
EOF
}

resource "google_vertex_ai_index" "index" {
  labels = {
    foo = "bar"
  }
  region       = "us-central1"
  display_name = "<%= ctx[:vars]['display_name'] %>"
  description  = "index for test"
  metadata {
    contents_delta_uri = "gs://${google_storage_bucket.bucket.name}/contents"
    config {
      dimensions                  = 2
      approximate_neighbors_count = 150
      distance_measure_type       = "DOT_PRODUCT_DISTANCE"
      algorithm_config {
        tree_ah_config {
          leaf_node_embedding_count    = 500
          leaf_nodes_to_search_percent = 7
        }
      }
    }
  }
  index_update_method = "BATCH_UPDATE"
}
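The `data.json` object in this example holds the index contents as one JSON object per line, each with an `id`, an `embedding`, and optional `restricts`. A minimal Go sketch for checking such a file before upload might decode each line and verify that every embedding has `dimensions` entries (2 here). It assumes this newline-delimited layout and covers only the fields that appear in the sample; the linked input-data-format page remains the authoritative description.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// restrict and datapoint cover only the fields that appear in the sample data.json.
type restrict struct {
	Namespace string   `json:"namespace"`
	Allow     []string `json:"allow"`
}

type datapoint struct {
	ID        string     `json:"id"`
	Embedding []float64  `json:"embedding"`
	Restricts []restrict `json:"restricts"`
}

// parseDatapoints decodes one JSON object per non-empty line and rejects any
// embedding whose length differs from dims (config.dimensions).
// bufio.Scanner's default 64 KiB line limit applies; very long lines need sc.Buffer.
func parseDatapoints(data string, dims int) ([]datapoint, error) {
	var points []datapoint
	sc := bufio.NewScanner(strings.NewReader(data))
	for line := 1; sc.Scan(); line++ {
		text := strings.TrimSpace(sc.Text())
		if text == "" {
			continue
		}
		var dp datapoint
		if err := json.Unmarshal([]byte(text), &dp); err != nil {
			return nil, fmt.Errorf("line %d: %w", line, err)
		}
		if len(dp.Embedding) != dims {
			return nil, fmt.Errorf("line %d: id %q has %d dimensions, want %d",
				line, dp.ID, len(dp.Embedding), dims)
		}
		points = append(points, dp)
	}
	return points, sc.Err()
}

// sampleData is the content of the example's data.json.
const sampleData = `{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]},{"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]},{"namespace": "category", "allow": ["canine"]}]}`

func main() {
	points, err := parseDatapoints(sampleData, 2)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	for _, p := range points {
		fmt.Println(p.ID, p.Embedding, len(p.Restricts))
	}
}
```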
37 changes: 37 additions & 0 deletions mmv1/templates/terraform/examples/vertex_ai_index_streaming.tf.erb
@@ -0,0 +1,37 @@
resource "google_storage_bucket" "bucket" {
  name                        = "<%= ctx[:test_env_vars]['project'] %>-<%= ctx[:vars]['bucket_name'] %>" # Every bucket name must be globally unique
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

# The sample data comes from the following link:
# https://cloud.google.com/vertex-ai/docs/matching-engine/filtering#specify-namespaces-tokens
resource "google_storage_bucket_object" "data" {
  name    = "contents/data.json"
  bucket  = google_storage_bucket.bucket.name
  content = <<EOF
{"id": "42", "embedding": [0.5, 1.0], "restricts": [{"namespace": "class", "allow": ["cat", "pet"]},{"namespace": "category", "allow": ["feline"]}]}
{"id": "43", "embedding": [0.6, 1.0], "restricts": [{"namespace": "class", "allow": ["dog", "pet"]},{"namespace": "category", "allow": ["canine"]}]}
EOF
}

resource "google_vertex_ai_index" "index" {
  labels = {
    foo = "bar"
  }
  region       = "us-central1"
  display_name = "<%= ctx[:vars]['display_name'] %>"
  description  = "index for test"
  metadata {
    contents_delta_uri = "gs://${google_storage_bucket.bucket.name}/contents"
    config {
      dimensions            = 2
      distance_measure_type = "COSINE_DISTANCE"
      feature_norm_type     = "UNIT_L2_NORM"
      algorithm_config {
        brute_force_config {}
      }
    }
  }
  index_update_method = "STREAM_UPDATE"
}
22 changes: 22 additions & 0 deletions mmv1/templates/terraform/pre_update/vertex_ai_index.go.erb
@@ -0,0 +1,22 @@
newUpdateMask := []string{}

if d.HasChange("metadata.0.contents_delta_uri") {
	// Use the current value of isCompleteOverwrite when updating contentsDeltaUri
	newUpdateMask = append(newUpdateMask, "metadata.contentsDeltaUri")
	newUpdateMask = append(newUpdateMask, "metadata.isCompleteOverwrite")
}

for _, mask := range updateMask {
	// Use granular update masks instead of 'metadata' to avoid the following error:
	// 'If `contents_delta_gcs_uri` is set as part of `index.metadata`, then no other Index fields can be also updated as part of the same update call.'
	if mask == "metadata" {
		continue
	}
	newUpdateMask = append(newUpdateMask, mask)
}

// Refreshing updateMask after adding extra schema entries
url, err = addQueryParams(url, map[string]string{"updateMask": strings.Join(newUpdateMask, ",")})
if err != nil {
	return err
}