enable support within aws_bedrockagent_knowledge_base for embedding_model_configuration and supplemental_data_storage_configuration #40737

Merged

awgibbs
Contributor

@awgibbs awgibbs commented Dec 31, 2024

  • extend appropriate go schema
  • add necessary go structs
  • add acceptance tests
  • extend docs as necessary
  • fix broken dash characters found in docs

Description

Last month AWS introduced binary embedding support for Amazon Titan Text Embeddings V2. This PR makes it possible to choose that embedding data type as well as configure the dimensions. For good measure it also adds support for supplemental storage configuration.
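As a rough illustration of the two knobs this PR exposes (embedding data type and dimensions), here is a small pre-flight check one might run before handing values to Bedrock. It assumes Titan Text Embeddings V2 accepts output dimensions of 256, 512, or 1024 and the `FLOAT32` and `BINARY` data types; the `EmbeddingConfig` type and `validateTitanV2` function are hypothetical and are not the provider's actual Go structs.

```go
package main

import (
	"fmt"
)

// EmbeddingConfig is a hypothetical stand-in for a knowledge base's embedding
// model configuration: vector dimensions plus the embedding data type.
type EmbeddingConfig struct {
	Dimensions        int
	EmbeddingDataType string // assumed to be "FLOAT32" or "BINARY"
}

// titanV2Dimensions lists the output sizes assumed to be supported by
// Amazon Titan Text Embeddings V2.
var titanV2Dimensions = map[int]bool{256: true, 512: true, 1024: true}

// validateTitanV2 returns an error if cfg is not a combination assumed to be
// valid for Titan Text Embeddings V2.
func validateTitanV2(cfg EmbeddingConfig) error {
	if !titanV2Dimensions[cfg.Dimensions] {
		return fmt.Errorf("unsupported dimensions %d: expected 256, 512, or 1024", cfg.Dimensions)
	}
	switch cfg.EmbeddingDataType {
	case "FLOAT32", "BINARY":
		return nil
	default:
		return fmt.Errorf("unsupported embedding data type %q: expected FLOAT32 or BINARY", cfg.EmbeddingDataType)
	}
}

func main() {
	for _, cfg := range []EmbeddingConfig{
		{Dimensions: 1024, EmbeddingDataType: "BINARY"},
		{Dimensions: 768, EmbeddingDataType: "FLOAT32"},
	} {
		if err := validateTitanV2(cfg); err != nil {
			fmt.Println("invalid:", err)
		} else {
			fmt.Println("valid:", cfg)
		}
	}
}
```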

I have created and successfully run a new acceptance test. I should note, however, that testing against OpenSearch Serverless Collections (OSSC), which is what I was personally targeting, is generally difficult, and a polished implementation for acceptance tests seems to have been deferred when this resource was created some months ago. That left me in a somewhat difficult, manually intensive situation, as there was no perfect/automated model to follow. You will note that the extant OSSC tests were set to "skip", and that is how I am committing my new one (though it was not skipped for my actual testing). To perform my testing I pointed at an appropriate extant/external OSSC and then had the KB created by the acceptance test runs point at that. The relevant parameters are XXX'd out in the acceptance test. I have successfully tested against both "BINARY" and "FLOAT32" index data types backed by real OSSC instances created out-of-band.

In real-life, fully automated environments I have used the aws_lambda_invocation resource to execute post-creation OSSC manipulations that get an index in place. Bedrock KB creation fails without this underlying index, because the KB's validation step blows up. I think it would be reasonable to continue to defer this realm as tech debt, but to circle back soon to get these acceptance tests into a better place across the board (not just mine; I am happy to help). At the moment, however, I am in a rather urgent situation: the impetus for this work is the need to convert vector DBs in an operational environment to the "BINARY" data type.
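For context on what "getting an index in place" involves, here is a sketch of a create-index request body such a post-creation step might send. The Bedrock field names mirror the Terraform example shared later in this thread; the binary branch assumes OpenSearch's k-NN binary vectors (`data_type: "binary"`, Hamming distance, dimension in bits and a multiple of 8), so verify it against the OpenSearch version behind your collection. `buildIndexBody` is a hypothetical helper, not code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildIndexBody returns a hypothetical OpenSearch create-index body for a
// Bedrock knowledge base vector index. dataType is "FLOAT32" or "BINARY".
func buildIndexBody(dimension int, dataType string) (map[string]any, error) {
	method := map[string]any{"name": "hnsw", "engine": "faiss"}
	vector := map[string]any{"type": "knn_vector", "dimension": dimension}

	switch dataType {
	case "FLOAT32":
		method["space_type"] = "l2"
	case "BINARY":
		// Assumption: binary vectors use data_type "binary" with Hamming
		// distance, and the dimension (in bits) must be a multiple of 8.
		if dimension%8 != 0 {
			return nil, fmt.Errorf("binary dimension %d is not a multiple of 8", dimension)
		}
		vector["data_type"] = "binary"
		method["space_type"] = "hamming"
	default:
		return nil, fmt.Errorf("unsupported data type %q", dataType)
	}
	vector["method"] = method

	return map[string]any{
		"settings": map[string]any{"index": map[string]any{"knn": true}},
		"mappings": map[string]any{
			"properties": map[string]any{
				"bedrock-knowledge-base-default-vector": vector,
				"AMAZON_BEDROCK_METADATA":               map[string]any{"type": "text", "index": false},
				"AMAZON_BEDROCK_TEXT_CHUNK":             map[string]any{"type": "text", "index": true},
			},
		},
	}, nil
}

func main() {
	body, err := buildIndexBody(1024, "BINARY")
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(body, "", "  ")
	if err != nil {
		panic(err)
	}
	// This JSON could be the body of a create-index call made, for example,
	// by a Lambda function invoked after the collection exists.
	fmt.Println(string(out))
}
```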

Closes #40576.

References

https://aws.amazon.com/blogs/machine-learning/build-cost-effective-rag-applications-with-binary-embeddings-in-amazon-titan-text-embeddings-v2-amazon-opensearch-serverless-and-amazon-bedrock-knowledge-bases/

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent/client/create_knowledge_base.html

Output from Acceptance Testing

Andrews-MBP:terraform-provider-aws awgibbs$ make testacc TESTS=TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch PKG=bedrockagent
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.3 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch'  -timeout 360m
2024/12/29 21:47:37 Initializing Terraform AWS Provider...
=== RUN   TestAccBedrockAgent_serial
=== PAUSE TestAccBedrockAgent_serial
=== CONT  TestAccBedrockAgent_serial
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch
--- PASS: TestAccBedrockAgent_serial (20.73s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (20.73s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch (20.73s)
PASS
ok  	github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent	34.217s

…odel_configuration and supplemental_data_storage_configuration

  * extend appropriate go schema
  * add necessary go structs
  * add acceptance tests
  * extend docs as necessary
  * fix broken dash characters found in docs
@awgibbs awgibbs requested a review from a team as a code owner December 31, 2024 18:45

Community Note

Voting for Prioritization

  • Please vote on this pull request by adding a 👍 reaction to the original post to help the community and maintainers prioritize this pull request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

For Submitters

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • For new resources and data sources, use skaff to generate scaffolding with comments detailing common expectations.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions github-actions bot added documentation Introduces or discusses updates to documentation. tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. service/bedrockagent Issues and PRs that pertain to the bedrockagent service. needs-triage Waiting for first response or review from a maintainer. labels Dec 31, 2024
@awgibbs
Contributor Author

awgibbs commented Dec 31, 2024

It's worth further noting that the build environment (make tools) was broken as of my pull of main on Christmas. I had to hand remediate that situation to be able to work at all. I documented my mitigation in the following post but I'm not sure how we navigate that separate issue.

#40485 (comment)

@github-actions github-actions bot left a comment

Welcome @awgibbs 👋

It looks like this is your first Pull Request submission to the Terraform AWS Provider! If you haven’t already done so please make sure you have checked out our CONTRIBUTOR guide and FAQ to make sure your contribution is adhering to best practice and has all the necessary elements in place for a successful approval.

Also take a look at our FAQ which details how we prioritize Pull Requests for inclusion.

Thanks again, and welcome to the community! 😃

@ewbankkit ewbankkit added enhancement Requests to existing resources that expand the functionality or scope. and removed needs-triage Waiting for first response or review from a maintainer. labels Jan 8, 2025
@tai-awgibbs

@ewbankkit and other code owners -- Is there additional work that is wanted on this PR (to be done either by myself or others) or is it just waiting for maintainers to have cycles to transact on it? I understand that you all have a lot on your plates. I just don't want this to get slowed down for want of something I might have needed to do to make it pass the bar. I'm trying to support an urgent business need with this and others are anxious to be able to start to use it.

@tai-awgibbs

@ewbankkit I realized after dropping my last note that there was actually one failed check, as opposed to skipped ones, in the list. I just pushed a fresh commit that addresses the semgrep failure: I first reproduced it locally and then accepted semgrep's automatic changes. I also executed a fully realistic run of the new fancyOpenSearch test, which passed.

Andrews-MBP:terraform-provider-aws awgibbs$ semgrep --config .ci/.semgrep.yml --config .ci/.semgrep-constants.yml --config .ci/.semgrep-test-constants.yml --config .ci/semgrep/ internal/service/bedrockagent/knowledge_base.go
┌──── ○○○ ────┐
│ Semgrep CLI │
└─────────────┘
Scanning 1 file (only git-tracked) with 364 Code rules:
CODE RULES
Scanning 1 file with 312 go rules.
SUPPLY CHAIN RULES
No rules to run.
PROGRESS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┌──────────────┐
│ Scan Summary │
└──────────────┘
Ran 312 rules on 1 file: 0 findings.

Andrews-MBP:terraform-provider-aws awgibbs$ make testacc TESTS=TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch PKG=bedrockagent
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.3 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch'  -timeout 360m
2025/01/12 12:28:13 Initializing Terraform AWS Provider...
=== RUN   TestAccBedrockAgent_serial
=== PAUSE TestAccBedrockAgent_serial
=== CONT  TestAccBedrockAgent_serial
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch
--- PASS: TestAccBedrockAgent_serial (20.55s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (20.55s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/fancyOpenSearch (20.55s)
PASS
ok  	github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent	34.473s

@awgibbs
Contributor Author

awgibbs commented Jan 13, 2025

@ewbankkit Looks like my last commit got the "checks" to a happy place. Is there anything else I can do to help ready this for merge and release?

@jar-b
Member

jar-b commented Jan 15, 2025

@awgibbs - thanks for your effort on this!

Given the pre-existing acceptance tests requiring OpenSearch vector stores also do not automate the setup steps, I think we can proceed without requiring that here. That said, it would be great to have a formal writeup of the setup steps so that a maintainer can replicate it and run the full test suite once, at least for this initial implementation. If you can provide that (or link to AWS docs), we can embed the steps as comments above the test case(s) and embellish with any provider specific notes as necessary.

I'll also open a follow-up issue to investigate automation of the setup for both OpenSearch based acceptance tests so we don't lose track of that work stream.

@jar-b jar-b self-assigned this Jan 15, 2025
@github-actions github-actions bot added the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label Jan 15, 2025
@awgibbs
Contributor Author

awgibbs commented Jan 15, 2025

Hey @jar-b -- Stoked to finish the swing on this!

I think this is probably the best AWS documentation for our present purposes ==> https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-create.html

When I did my own testing I just slapped an additional Bedrock Knowledge Base (BKB), using my rev'd provider, onto an existing OpenSearch Serverless Collection (OSSC) that had been created with Terraform. I imagine a more generalized manual approach would be to follow the docs ^^ to create a BKB in the console, which, when using the "Quick create a new vector store" option, will automatically create the underlying OSSC, and then to create an identical-ish BKB using my rev'd Terraform provider, as in my "fancy OpenSearch" test. It's important to match up the vector type and dimensions across the BKB and OSSC, or the BKB creation step will throw an error. Beyond that, maybe the easiest thing is to have the "fancy OpenSearch" test default to whatever values come out of the "Quick create a new vector store" console workflow, to minimize any last-second massaging of the test suite (maybe plugging in the service role would then be the only thing required?).
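Since a mismatch between the BKB's vector settings and the OSSC index only surfaces as an error at creation time, it can be checked up front. The sketch below reads an index `mappings` document (shaped like the one in the Terraform example later in this thread) and compares it with the KB's dimensions and data type. It assumes the index dimension must equal the KB's configured dimensions, and that `BINARY` maps to OpenSearch's `data_type: "binary"` while `FLOAT32` uses the float default; `checkVectorMapping` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vectorField holds the subset of an OpenSearch field mapping this check reads.
// Other keys (method, index, ...) are ignored by encoding/json.
type vectorField struct {
	Type      string `json:"type"`
	Dimension int    `json:"dimension"`
	DataType  string `json:"data_type"` // empty means the float default
}

type indexMappings struct {
	Properties map[string]vectorField `json:"properties"`
}

// checkVectorMapping reports whether the named vector field in mappingsJSON
// matches the knowledge base's embedding dimensions and data type.
func checkVectorMapping(mappingsJSON []byte, field string, dimensions int, embeddingDataType string) error {
	var m indexMappings
	if err := json.Unmarshal(mappingsJSON, &m); err != nil {
		return fmt.Errorf("parsing mappings: %w", err)
	}
	f, ok := m.Properties[field]
	if !ok || f.Type != "knn_vector" {
		return fmt.Errorf("field %q is missing or is not a knn_vector", field)
	}
	// Assumption: dimensions must be equal (for binary vectors OpenSearch
	// counts the dimension in bits).
	if f.Dimension != dimensions {
		return fmt.Errorf("dimension mismatch: index has %d, knowledge base expects %d", f.Dimension, dimensions)
	}
	// Assumption: BINARY needs data_type "binary"; FLOAT32 needs the float default.
	if (embeddingDataType == "BINARY") != (f.DataType == "binary") {
		return fmt.Errorf("data type mismatch: index data_type %q, knowledge base expects %s", f.DataType, embeddingDataType)
	}
	return nil
}

func main() {
	mappings := []byte(`{"properties": {"bedrock-knowledge-base-default-vector": {"type": "knn_vector", "dimension": 1024}}}`)
	// A float index paired with a BINARY knowledge base is reported as a mismatch.
	fmt.Println(checkVectorMapping(mappings, "bedrock-knowledge-base-default-vector", 1024, "BINARY"))
}
```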

To the extent it's helpful I'm happy to keep pulling on the oars myself to get this over the finish line. Maybe I should inline some docs in internal/service/bedrockagent/knowledge_base_test.go by adding a "Prerequisites" comments block ahead of testAccKnowledgeBaseConfig_fancyOpenSearch that captures the foregoing?

Or I can just be close air support on whatever final touches you want to make yourself.

Let me know how to help you help me. Thanks! :-)

@tai-awgibbs

@jar-b I think the commit I just pushed, cb94002, does what I suggested in my previous comment and hopefully checks your boxes for testing documentation/ease improvements. Please let me know if there is anything else I can do.

@jar-b
Member

jar-b commented Jan 16, 2025

Thanks for the writeup @awgibbs, much appreciated! I've tried the manual setup a few times via the console and continually hit a failure during KB creation.

The OpenSearch collection does get created and the appropriate indexes appear to be present. The error message doesn't provide any context, and inspecting the network calls has yet to reveal any additional info. Rather than continue wasting time here, I think I'm going to translate the setup steps into Terraform to hopefully gain more insight into the failure mode via API error messages. This should also lay some groundwork for eventually automating this.

The downside is this is going to delay getting a review on this a bit. I'll keep things updated here as progress is made.

Edit: And to be clear - the previous message was in no way intended to indicate frustration with the setup writeup you provided, which was great. All frustration is reserved for the UX of the AWS console as it continually tries to do too much magic.

@tai-awgibbs

@jar-b Obviously I'm a bit too information poor here to be able to understand why you encountered this error. Based on priors I wonder if either your IAM identity lacks the permissions and/or there is a Service Control Policy on the account where you tried to do it (which would give you the same problems with TF versus console/CLI). But I think the Bedrock error you're encountering might not matter because the only thing we cared about was the creation of the underlying OSSC (and, arguably, the service role to be applied to the BKB that we subsequently create in the test). If those things created then you should be able to just plug in their identifiers (the role ARN and the OSSC ARN) into the test code per the instructions and you're off to the races. The creation of the BKB in the console is "interesting" but maybe not a blocker.

@jar-b
Member

jar-b commented Jan 16, 2025

Good point. I tried again in an alternate region and watched the network calls a bit more closely. It seems the first step is creation of the OSSC and polling for the presence of the vector index, which eventually completes successfully. It's unclear what is failing next (network calls just kind of stop), but the service linked role is not created. Perhaps it's a permissions barrier there, but I can create roles otherwise so not quite sure why that'd be the case.

I'll see if I can manually replicate the required permissions on the service linked role and proceed from there with the existing OSSC.

@awgibbs
Contributor Author

awgibbs commented Jan 16, 2025

@jar-b For reference, when the console workflow created a role for me, it had a trust policy that looked like (sensitive data stubbed out)...

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonBedrockKnowledgeBaseTrustPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "ACCOUNT_ID"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:bedrock:REGION:ACCOUNT_ID:knowledge-base/*"
                }
            }
        }
    ]
}

... and then beyond my adding S3FullAccess as I mentioned it had three other policies that looked like...

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BedrockInvokeModelStatement",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel"
            ],
            "Resource": [
                "arn:aws:bedrock:REGION::foundation-model/amazon.titan-embed-text-v2:0"
            ]
        }
    ]
}

... and ...

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OpenSearchServerlessAPIAccessAllStatement",
            "Effect": "Allow",
            "Action": [
                "aoss:APIAccessAll"
            ],
            "Resource": [
                "arn:aws:aoss:REGION:ACCOUNT_ID:collection/COLLECTION_ID"
            ]
        }
    ]
}

... and...

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3ListBucketStatement",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::SDS_BUCKET_NAME"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": [
                        "ACCOUNT_ID"
                    ]
                }
            }
        },
        {
            "Sid": "S3GetObjectStatement",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::SDS_BUCKET_NAME/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": [
                        "ACCOUNT_ID"
                    ]
                }
            }
        }
    ]
}
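If the service role is created outside the console, the trust policy quoted above can be rendered per account and region instead of hand-editing placeholders. The sketch below models only the policy elements that appear in that quote and keeps its hard-coded `aws` partition; the types, function names, and the placeholder account ID `111122223333` are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types covering only the IAM policy elements used by the
// console-generated trust policy.
type principal struct {
	Service string `json:"Service"`
}

type statement struct {
	Sid       string                       `json:"Sid"`
	Effect    string                       `json:"Effect"`
	Principal principal                    `json:"Principal"`
	Action    string                       `json:"Action"`
	Condition map[string]map[string]string `json:"Condition"`
}

type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// knowledgeBaseTrustPolicy builds the trust policy for one account and region.
// The "aws" partition is hard-coded, as in the quoted policy.
func knowledgeBaseTrustPolicy(accountID, region string) policyDocument {
	return policyDocument{
		Version: "2012-10-17",
		Statement: []statement{{
			Sid:       "AmazonBedrockKnowledgeBaseTrustPolicy",
			Effect:    "Allow",
			Principal: principal{Service: "bedrock.amazonaws.com"},
			Action:    "sts:AssumeRole",
			Condition: map[string]map[string]string{
				// Restrict role assumption to requests from this account...
				"StringEquals": {"aws:SourceAccount": accountID},
				// ...made on behalf of knowledge bases in this region.
				"ArnLike": {"aws:SourceArn": fmt.Sprintf("arn:aws:bedrock:%s:%s:knowledge-base/*", region, accountID)},
			},
		}},
	}
}

// render serializes a policy document as indented JSON.
func render(doc policyDocument) (string, error) {
	b, err := json.MarshalIndent(doc, "", "    ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder account ID and region; substitute your own.
	s, err := render(knowledgeBaseTrustPolicy("111122223333", "us-east-1"))
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```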

@jar-b
Member

jar-b commented Jan 17, 2025

Thanks - I got most of the way there via the Amazon Bedrock docs on creating a service role, though there were some notable S3 permission omissions (which I suspect is why you also had to attach the S3FullAccess managed policy). At this point I'm thinking to refactor the acceptance test to include all of the setup except the OSSC, which will instead be fetched via a data source.

data "aws_opensearchserverless_collection" "test" {
  name = "bedrock-knowledge-base-sh5068"
}

This will allow us to gate the test behind a single environment variable (something like TF_ACC_BEDROCK_OSS_COLLECTION_NAME) to which the user supplies the name of the OSS collection where the vector data store has been pre-configured. If the environment variable is unset, the test is skipped. This leaves only the creation of the OSS collection and its associated vector index as remaining steps to automate this in the future.
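The gating described above can be sketched as a small shared helper. The variable name below is the one used in the later test runs in this PR, and the skip message is copied from the test output shown further down; the helper and function names are hypothetical, and `skipper` is a local interface that `*testing.T` satisfies, so the sketch does not need the `testing` package.

```go
package main

import (
	"fmt"
	"os"
)

// ossCollectionNameEnvVar names the pre-configured OSS collection.
const ossCollectionNameEnvVar = "TF_AWS_BEDROCK_OSS_COLLECTION_NAME"

// skipper is the subset of testing.TB this helper needs; *testing.T satisfies it.
type skipper interface {
	Helper()
	Skip(args ...any)
}

// ossCollectionName returns the configured collection name, treating an unset
// or empty variable as "not configured". lookup is normally os.LookupEnv.
func ossCollectionName(lookup func(string) (string, bool)) (string, bool) {
	v, ok := lookup(ossCollectionNameEnvVar)
	if !ok || v == "" {
		return "", false
	}
	return v, true
}

// skipIfOSSCollectionNameUnset is a hypothetical shared helper: each
// OpenSearch-based test calls it first and is skipped unless a collection
// with a pre-configured vector index is named.
func skipIfOSSCollectionNameUnset(t skipper) string {
	t.Helper()
	name, ok := ossCollectionName(os.LookupEnv)
	if !ok {
		t.Skip("This test requires external configuration of an OpenSearch collection vector index. Set the " +
			ossCollectionNameEnvVar + " environment variable to the OpenSearch collection name where the vector index is configured.")
	}
	return name
}

// collectionDataSourceConfig renders the data source above for a given name.
func collectionDataSourceConfig(name string) string {
	return fmt.Sprintf(`
data "aws_opensearchserverless_collection" "test" {
  name = %[1]q
}
`, name)
}

func main() {
	if name, ok := ossCollectionName(os.LookupEnv); ok {
		fmt.Print(collectionDataSourceConfig(name))
	} else {
		fmt.Println(ossCollectionNameEnvVar, "is unset; OpenSearch-based tests would be skipped")
	}
}
```

Injecting `lookup` keeps the environment check testable without mutating process state; the real acceptance tests would simply pass `os.LookupEnv`.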

jar-b added 12 commits January 27, 2025 16:02
For compatibility with protocol version 5, still used by pre-V1 versions of Terraform core.

```console
% make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/basic
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.3 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/basic'  -timeout 360m -vet=off
2025/01/27 16:18:37 Initializing Terraform AWS Provider...
=== RUN   TestAccBedrockAgent_serial
=== PAUSE TestAccBedrockAgent_serial
=== CONT  TestAccBedrockAgent_serial
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/basicRDS
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/basicOpenSearch
    knowledge_base_test.go:244: Bedrock Agent Knowledge Base requires external configuration of a vector index
--- PASS: TestAccBedrockAgent_serial (1368.24s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (1368.24s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/basicRDS (1368.24s)
        --- SKIP: TestAccBedrockAgent_serial/KnowledgeBase/basicOpenSearch (0.00s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       1374.635s
```

```console
% TF_AWS_BEDROCK_OSS_COLLECTION_NAME=jb-test make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.3 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage'  -timeout 360m -vet=off
2025/01/27 16:40:10 Initializing Terraform AWS Provider...

--- PASS: TestAccBedrockAgent_serial (39.78s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (39.78s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage (39.78s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       46.240s
```
```console
% TF_AWS_BEDROCK_OSS_COLLECTION_NAME=jb-test make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/OpenSearch

--- PASS: TestAccBedrockAgent_serial (145.36s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (145.36s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchBasic (47.22s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchUpdate (47.86s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage (50.29s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       151.909s
```
```console
% TF_AWS_BEDROCK_OSS_COLLECTION_NAME=jb-test make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.5 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage'  -timeout 360m -vet=off
2025/01/28 15:22:01 Initializing Terraform AWS Provider...
=== RUN   TestAccBedrockAgent_serial
=== PAUSE TestAccBedrockAgent_serial
=== CONT  TestAccBedrockAgent_serial
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage
--- PASS: TestAccBedrockAgent_serial (61.28s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (61.28s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage (61.28s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       67.732s
```
…kip logic

Moving the check of the expected environment variable to a single function and sharing across all OpenSearch based tests.

```console
% TF_AWS_BEDROCK_OSS_COLLECTION_NAME=jb-test make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/OpenSearch
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.5 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/OpenSearch'  -timeout 360m -vet=off
2025/01/28 15:54:08 Initializing Terraform AWS Provider...

--- PASS: TestAccBedrockAgent_serial (186.70s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (186.70s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage (51.22s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchBasic (56.01s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchUpdate (79.46s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       193.189s
```
Member

@jar-b jar-b left a comment

LGTM 🎉

Using an existing OSS collection, jb-test, with a pre-configured vector index:

% TF_AWS_BEDROCK_OSS_COLLECTION_NAME=jb-test make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.5 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/'  -timeout 360m -vet=off
2025/01/28 16:28:57 Initializing Terraform AWS Provider...

--- PASS: TestAccBedrockAgent_serial (4565.33s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (4565.33s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchUpdate (80.08s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage (49.88s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/basic (1460.37s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/disappears (1431.29s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/tags (1498.04s)
        --- PASS: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchBasic (45.66s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       4571.834s

And verifying OpenSearch dependent tests are skipped under normal execution:

% make testacc PKG=bedrockagent TESTS=TestAccBedrockAgent_serial/KnowledgeBase/OpenSearch
make: Verifying source code with gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go1.23.5 test ./internal/service/bedrockagent/... -v -count 1 -parallel 20 -run='TestAccBedrockAgent_serial/KnowledgeBase/OpenSearch'  -timeout 360m -vet=off
2025/01/28 19:46:16 Initializing Terraform AWS Provider...
=== RUN   TestAccBedrockAgent_serial
=== PAUSE TestAccBedrockAgent_serial
=== CONT  TestAccBedrockAgent_serial
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchBasic
    knowledge_base_test.go:245: This test requires external configuration of an OpenSearch collection vector index. Set the TF_AWS_BEDROCK_OSS_COLLECTION_NAME environment variable to the OpenSearch collection name where the vector index is configured.
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchUpdate
    knowledge_base_test.go:289: This test requires external configuration of an OpenSearch collection vector index. Set the TF_AWS_BEDROCK_OSS_COLLECTION_NAME environment variable to the OpenSearch collection name where the vector index is configured.
=== RUN   TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage
    knowledge_base_test.go:359: This test requires external configuration of an OpenSearch collection vector index. Set the TF_AWS_BEDROCK_OSS_COLLECTION_NAME environment variable to the OpenSearch collection name where the vector index is configured.
--- PASS: TestAccBedrockAgent_serial (0.00s)
    --- PASS: TestAccBedrockAgent_serial/KnowledgeBase (0.00s)
        --- SKIP: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchBasic (0.00s)
        --- SKIP: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchUpdate (0.00s)
        --- SKIP: TestAccBedrockAgent_serial/KnowledgeBase/OpenSearchSupplementalDataStorage (0.00s)
PASS
ok      github.com/hashicorp/terraform-provider-aws/internal/service/bedrockagent       6.520s

@jar-b
Member

jar-b commented Jan 29, 2025

Thanks for your contribution, @awgibbs! 👍

@awgibbs
Contributor Author

awgibbs commented Jan 29, 2025

Yeehaw! You made my week and it's only Tuesday. :-)

@FireballDWF

I have example Terraform for creating the OSS collection and index from scratch that I'd be willing to provide if it would help automate the testing. Let me know the best way to provide it to you.

@ewbankkit
Contributor

@FireballDWF Yes, please share.

Contributor

@ewbankkit ewbankkit left a comment

LGTM 🚀.

@FireballDWF

resource "aws_opensearchserverless_security_policy" "encryption" {
  name        = var.name
  type        = "encryption"
  description = "encryption with AWSOwnedKey"
  policy = jsonencode({
    "Rules" = [
      {
        "Resource" = [
          "collection/${var.name}"
        ],
        "ResourceType" = "collection"
      }
    ],
    "AWSOwnedKey" = true
  })
}

resource "aws_opensearchserverless_security_policy" "network" {
  name        = var.name
  type        = "network"
  description = "Public access"
  policy = jsonencode([
    {
      Description = "Public access to collection and Dashboards endpoint for example collection",
      Rules = [
        {
          ResourceType = "collection",
          Resource = [
            "collection/${var.name}"
          ]
        },
        {
          ResourceType = "dashboard"
          Resource = [
            "collection/${var.name}"
          ]
        }
      ],
      AllowFromPublic = true # TODO: change to the "VPC access" example at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/opensearchserverless_security_policy
    }
  ])
}

resource "aws_opensearchserverless_collection" "rag" {
  name             = var.name
  type             = "VECTORSEARCH"
  description      = "rag"
  standby_replicas = "DISABLED" # Change to ENABLED for production usage

  depends_on = [aws_opensearchserverless_security_policy.encryption]
  # tags # if not using default tags
}

resource "aws_opensearchserverless_access_policy" "rag" {
  name        = var.name
  type        = "data"
  description = "read and write permissions"

  # https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html#serverless-data-access-syntax
  # [BUG]Provider produced inconsistent final plan - https://github.com/opensearch-project/terraform-provider-opensearch/issues/183
  policy = jsonencode([
    {
      Description = var.name
      Rules = [
        {
          ResourceType = "index",
          Resource = [
            "index/${var.name}/*"
          ],
          Permission = [
            "aoss:CreateIndex",
            "aoss:DeleteIndex",
            "aoss:DescribeIndex",
            "aoss:ReadDocument",
            "aoss:UpdateIndex",
            "aoss:WriteDocument"
          ]
        },
        {
          ResourceType = "collection",
          Resource = [
            "collection/${var.name}"
          ],
          Permission = [
            "aoss:CreateCollectionItems",
            "aoss:DescribeCollectionItems",
            "aoss:UpdateCollectionItems"
          ]
        }
      ],
      Principal = [
        var.assume_role,
        #data.aws_caller_identity.current.arn,
        "arn:${data.aws_partition.this.partition}:iam::${data.aws_caller_identity.current.account_id}:role/Admin", # TODO: parameterize this
        aws_iam_role.bedrock_kb.arn
      ]
    }
  ])
}
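
# Sketch for the "TODO: parameterize this" in the Principal list above. The names are hypothetical
# and not from the original config. It exposes the admin role name as an input and builds the ARN
# from the same partition/account data sources the policy already uses. To use it, reference
# local.admin_role_arn in Principal instead of the hard-coded ".../role/Admin" string.
variable "admin_role_name" {
  description = "Hypothetical: IAM role name granted data access to the collection"
  type        = string
  default     = "Admin"
}

locals {
  admin_role_arn = "arn:${data.aws_partition.this.partition}:iam::${data.aws_caller_identity.current.account_id}:role/${var.admin_role_name}"
}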

# Creation can fail with "503 Service Unavailable" because aws_iam_role_policy.bedrock_kb_oss is eventually consistent.
# Re-running apply usually succeeds; alternatively, add a short sleep delay before the index is created.
# Reported upstream: https://github.com/opensearch-project/terraform-provider-opensearch/issues/199
# https://registry.terraform.io/providers/opensearch-project/opensearch/latest/docs/resources/index
resource "opensearch_index" "rag" {
  name                           = var.name # if a suffix is added here, update the data access policy to match
  number_of_shards               = "2"
  number_of_replicas             = "0"
  index_knn                      = true
  index_knn_algo_param_ef_search = 512

  # TODO: make dimension size configurable
  mappings = <<EOF
{
  "properties": {
    "bedrock-knowledge-base-default-vector": {
      "type": "knn_vector",
      "dimension": 1024,
      "method": {
        "name": "hnsw",
        "engine": "faiss",
        "parameters": {
          "m": 16,
          "ef_construction": 512
        },
        "space_type": "l2"
      }
    },
    "AMAZON_BEDROCK_METADATA": {
      "type": "text",
      "index": false
    },
    "AMAZON_BEDROCK_TEXT_CHUNK": {
      "type": "text",
      "index": true
    }
  }
}
EOF
  depends_on = [
    aws_opensearchserverless_access_policy.rag,
    aws_opensearchserverless_security_policy.network,
    aws_opensearchserverless_collection.rag,
    aws_iam_role_policy.bedrock_kb_oss # required for index creation to be ALLOWed
  ]

  # The index mappings are modified when aws_bedrockagent_data_source.kb is created for aws_bedrockagent_knowledge_base.kb.
  # Later applies would replace or delete those values, so changes to mappings are ignored here.
  # Consequence: to change the mappings above, destroy and recreate this resource.
  lifecycle {
    ignore_changes = [
      mappings
    ]
  }
  force_destroy = true
}
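
# Sketch of the "add a sleep delay" workaround described above opensearch_index.rag.
# Assumptions: it uses the hashicorp/time provider, which is not in the original required_providers
# (Terraform resolves it implicitly, but declaring `time = { source = "hashicorp/time" }` is cleaner).
# The 30s value is a guess, not a measured propagation time.
resource "time_sleep" "wait_for_aoss_permissions" {
  create_duration = "30s"

  depends_on = [
    aws_opensearchserverless_access_policy.rag,
    aws_iam_role_policy.bedrock_kb_oss
  ]
}
# Then add time_sleep.wait_for_aoss_permissions to the depends_on list of opensearch_index.rag,
# so that index creation starts only after the delay has elapsed.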

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    awscc = {
      source  = "hashicorp/awscc"
      version = ">= 0.25.0"
    }
    opensearch = {
      source  = "opensearch-project/opensearch"
      version = ">= 2.3.0" # 2.2.1 fails to authenticate with "Error:EOF"; suspected cause: https://github.com/opensearch-project/terraform-provider-opensearch/issues/179
    }
    null = {
      source  = "hashicorp/null"
      version = ">= 3.2.1"
    }
  }
  required_version = ">= 1.5"
}
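
# Sketch for the "TODO: make dimension size configurable" in opensearch_index.rag, tied to the
# BINARY support this PR adds. The variable and local names are hypothetical. Assumptions:
#   - knn_vector accepts data_type "float" (the default) or "binary", and binary vectors use the
#     faiss engine with space_type "hamming" (binary dimensions must be a multiple of 8);
#   - the collection's OpenSearch version supports binary vectors.
# These inputs must match dimensions/embedding_data_type in the knowledge base's
# embedding_model_configuration.
variable "embedding_dimensions" {
  description = "Hypothetical: vector dimension; must equal the embedding model's output dimensions"
  type        = number
  default     = 1024
}

variable "embedding_data_type" {
  description = "Hypothetical: FLOAT32 or BINARY, mirroring embedding_data_type on the knowledge base"
  type        = string
  default     = "FLOAT32"

  validation {
    condition     = contains(["FLOAT32", "BINARY"], var.embedding_data_type)
    error_message = "embedding_data_type must be FLOAT32 or BINARY."
  }
}

locals {
  is_binary_embedding = var.embedding_data_type == "BINARY"

  # Same structure as the heredoc in opensearch_index.rag; only dimension, data_type and
  # space_type vary. Both branches keep the same object shape, so the conditionals type-check.
  rag_index_mappings = jsonencode({
    properties = {
      "bedrock-knowledge-base-default-vector" = {
        type      = "knn_vector"
        dimension = var.embedding_dimensions
        data_type = local.is_binary_embedding ? "binary" : "float"
        method = {
          name       = "hnsw"
          engine     = "faiss"
          space_type = local.is_binary_embedding ? "hamming" : "l2"
          parameters = {
            m               = 16
            ef_construction = 512
          }
        }
      }
      AMAZON_BEDROCK_METADATA = {
        type  = "text"
        index = false
      }
      AMAZON_BEDROCK_TEXT_CHUNK = {
        type  = "text"
        index = true
      }
    }
  })
}
# Usage: set `mappings = local.rag_index_mappings` in opensearch_index.rag. Because mappings is in
# ignore_changes, changing these inputs after creation still requires recreating the index.
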

# Configure the OpenSearch provider https://registry.terraform.io/providers/opensearch-project/opensearch/latest/docs#schema
provider "opensearch" {
  url                   = aws_opensearchserverless_collection.rag.collection_endpoint
  healthcheck           = false # TODO: set back to true once working with ci/cd
  aws_region            = data.aws_region.this.name
  sign_aws_requests     = true
  aws_signature_service = "aoss"
  aws_assume_role_arn   = var.assume_role # Reminder: cross-account access is not supported (bullet #4 at https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html#serverless-limitations)
}
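
The snippet above builds the collection and index but does not show where this PR's two new blocks go. The following is a minimal, hypothetical sketch of that wiring. It assumes AWS provider v5.85.0 or later, the release that shipped this change, and it reuses the resource and data-source names from the snippet. The Titan V2 model ID and the `s3://example-bucket/...` URI are placeholders. If the configuration already defines `aws_bedrockagent_knowledge_base.kb` (the lifecycle comment refers to it), add only the `embedding_model_configuration` and `supplemental_data_storage_configuration` blocks to that resource rather than declaring a second one.

```hcl
# Hypothetical wiring; requires hashicorp/aws >= 5.85.0.
resource "aws_bedrockagent_knowledge_base" "kb" {
  name     = var.name
  role_arn = aws_iam_role.bedrock_kb.arn

  knowledge_base_configuration {
    type = "VECTOR"
    vector_knowledge_base_configuration {
      # Amazon Titan Text Embeddings V2 (model ID assumed for this sketch).
      embedding_model_arn = "arn:${data.aws_partition.this.partition}:bedrock:${data.aws_region.this.name}::foundation-model/amazon.titan-embed-text-v2:0"

      embedding_model_configuration {
        bedrock_embedding_model_configuration {
          dimensions          = 1024      # must match "dimension" in the index mapping
          embedding_data_type = "FLOAT32" # or "BINARY"; the index must be built to match
        }
      }

      # Optional: S3 location for supplemental (e.g. multimodal) data; placeholder bucket.
      # The knowledge base role would need access to this bucket.
      supplemental_data_storage_configuration {
        storage_location {
          type = "S3"
          s3_location {
            uri = "s3://example-bucket/supplemental/"
          }
        }
      }
    }
  }

  storage_configuration {
    type = "OPENSEARCH_SERVERLESS"
    opensearch_serverless_configuration {
      collection_arn    = aws_opensearchserverless_collection.rag.arn
      vector_index_name = opensearch_index.rag.name
      field_mapping {
        vector_field   = "bedrock-knowledge-base-default-vector"
        text_field     = "AMAZON_BEDROCK_TEXT_CHUNK"
        metadata_field = "AMAZON_BEDROCK_METADATA"
      }
    }
  }

  # Bedrock validates the index at creation time, so the index must exist first.
  depends_on = [opensearch_index.rag]
}
```

The `dimensions` and `embedding_data_type` values are kept next to the index mapping on purpose. Bedrock validates the underlying index when the knowledge base is created, so pointing a `BINARY` knowledge base at a float-typed index (or the reverse) would be expected to fail that check.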

jar-b (Member) commented Jan 29, 2025

TIL about the opensearch provider. Thanks @FireballDWF! 👍

jar-b merged commit 2f477c9 into hashicorp:main Jan 29, 2025
47 checks passed
github-actions bot added this to the v5.85.0 milestone Jan 29, 2025
github-actions bot removed the prioritized label Feb 3, 2025

github-actions bot commented Feb 3, 2025

This functionality has been released in v5.85.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

Labels: documentation, enhancement, service/bedrockagent, tests

Successfully merging this pull request may close these issues:
[Enhancement]: Add support for supplementalDataStorageConfiguration block in aws_bedrockagent_knowledge_base

5 participants