deploy: add option to create buckets with test data and clarify docs
mmalenic committed Sep 6, 2024
1 parent d2b018f commit 59a4aa3
Showing 8 changed files with 266 additions and 133 deletions.
25 changes: 14 additions & 11 deletions deploy/README.md
@@ -13,24 +13,27 @@ The CDK code in this directory constructs a CDK app from [`HtsgetLambdaStack`][h
[`bin/settings.ts`][htsget-settings]:

#### HtsgetSettings

These are general settings for the CDK deployment.

| Name | Description | Type |
|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------|
| <span id="config">`config`</span> | The location of the htsget-rs server config. This must be specified. This config file configures the htsget-rs server. See [htsget-config] for a list of available server configuration options. | `string` |
| <span id="domain">`domain`</span> | The domain name for the Route53 Hosted Zone that the htsget-rs server will be under. This must be specified. A hosted zone with this name will either be looked up or created depending on the value of [`lookupHostedZone?`](#lookupHostedZone). | `string` |
| <span id="authorizer">`authorizer`</span> | Deployment options related to the authorizer. Note that this option allows specifying an AWS [JWT authorizer][jwt-authorizer]. The JWT authorizer automatically verifies tokens issued by a Cognito user pool. | [`HtsgetJwtAuthSettings`](#htsgetjwtauthsettings) |
| <span id="subDomain">`subDomain?`</span> | The domain name prefix to use for the htsget-rs server. Together with the [`domain`](#domain), this specifies the URL that the htsget-rs server will be reachable under. Defaults to `"htsget"`. | `string` |
| <span id="s3BucketResources">`s3BucketResources?`</span> | The resources that are affected by the bucket policy with actions: `["s3:List*", "s3:Get*"]`. If this is not specified, it defaults to `["arn:aws:s3:::*"]`. This affects which buckets are allowed to be accessed with the policy. | `string[]` |
| <span id="lookupHostedZone">`lookupHostedZone?`</span> | Whether to look up the hosted zone with the domain name. Defaults to `true`. If `true`, attempts to look up an existing hosted zone using the domain name. Set this to `false` if you want to create a new hosted zone with the domain name. | `boolean` |
| Name | Description | Type |
| ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------- |
| <span id="config">`config`</span> | The location of the htsget-rs server config. This must be specified. This config file configures the htsget-rs server. See [htsget-config] for a list of available server configuration options. | `string` |
| <span id="domain">`domain`</span> | The domain name for the Route53 Hosted Zone that the htsget-rs server will be under. This must be specified. A hosted zone with this name will either be looked up or created depending on the value of [`lookupHostedZone?`](#lookupHostedZone). | `string` |
| <span id="authorizer">`authorizer`</span> | Deployment options related to the authorizer. Note that this option allows specifying an AWS [JWT authorizer][jwt-authorizer]. The JWT authorizer automatically verifies tokens issued by a Cognito user pool. | [`HtsgetJwtAuthSettings`](#htsgetjwtauthsettings) |
| <span id="subDomain">`subDomain?`</span> | The domain name prefix to use for the htsget-rs server. Together with the [`domain`](#domain), this specifies the URL that the htsget-rs server will be reachable under. Defaults to `"htsget"`. | `string` |
| <span id="s3BucketResources">`s3BucketResources`</span> | The buckets to serve data from. If this is not specified, this defaults to `[]`. This affects which buckets are allowed to be accessed by the policy actions which are `["s3:List*", "s3:Get*"]`. Note that this option alone does not create buckets, it only gives permission to access them; see the `createS3Buckets` option. This option must be specified to allow `htsget-rs` to access data in the buckets. | `string[]` |
| <span id="lookupHostedZone">`lookupHostedZone?`</span> | Whether to look up the hosted zone with the domain name. Defaults to `true`. If `true`, attempts to look up an existing hosted zone using the domain name. Set this to `false` if you want to create a new hosted zone with the domain name. | `boolean` |
| <span id="createS3Buckets">`createS3Buckets?`</span> | A list of buckets to create. Defaults to no buckets. Buckets are created with [`RemovalPolicy.RETAIN`](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.RemovalPolicy.html). This also copies the example data under the `data` directory to those buckets. | `string[]` |
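
As a rough illustration of how `s3BucketResources` and `createS3Buckets` relate, the sketch below derives an ARN list from plain bucket names. The `ExampleSettings` type and the `bucketArns` helper are hypothetical and are not part of the deployment code; the bucket name is the one used in the dev config of this commit. It assumes that both the bucket ARN (used by bucket-level listing) and the `bucket/*` object ARN (used by object reads) should be granted.

```typescript
// Hypothetical subset of the settings documented above; this is not the real
// HtsgetSettings type from htsget-lambda-stack.ts.
type ExampleSettings = {
  s3BucketResources: string[];
  createS3Buckets?: string[];
};

// Hypothetical helper: build the bucket-level and object-level ARNs for each
// bucket name so that both listing and object reads are covered.
function bucketArns(names: string[]): string[] {
  const arns: string[] = [];
  for (const name of names) {
    arns.push(`arn:aws:s3:::${name}`);
    arns.push(`arn:aws:s3:::${name}/*`);
  }
  return arns;
}

const buckets = ["org.umccr.dev.htsget-rs-test-data"];

// Creating a bucket does not grant access to it, so the same names appear in
// both options: once as names to create and once as ARNs to allow.
const exampleSettings: ExampleSettings = {
  s3BucketResources: bucketArns(buckets),
  createS3Buckets: buckets,
};
```

Keeping one list of names and deriving the ARNs from it is only one way to avoid the two options drifting apart; listing the ARNs by hand works equally well.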

#### HtsgetJwtAuthSettings

These settings are used to determine if the htsget API gateway endpoint is configured to have a JWT authorizer or not.

| Name | Description | Type |
|---------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| <span id="public">`public`</span> | Whether this deployment is public. If this is `true` then no authorizer is present on the API gateway and the options below have no effect. | `boolean` |
| <span id="jwtAudience">`jwtAudience?`</span> | A list of the intended recipients of the JWT. A valid JWT must provide an aud that matches at least one entry in this list. | `string[]` |
| <span id="cogUserPoolId?">`cogUserPoolId?`</span> | The cognito user pool id for the authorizer. If this is not set, then a new user pool is created. No user pool is created if [`public`](#public) is true. | `string` |
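
The interaction between `public` and `cogUserPoolId?` described in the table can be summarised in a small decision function. This is only a sketch of the documented behaviour: `ExampleJwtAuthSettings`, `AuthPlan`, and `planAuthorizer` are hypothetical names, and the real stack may structure this logic differently.

```typescript
// Hypothetical mirror of the options in the table above.
type ExampleJwtAuthSettings = {
  public: boolean;
  jwtAudience?: string[];
  cogUserPoolId?: string;
};

// The three outcomes the table describes.
type AuthPlan =
  | { kind: "none" }
  | { kind: "existing-pool"; userPoolId: string }
  | { kind: "new-pool" };

function planAuthorizer(settings: ExampleJwtAuthSettings): AuthPlan {
  // A public deployment has no authorizer and creates no user pool,
  // so the remaining options are ignored.
  if (settings.public) {
    return { kind: "none" };
  }
  // A supplied pool id is reused; otherwise a new user pool is created.
  if (settings.cogUserPoolId !== undefined) {
    return { kind: "existing-pool", userPoolId: settings.cogUserPoolId };
  }
  return { kind: "new-pool" };
}
```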

The [`HtsgetSettings`](#htsgetsettings) are passed into [`HtsgetLambdaStack`][htsget-lambda-stack] in order to change the deployment config. An example of a public instance deployment
@@ -49,7 +52,7 @@ After installing the basic dependencies, complete the following steps:

1. Log in to AWS and define `CDK_DEFAULT_*` env variables (if not defined already). You must be authenticated with your AWS cloud to run this step.
2. Install [cargo-lambda], as it is used to compile artifacts that are uploaded to AWS Lambda.
3. Define which configuration to use for htsget-rs as stated in the configuration section.
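
Step 1 can be checked before deploying with a small guard. The sketch below assumes that `CDK_DEFAULT_*` refers to `CDK_DEFAULT_ACCOUNT` and `CDK_DEFAULT_REGION`, the two variables the CDK conventionally reads; `missingCdkEnv` is a hypothetical helper and is not part of this repository.

```typescript
// Assumption: these are the two CDK_DEFAULT_* variables the deployment needs.
const REQUIRED_CDK_ENV = ["CDK_DEFAULT_ACCOUNT", "CDK_DEFAULT_REGION"];

// Hypothetical helper: return the names of required variables that are unset
// or empty in the given environment.
function missingCdkEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_CDK_ENV.filter((name) => !env[name]);
}
```

In a Node.js script this could be called as `missingCdkEnv(process.env)`, failing early if the returned list is not empty.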

Below is a summary of commands to run in this directory:

1 change: 1 addition & 0 deletions deploy/bin/settings.ts
@@ -12,6 +12,7 @@ export const SETTINGS: HtsgetSettings = {
"arn:aws:s3:::org.umccr.demo.htsget-rs-data/*",
],
lookupHostedZone: true,
createS3Buckets: [],
jwtAuthorizer: {
// Set this to true if you want a public instance.
public: false,
5 changes: 5 additions & 0 deletions deploy/config/dev_umccr.toml
@@ -23,6 +23,11 @@ contact_url = "https://umccr.org/"
documentation_url = "https://github.com/umccr/htsget-rs"
environment = "dev"

[[resolvers]]
regex = '^(org.umccr.dev.htsget-rs-test-data)/(?P<key>.*)$'
substitution_string = '$key'
storage = 'S3'

[[resolvers]]
regex = '^(umccr-10c-data-dev)/(?P<key>.*)$'
substitution_string = '$key'
3 changes: 2 additions & 1 deletion deploy/examples/local_storage/README.md
@@ -17,6 +17,7 @@ curl http://127.0.0.1:8080/reads/data/bam/htsnexus_test_NA12878
```

Which outputs:

```sh
{
"htsget": {
@@ -41,4 +42,4 @@ default settings, and `curl http://127.0.0.1:8080/reads/data/<id>`, noting the e

[local]: ../../../htsget-config/README.md#resolvers
[compose]: compose.yml
[data]: ../../../data
18 changes: 9 additions & 9 deletions deploy/examples/minio/README.md
@@ -3,17 +3,16 @@
[MinIO][minio] can be used with htsget-rs by configuring the [storage type][storage] as `S3` and setting the `endpoint` to the MinIO server.
There are a few specific configuration options that need to be considered to use MinIO with htsget-rs, and those include:

* The standard [AWS environment variables][env-variables] for connecting to AWS services must be set, and configured to match those
used by MinIO.
* This means that htsget-rs expects an `AWS_DEFAULT_REGION` to be set, which must match the region used by MinIO (by default us-east-1).
* It also means that the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` must be set to match the credentials used by MinIO.
* If using virtual-hosted style [addressing][virtual-addressing] instead of path style [addressing][path-addressing], `MINIO_DOMAIN` must be
set on the MinIO server and DNS resolution must allow accessing the MinIO server using `bucket.<MINIO_DOMAIN>`.
* Path style addressing can be used instead by setting `path_style = true` under the htsget-rs resolvers storage type.
- The standard [AWS environment variables][env-variables] for connecting to AWS services must be set, and configured to match those
  used by MinIO.
  - This means that htsget-rs expects an `AWS_DEFAULT_REGION` to be set, which must match the region used by MinIO (by default us-east-1).
  - It also means that the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` must be set to match the credentials used by MinIO.
- If using virtual-hosted style [addressing][virtual-addressing] instead of path style [addressing][path-addressing], `MINIO_DOMAIN` must be
  set on the MinIO server and DNS resolution must allow accessing the MinIO server using `bucket.<MINIO_DOMAIN>`.
  - Path style addressing can be used instead by setting `path_style = true` under the htsget-rs resolvers storage type.

The caveats around the addressing style occur because there are two different addressing styles for S3 buckets, path style, e.g.
`http://minio:9000/bucket`, and virtual-hosted style, e.g. `http://bucket.minio:9000`. AWS has declared path style addressing
as [deprecated][path-style-deprecated], so this example sets up virtual-hosted style addressing as the default.
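
To make the difference between the two addressing styles concrete, the sketch below builds both URL forms for the example endpoint used above. `bucketUrl` and `AddressingStyle` are hypothetical names for illustration only; htsget-rs and MinIO do not expose this function.

```typescript
type AddressingStyle = "path" | "virtual-hosted";

// Hypothetical helper: build the bucket URL for an S3-compatible endpoint in
// either addressing style. The endpoint must include a scheme, e.g. "http://".
function bucketUrl(
  endpoint: string,
  bucket: string,
  style: AddressingStyle,
): string {
  const separator = "://";
  const index = endpoint.indexOf(separator);
  if (index < 0) {
    throw new Error(`endpoint must include a scheme: ${endpoint}`);
  }
  const scheme = endpoint.slice(0, index);
  // Drop any trailing slashes so the pieces join cleanly.
  const host = endpoint.slice(index + separator.length).replace(/\/+$/, "");

  return style === "path"
    ? `${scheme}://${host}/${bucket}` // bucket is part of the path
    : `${scheme}://${bucket}.${host}`; // bucket is part of the host name
}

const pathStyle = bucketUrl("http://minio:9000", "bucket", "path");
const virtualHosted = bucketUrl("http://minio:9000", "bucket", "virtual-hosted");
```

Because the virtual-hosted form moves the bucket name into the host name, it only works when DNS resolves `bucket.<MINIO_DOMAIN>` to the MinIO server, which is the caveat described in the list above.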

## Deployment using Docker

@@ -36,6 +35,7 @@ curl http://127.0.0.1:8080/reads/bam/htsnexus_test_NA12878
```

Outputs:

```sh
{
"htsget": {
@@ -68,4 +68,4 @@ docker exec -it minio curl -H "Range: bytes=0-2596770" "http://data.minio:9000/b
[virtual-addressing]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access
[path-addressing]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access
[compose]: compose.yml
[data]: ../../../data
47 changes: 40 additions & 7 deletions deploy/lib/htsget-lambda-stack.ts
@@ -2,15 +2,15 @@ import { STACK_NAME } from "../bin/htsget-lambda";
import * as TOML from "@iarna/toml";
import { readFileSync } from "fs";

import { Duration, Stack, StackProps, Tags } from "aws-cdk-lib";
import { Duration, RemovalPolicy, Stack, StackProps, Tags } from "aws-cdk-lib";
import { Construct } from "constructs";

import { UserPool } from "aws-cdk-lib/aws-cognito";
import {
ManagedPolicy,
PolicyStatement,
Role,
ServicePrincipal,
PolicyStatement,
ManagedPolicy,
} from "aws-cdk-lib/aws-iam";
import { Architecture } from "aws-cdk-lib/aws-lambda";
import {
@@ -29,6 +29,12 @@ import {
HttpMethod,
} from "aws-cdk-lib/aws-apigatewayv2";
import { HttpJwtAuthorizer } from "aws-cdk-lib/aws-apigatewayv2-authorizers";
import {
BlockPublicAccess,
Bucket,
BucketEncryption,
} from "aws-cdk-lib/aws-s3";
import { BucketDeployment, Source } from "aws-cdk-lib/aws-s3-deployment";

/**
* Settings related to the htsget lambda stack.
@@ -50,10 +56,12 @@ export type HtsgetSettings = {
subDomain?: string;

/**
* Policies to add to the bucket. If this is not specified, this defaults to `["arn:aws:s3:::*"]`.
* This affects which buckets are allowed to be accessed by the policy actions which are `["s3:List*", "s3:Get*"]`.
* The buckets to serve data from. If this is not specified, this defaults to `[]`. This affects which buckets are
* allowed to be accessed by the policy actions which are `["s3:List*", "s3:Get*"]`. Note that this option alone
* does not create buckets, it only gives permission to access them, see the `createS3Buckets` option.
* This option must be specified to allow `htsget-rs` to access data in the buckets.
*/
s3BucketResources?: string[];
s3BucketResources: string[];

/**
* Whether this deployment is gated behind a JWT authorizer, or if it's public.
@@ -66,6 +74,13 @@ export type HtsgetSettings = {
* domain name.
*/
lookupHostedZone?: boolean;

/**
* A list of buckets to create. Defaults to no buckets. Buckets are created with
* [`RemovalPolicy.RETAIN`](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.RemovalPolicy.html).
* This also copies the example data under the `data` directory to those buckets.
*/
createS3Buckets?: string[];
};

/**
@@ -151,9 +166,27 @@ export class HtsgetLambdaStack extends Stack {

const s3BucketPolicy = new PolicyStatement({
actions: ["s3:List*", "s3:Get*"],
resources: settings.s3BucketResources ?? ["arn:aws:s3:::*"],
resources: settings.s3BucketResources ?? [],
});

for (const name of settings.createS3Buckets ?? []) {
  // Construct ids must be unique within a stack, so derive them from the
  // bucket name; a fixed id would fail as soon as two buckets are listed.
  const bucket = new Bucket(this, `Bucket-${name}`, {
    blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
    encryption: BucketEncryption.S3_MANAGED,
    enforceSSL: true,
    removalPolicy: RemovalPolicy.RETAIN,
    bucketName: name,
  });

  // Copy the example data under the repository's `data` directory into the bucket.
  const dataDir = path.join(__dirname, "..", "..", "data");
  new BucketDeployment(this, `DeployFiles-${name}`, {
    sources: [Source.asset(dataDir)],
    destinationBucket: bucket,
  });
}

lambdaRole.addManagedPolicy(
ManagedPolicy.fromAwsManagedPolicyName(
"service-role/AWSLambdaBasicExecutionRole",