DEV: update docs #79

Merged
merged 2 commits into from
Jun 20, 2023
73 changes: 5 additions & 68 deletions README.md
@@ -6,6 +6,7 @@
## Contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [Why serverless?](#why-serverless)
- [Introduction to sBeacon](#introduction-to-sbeacon)
- [Installation](#installation)
@@ -20,12 +21,10 @@
* [Data ingestion API](#data-ingestion-api)
* [Query API](#query-api)
- [Securing the API](#securing-the-api)
* [Enabling API security](#enabling-api-security)
* [API access commands](#api-access-commands)
* [How API security works](#how-api-security-works)
- [Troubleshooting](#troubleshooting)
* [Illegal instruction (core dumped)](#illegal-instruction-core-dumped)
* [Provider produced inconsistent final plan](#provider-produced-inconsistent-final-plan)

<!-- TOC end -->

## Why serverless?
@@ -220,23 +219,11 @@ Please make a copy of `backend.tf.template` with suited parameters and rename as

### Example data

Please find the data in [./examples/test-data/](./examples/test-data/) and use the [./examples/test-data/GUIDE.md](./examples/test-data/GUIDE.md) to try the provided test data.
Please find the data in [./examples/test-data/](./examples/test-data/) and use the [./docs/USAGE-GUIDE.md](./docs/USAGE-GUIDE.md) to try the provided test data.

### Data ingestion API

Use the following schemas for data submission

* Submit dataset - please follow the JSON schema at [./shared_resources/schemas/submitDataset-schema-new.json](./shared_resources/schemas/submitDataset-schema-new.json)
* Update dataset - please follow the JSON schema at [./shared_resources/schemas/submitDataset-schema-update.json](./shared_resources/schemas/submitDataset-schema-update.json)

Use the following schemas for submitted entities

* Dataset - [./shared_resources/schemas/dataset-schema.json](./shared_resources/schemas/dataset-schema.json)
* Cohort - [./shared_resources/schemas/cohort-schema.json](./shared_resources/schemas/cohort-schema.json)
* Individual - [./shared_resources/schemas/individual-schema.json](./shared_resources/schemas/individual-schema.json)
* Biosample - [./shared_resources/schemas/biosample-schema.json](./shared_resources/schemas/biosample-schema.json)
* Run - [./shared_resources/schemas/run-schema.json](./shared_resources/schemas/run-schema.json)
* Analysis - [./shared_resources/schemas/analysis-schema.json](./shared_resources/schemas/analysis-schema.json)
Please refer to the documentation at [./docs/INGESTION-GUIDE.md](./docs/INGESTION-GUIDE.md).

### Query API

@@ -247,57 +234,7 @@ Querying is available as per API defined by BeaconV2 [https://beacon-project.io/

## Securing the API

We have provided the essential architectural templates to enable token-based authentication of API access. If you are using sBeacon as a module, modify `main.tf` as follows. Alternatively, you can edit these variables in the `variables.tf` file.

### Enabling API security
```hcl
# main.tf
module "serverless-beacon" {
  # add the following as desired
  beacon-enable-auth    = true
  beacon-guest-username = "guest@gmail.com"
  beacon-guest-password = "XXXXX"
  beacon-admin-username = "admin@gmail.com"
  beacon-admin-password = "XXXXX"
}
```

To retrieve the commands needed to obtain an access token, add an `output.tf` file to the module configuration.
```hcl
# output.tf
output "cognito_client_id" {
  value       = module.serverless-beacon.cognito_client_id
  description = "Cognito client Id for user registration and login."
}

output "admin_login_command" {
  value       = module.serverless-beacon.admin_login_command
  description = "AWS cli command to get admin login token"
}

output "guest_login_command" {
  value       = module.serverless-beacon.guest_login_command
  description = "AWS cli command to get guest login token"
}
```
Examples are available at [./examples/minimum/](./examples/minimum/) and [./examples/full](./examples/full).

### API access commands

After a successful `terraform apply`, you will see output similar to the following.
```bash
api_url = "https://XXXXX.execute-api.us-east-1.amazonaws.com/"
cognito_client_id = "XXXXX"
admin_login_command = "aws cognito-idp admin-initiate-auth --user-pool-id us-east-1_A89RD07je --region us-east-1 --client-id 100n0tno0e0sql96mcgciaa8to --auth-flow ADMIN_USER_PASSWORD_AUTH --auth-parameters USERNAME=admin@gmail.com,PASSWORD=XXXXX --output json --query AuthenticationResult.IdToken"
guest_login_command = "aws cognito-idp admin-initiate-auth --user-pool-id us-east-1_A89RD07je --region us-east-1 --client-id XXXXX --auth-flow ADMIN_USER_PASSWORD_AUTH --auth-parameters USERNAME=guest@gmail.com,PASSWORD=XXXXX --output json --query AuthenticationResult.IdToken"
```

Use either `admin_login_command` or `guest_login_command` to retrieve the **IdToken**. You can use this as the bearer token to access the API.

### How API security works

There are three user groups: `record-access-user-group`, `count-access-user-group` and `boolean-access-user-group`. The admin user belongs to all three groups, while the guest user has only **count** and **boolean** access. New users must be added through the Cognito User Pool by an administrator. Alternatively, the infrastructure can be modified to support other authentication flows.

Please refer to the documentation at [./docs/AUTH-GUIDE.md](./docs/AUTH-GUIDE.md).

## Troubleshooting

66 changes: 66 additions & 0 deletions docs/AUTH-GUIDE.md
@@ -0,0 +1,66 @@
# Authentication and authorisation guide
## Contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [Introduction](#introduction)
- [Enabling API security](#enabling-api-security)
- [API access commands](#api-access-commands)
- [How API security works](#how-api-security-works)

<!-- TOC end -->

## Introduction
We have provided the essential architectural templates to enable token-based authentication of API access. If you are using sBeacon as a module, modify `main.tf` as follows. Alternatively, you can edit these variables in the `variables.tf` file.

## Enabling API security
```hcl
# main.tf
module "serverless-beacon" {
  # add the following as desired
  beacon-enable-auth    = true
  beacon-guest-username = "guest@gmail.com"
  beacon-guest-password = "XXXXX"
  beacon-admin-username = "admin@gmail.com"
  beacon-admin-password = "XXXXX"
}
```
❗Important note❗

By default, the passwords for the default users are set in the `main.tf` file. Consider using a `terraform.tfvars` file instead, to avoid accidentally committing sensitive passwords to version control. Documentation is available at [https://developer.hashicorp.com/terraform/tutorials/configuration-language/variables](https://developer.hashicorp.com/terraform/tutorials/configuration-language/variables).

To retrieve the commands needed to obtain an access token, add an `output.tf` file to the module configuration.
```hcl
# output.tf
output "cognito_client_id" {
  value       = module.serverless-beacon.cognito_client_id
  description = "Cognito client Id for user registration and login."
}

output "admin_login_command" {
  value       = module.serverless-beacon.admin_login_command
  description = "AWS cli command to get admin login token"
}

output "guest_login_command" {
  value       = module.serverless-beacon.guest_login_command
  description = "AWS cli command to get guest login token"
}
```
Examples are available at [../examples/minimum/](../examples/minimum/) and [../examples/full](../examples/full).

## API access commands

After a successful `terraform apply`, you will see output similar to the following.
```bash
api_url = "https://XXXXX.execute-api.us-east-1.amazonaws.com/"
cognito_client_id = "XXXXX"
admin_login_command = "aws cognito-idp admin-initiate-auth --user-pool-id us-east-1_A89RD07je --region us-east-1 --client-id 100n0tno0e0sql96mcgciaa8to --auth-flow ADMIN_USER_PASSWORD_AUTH --auth-parameters USERNAME=admin@gmail.com,PASSWORD=XXXXX --output json --query AuthenticationResult.IdToken"
guest_login_command = "aws cognito-idp admin-initiate-auth --user-pool-id us-east-1_A89RD07je --region us-east-1 --client-id XXXXX --auth-flow ADMIN_USER_PASSWORD_AUTH --auth-parameters USERNAME=guest@gmail.com,PASSWORD=XXXXX --output json --query AuthenticationResult.IdToken"
```

Use either `admin_login_command` or `guest_login_command` to retrieve the **IdToken**. You can use this as the bearer token to access the API.
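The login flow can also be scripted: read a login command from the Terraform outputs, run it to get an IdToken, and attach that token to an API request. The sketch below is a minimal Python version. It makes four assumptions:

- The outputs are read with `terraform output -json`.
- The login command is run with the AWS CLI.
- The API accepts the token as `Authorization: Bearer <IdToken>`, as described above.
- The helper names are hypothetical and not part of sBeacon.

```python
import json
import shlex
import subprocess
import urllib.request


def login_command_from_outputs(outputs_json: str, role: str = "guest") -> str:
    """Return the `<role>_login_command` value from `terraform output -json`.

    `terraform output -json` prints a JSON object mapping each output name to
    {"sensitive": ..., "type": ..., "value": ...}.
    """
    outputs = json.loads(outputs_json)
    return outputs[f"{role}_login_command"]["value"]


def run_login_command(command: str) -> str:
    """Run the AWS CLI login command and return its raw stdout.

    Requires the AWS CLI and credentials allowed to call admin-initiate-auth.
    """
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=True
    )
    return result.stdout


def token_from_cli_stdout(stdout: str) -> str:
    """`--output json --query AuthenticationResult.IdToken` prints a quoted JSON string."""
    return json.loads(stdout)


def build_authorized_request(
    api_url: str, path: str, token: str, body: dict
) -> urllib.request.Request:
    """Build (without sending) a JSON POST carrying the IdToken as a bearer token."""
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/" + path.lstrip("/"),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API expects the "Bearer" scheme prefix.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Example wiring (needs a deployed stack, Terraform and AWS credentials;
# api_url is the `api_url` output and request_body is your query):
# outputs = subprocess.run(["terraform", "output", "-json"],
#                          capture_output=True, text=True, check=True).stdout
# token = token_from_cli_stdout(run_login_command(login_command_from_outputs(outputs)))
# request = build_authorized_request(api_url, "g_variants", token, request_body)
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The token is decoded with `json.loads` because the CLI's JSON output wraps the IdToken in quotes. Sending the quoted string as-is would produce an invalid header value.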

## How API security works

There are three user groups: `record-access-user-group`, `count-access-user-group` and `boolean-access-user-group`. The admin user belongs to all three groups, while the guest user has only **count** and **boolean** access. New users must be added through the Cognito User Pool by an administrator. Alternatively, the infrastructure can be modified to support other authentication flows.
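To make the group model concrete, here is a hypothetical helper that maps a user's Cognito groups to the most detailed response granularity they allow.

- The group names come from this guide.
- The `record` > `count` > `boolean` ordering follows the Beacon v2 granularity levels.
- The capping policy in `effective_granularity` is an assumption. It is not a description of sBeacon's authorizer.

```python
from typing import Iterable, Optional

# Group names are taken from this guide; the mapping itself is illustrative.
GROUP_TO_GRANULARITY = {
    "record-access-user-group": "record",
    "count-access-user-group": "count",
    "boolean-access-user-group": "boolean",
}

# Ordered from most to least detailed response.
GRANULARITY_ORDER = ["record", "count", "boolean"]


def highest_granularity(groups: Iterable[str]) -> Optional[str]:
    """Return the most detailed granularity the user's groups allow, or None."""
    allowed = {GROUP_TO_GRANULARITY[g] for g in groups if g in GROUP_TO_GRANULARITY}
    for level in GRANULARITY_ORDER:
        if level in allowed:
            return level
    return None


def effective_granularity(requested: str, groups: Iterable[str]) -> Optional[str]:
    """Hypothetical policy: cap the requested granularity at what the groups allow."""
    best = highest_granularity(groups)
    if best is None:
        return None  # no Beacon group at all: treat as unauthorized
    # The less detailed of the two sits later in GRANULARITY_ORDER.
    return max(requested, best, key=GRANULARITY_ORDER.index)
```

With the default users, the admin's three groups resolve to `record`, while the guest's two groups resolve to `count`.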
34 changes: 34 additions & 0 deletions docs/INGESTION-GUIDE.md
@@ -0,0 +1,34 @@
## Contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [Introduction](#introduction)
- [Data submission schemas](#data-submission-schemas)
- [Model schemas](#model-schemas)
- [Examples](#examples)

<!-- TOC end -->

## Introduction

sBeacon complies with the Beacon V2 schema for its query API. Data submission to sBeacon follows a similar format, with a few added fields that build the relationships between entities.

## Data submission schemas

* Submit dataset - please follow the JSON schema at [../shared_resources/schemas/submit-dataset-schema-new.json](../shared_resources/schemas/submit-dataset-schema-new.json)
* Update dataset - please follow the JSON schema at [../shared_resources/schemas/submit-dataset-schema-update.json](../shared_resources/schemas/submit-dataset-schema-update.json)

## Model schemas

* Dataset - [../shared_resources/schemas/dataset-schema.json](../shared_resources/schemas/dataset-schema.json)
* Cohort - [../shared_resources/schemas/cohort-schema.json](../shared_resources/schemas/cohort-schema.json)
* Individual - [../shared_resources/schemas/individual-schema.json](../shared_resources/schemas/individual-schema.json)
* Biosample - [../shared_resources/schemas/biosample-schema.json](../shared_resources/schemas/biosample-schema.json)
* Run - [../shared_resources/schemas/run-schema.json](../shared_resources/schemas/run-schema.json)
* Analysis - [../shared_resources/schemas/analysis-schema.json](../shared_resources/schemas/analysis-schema.json)

These schemas do not apply to genomic variations during ingestion. sBeacon ingests standard `vcf.gz` files, each of which must be accompanied by its index file (`vcf.gz.tbi` or `vcf.gz.csi`).
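A quick pre-submission check can catch a missing index. The hypothetical helper below takes a flat list of object keys or paths. Gathering that list from S3 (for example with boto3's `list_objects_v2`) is not shown. The helper reports every `.vcf.gz` that has no `.tbi` or `.csi` file with the same prefix next to it.

```python
from typing import Iterable, List

INDEX_SUFFIXES = (".tbi", ".csi")


def vcfs_missing_index(keys: Iterable[str]) -> List[str]:
    """Return every `.vcf.gz` key lacking a `.vcf.gz.tbi` or `.vcf.gz.csi` sibling.

    The index must share the VCF's exact prefix, e.g. `data/chr1.vcf.gz`
    needs `data/chr1.vcf.gz.tbi` (or `.csi`).
    """
    key_set = set(keys)
    missing = []
    for key in sorted(key_set):
        if key.endswith(".vcf.gz") and not any(key + s in key_set for s in INDEX_SUFFIXES):
            missing.append(key)
    return missing
```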

## Examples

Please refer to [USAGE-GUIDE.md](./USAGE-GUIDE.md) for a complete example to get started.
44 changes: 34 additions & 10 deletions examples/test-data/GUIDE.md → docs/USAGE-GUIDE.md
@@ -1,15 +1,31 @@
# Getting started with test data
## Contents
<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->

- [Getting started with test data](#getting-started-with-test-data)
- [Data submission](#data-submission)
* [Option 1: Submission as request body](#option-1-submission-as-request-body)
* [Option 2: Submission as an S3 payload](#option-2-submission-as-an-s3-payload)
- [API usage](#api-usage)
* [POST requst to `/g_variants` with following payload](#post-requst-to-g_variants-with-following-payload)
* [POST request to `/g_variants/R1JDSDM4CTEJNTQ2ODAyCUcJQw==/individuals` with following payload](#post-request-to-g_variantsr1jdsdm4ctejntq2odaycucjqwindividuals-with-following-payload)
* [POST request to `/individuals` with following payload](#post-request-to-individuals-with-following-payload)

<!-- TOC end -->

## Getting started with test data

Please first upload the `chr1.vcf.gz` and `chr1.vcf.gz.tbi` files to an S3 bucket that is accessible from the sBeacon deployment account, then obtain the S3 URI of `chr1.vcf.gz` from the upload destination. Note that the `vcf.gz` and `vcf.gz.tbi` files must share the same prefix in S3 for this to work. Also make sure that every bucket you create is in the same region as the deployment.

Now edit the `submission.json` using the S3 URI of the `vcf.gz` file.
Now edit the [`submission.json`](../examples/test-data/submission.json) using the S3 URI of the `vcf.gz` file.

```json
. . .
{
// other fields
"vcfLocations": [
"s3://<bucket>/<prefix>/chr1.vcf.gz"
]
. . .
],
// other fields
}
```
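If you prefer to script this edit, the following is a minimal sketch. The full structure of `submission.json` is not shown here, so the hypothetical `set_vcf_locations` helper does not assume a fixed path. Instead, it searches the document recursively and overwrites every `vcfLocations` list it finds.

```python
import json
from typing import Any, List


def set_vcf_locations(node: Any, uris: List[str]) -> int:
    """Replace every `vcfLocations` list found anywhere in `node` with `uris`.

    Returns how many fields were updated, so callers can detect a file
    that contains no `vcfLocations` at all.
    """
    count = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "vcfLocations":
                node[key] = list(uris)  # same key, so iteration stays valid
                count += 1
            else:
                count += set_vcf_locations(value, uris)
    elif isinstance(node, list):
        for item in node:
            count += set_vcf_locations(item, uris)
    return count


def update_submission_file(path: str, uris: List[str]) -> None:
    """Rewrite `path` in place with the new VCF locations (hypothetical helper)."""
    with open(path) as f:
        submission = json.load(f)
    if set_vcf_locations(submission, uris) == 0:
        raise ValueError(f"no vcfLocations field found in {path}")
    with open(path, "w") as f:
        json.dump(submission, f, indent=2)
```

For example: `update_submission_file("submission.json", ["s3://<bucket>/<prefix>/chr1.vcf.gz"])`. Note that `json.load` expects plain JSON, without the comments used in the snippet above.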

## Data submission
@@ -22,7 +38,7 @@ You can simply copy the edited JSON content in to the API gateway `/submit_datas

### Option 2: Submission as an S3 payload

Alternatively, you can upload edited `submission.json` file to an S3 location accessible from deployment. Then you can use the file's S3 URI as follows in the API Gateway or in your REST client.
Alternatively, you can upload the edited [`submission.json`](../examples/test-data/submission.json) file to an S3 location accessible from the deployment, then use the file's S3 URI as follows in the API Gateway or in your REST client.

```json
{
@@ -32,10 +48,14 @@ Alternatively, you can upload edited `submission.json` file to an S3 location ac

Option 2 is recommended for larger submissions with thousands of metadata entries.

## API testing
## API usage

### POST requst to `/g_variants` with following payload

**Query**

Schema for the `query.requestParameters` attribute is available at [../shared_resources/schemas/g-variants-request-parameters.json](../shared_resources/schemas/g-variants-request-parameters.json).

```json
{
"meta": {
@@ -61,7 +81,9 @@ Option 2 is recommended for larger submissions with thousands of metadata entrie
}
```
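A request body of this kind can also be assembled in code. The sketch below assumes the standard Beacon v2 request fields:

- `meta.apiVersion`
- `query.requestParameters`
- `filters`
- `pagination`
- `requestedGranularity`
- `includeResultsetResponses`
- `testMode`

The exact parameter names and types that sBeacon accepts should be checked against `g-variants-request-parameters.json`, linked above. The function name and defaults are hypothetical.

```python
from typing import Any, Dict, List, Optional


def g_variants_request(
    assembly_id: str,
    reference_name: str,
    start: int,
    end: int,
    reference_bases: str,
    alternate_bases: str,
    granularity: str = "record",
    skip: int = 0,
    limit: int = 10,
    filters: Optional[List[dict]] = None,
) -> Dict[str, Any]:
    """Assemble a Beacon v2-style POST body for `/g_variants` (hypothetical helper)."""
    return {
        "meta": {"apiVersion": "v2.0"},
        "query": {
            "requestParameters": {
                "assemblyId": assembly_id,
                "referenceName": reference_name,
                # Beacon v2 expects start/end as arrays of integers.
                "start": [start],
                "end": [end],
                "referenceBases": reference_bases,
                "alternateBases": alternate_bases,
            },
            "filters": filters or [],
            "includeResultsetResponses": "HIT",
            "pagination": {"skip": skip, "limit": limit},
            "requestedGranularity": granularity,
            "testMode": False,
        },
    }
```

Keeping the body in a plain dict makes it easy to serialise with `json.dumps` and send with any HTTP client.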

Result
**Result**

The result follows the `genomic variations` model, whose schema is available at [../shared_resources/schemas/genomic-variation-schema.json](../shared_resources/schemas/genomic-variation-schema.json).

```json
{
@@ -167,6 +189,7 @@ Result

### POST request to `/g_variants/R1JDSDM4CTEJNTQ2ODAyCUcJQw==/individuals` with following payload

**Query**
```json
{
"meta": {
@@ -182,7 +205,7 @@ Result
}
```

Result
**Result**

```json
{
@@ -270,6 +293,7 @@ Result

### POST request to `/individuals` with following payload

**Query**
```json
{
"query": {
@@ -286,7 +310,7 @@ Result
}
```

Result
**Result**

```json
{
2 changes: 1 addition & 1 deletion lambda/submitDataset/route_create_dataset.py
@@ -164,7 +164,7 @@ def summarise_dataset(dataset):

 def validate_request(parameters):
     # load validator
-    new_schema = "./schemas/submitDataset-schema-new.json"
+    new_schema = "./schemas/submit-dataset-schema-new.json"
     schema_dir = os.path.dirname(os.path.abspath(new_schema))
     new_schema = json.load(open(new_schema))
     resolveNew = RefResolver(base_uri="file://" + schema_dir + "/", referrer=new_schema)
2 changes: 1 addition & 1 deletion lambda/submitDataset/route_update_dataset.py
@@ -165,7 +165,7 @@ def summarise_dataset(dataset):

 def validate_request(parameters):
     # load validator
-    update_schema = "./schemas/submitDataset-schema-update.json"
+    update_schema = "./schemas/submit-dataset-schema-update.json"
     schema_dir = os.path.dirname(os.path.abspath(update_schema))
     update_schema = json.load(open(update_schema))
     resolver = RefResolver(base_uri="file://" + schema_dir + "/", referrer=update_schema)
@@ -214,10 +214,12 @@ def parse_request(event) -> Tuple[RequestParams, str]:
     try:
         request_params = RequestParams(**body_dict).from_request(params)
     except ValidationError as e:
-        errors = defaultdict(list)
+        errors = defaultdict(set)
+
         for e in e.errors():
-            errors[e["msg"]].append(".".join(e["loc"]))
-        return request_params, dict(errors), 400
+            errors[e["msg"]].add(".".join([str(l) for l in e["loc"]]))
+
+        return request_params, dict({k: list(v) for k, v in errors.items()}), 400
 
     if BEACON_ENABLE_AUTH:
         # either use belongs to a group or they are unauthorized
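The new error handling in `parse_request` can be exercised in isolation. The sketch below reproduces the grouping using plain dicts shaped like pydantic's `ValidationError.errors()` entries.

- Those entries' `loc` tuples may contain integers (list indices), so each part is passed through `str` before joining.
- Each message gets a set, so a repeated location is reported only once.

The sketch also differs from the diff in two deliberate ways. It uses `err` as the loop variable instead of rebinding `e`, and it sorts the locations for deterministic output where the diff uses `list(v)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_errors(errors: Iterable[dict]) -> Dict[str, List[str]]:
    """Group validation errors by message, listing each distinct field path once.

    Each error is a dict with a "msg" string and a "loc" tuple whose items
    may be str or int, like pydantic's `ValidationError.errors()` entries.
    """
    grouped = defaultdict(set)
    for err in errors:
        # int items would make ".".join raise TypeError, so stringify them first
        grouped[err["msg"]].add(".".join(str(part) for part in err["loc"]))
    # sets are not JSON serialisable; convert to sorted lists for a stable body
    return {msg: sorted(locs) for msg, locs in grouped.items()}
```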